Archive for Data

Open Data Day Hong Kong: Event & Follow-up

Twenty people gathered in a good work room provided by Odd-e on Connaught Road West. It was very nice of Odd-e to donate the space.  Four of us pitched four issues and we quickly formed groups.  Take a look at the hackpad created before, during and after the Open Data Day.

  1. List of vacant primary and secondary schools in Hong Kong: how many are there, and how could the public more easily know what is going on with a vacant school? (3 people)
  2. Cops / Bad Cops – reporting to the police / reporting by the police (3-4 people)
  3. Impact of Environmental Policy on actual air quality improvements (6-7 people)
  4. A crowdsourcing database of missing people in Hong Kong (4-5 people)
  5. A fifth issue, submitted after the day, on paywalls and how they enable misrepresentation and possible fraud.

Sammy Fung, the day’s organizer, was a floater during the day.  We are all very grateful for Sammy’s efforts to make the day possible.

We were well supplied with coffee, drinks and snacks, and we worked throughout the day.  There were some exchanges between the groups on ‘how to do x or y’.  Generally though the conversations were within the groups until the end-of-day presentations.  Good progress was made by 6pm.  Below are my observations and short summaries of what was accomplished during the day.  These are drafts.  I’ll very likely change them once I have feedback.  There are photos and other details on the Open Data Day Hong Kong 2016 Facebook page.

The theme connecting the 4 issues is ‘lack of trust’.  There was no plan or prior discussion on the issues to be used for the Open Data Day.  It’s an indication of how one group of HK citizens feels.  Who are these people?  Young, middle-aged and frankly old.  Male and female.   All have some sort of technical, statistical, analytical experience or inclination.  Twenty people’s views cannot be extrapolated out to 7 million.   How much trust do these 20 HK citizens have in the HK Government?   Not much is the answer.  Based on the tentative data gathered and analysed around the 4 issues, the HK public should listen carefully and ask questions when the HK Government claims it’s doing its best to be open, transparent and truthful.

(1)  Take a look at Public Accounts Committee of the 5th Legislative Council P.A.C. Report no. 65, part 8, chapter 3, page 135, clause 25.   The EDB admits that, in answering a Legco member’s question, it misrepresented the total number of vacant schools.  The EDB responded there were 108 vacant schools when there were 234 vacant schools at that time, 2015-2016.   How many vacant schools are there and can we verify school vacancy without relying solely on data supplied by HK government departments?

Ans:  The Education Bureau has a website of ‘all schools’ with 11,077 entries and a dataset of ‘all schools’ with 3,507.  Why such a huge difference?   The answer depends on when a school is a school.  All 11,077 are schools.  All 3,507 are schools.  The EDB’s definition of ‘school’ is both vague and precise.  The discrepancy is that the bigger list includes all of the tutor, cram or learning centre schools and other special schools, whereas the shorter list is only kindergartens, primary and secondary schools.  The numbers never quite match up.  The shorter list provides the longitude and latitude of ‘schools’ and these can be mapped into Google Maps.  Text analysis of the descriptions from Google Maps may reveal whether each location is currently a ‘school’.   Automating the process is the key challenge.  After a few cycles the number of vacant schools should become clearer.  We don’t know for sure but it’s likely many more than the 29 the Education Bureau reported on 17 February 2016.  More will follow …
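
The keyword step described above could be sketched roughly as follows. This is only an illustration in Python, not the group's code: the `SchoolRecord` fields, the keyword list and the sample records are all assumptions made up for the example, and the place descriptions are assumed to have been collected separately for each longitude and latitude.

```python
from dataclasses import dataclass

# Hypothetical keywords; a real list would need tuning against known cases.
SCHOOL_KEYWORDS = ("school", "kindergarten", "college", "學校", "幼稚園")


@dataclass
class SchoolRecord:
    name: str
    latitude: float
    longitude: float
    place_description: str  # text found for this location, gathered separately


def looks_like_school(description: str) -> bool:
    """Return True if the description mentions any school-related keyword."""
    lowered = description.lower()
    return any(keyword in lowered for keyword in SCHOOL_KEYWORDS)


def possibly_vacant(records: list[SchoolRecord]) -> list[SchoolRecord]:
    """Return records whose location description no longer reads like a school."""
    return [r for r in records if not looks_like_school(r.place_description)]


# Made-up records for illustration only, not real schools.
sample = [
    SchoolRecord("Example Primary School", 22.28, 114.15, "Primary school, open weekdays"),
    SchoolRecord("Example Secondary School", 22.30, 114.17, "Vacant lot, fenced off"),
]
flagged = possibly_vacant(sample)
print([r.name for r in flagged])
```

A keyword match like this only produces candidates. Each flagged location would still need a human check before being counted as vacant.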

(2) The HK Police release glowing praise letters they receive.  The HK Police seem less inclined to release letters of complaint they receive.   However, some letters of complaint are released.  What does text analysis of the praise letters and the complaint letters released to the HK public reveal?

Ans. The team managed to completely automate the collection and the analysis process.  This is impressive for a day’s work.  The interpretation is coming.  More will follow …

(3) The air quality has improved over the past 10 years, we are told by various government groups and NGOs.  Does collecting and analysing the daily air quality data released to the public reveal a statistically significant improvement in the air quality over a period of time?

Ans. The data was collected and analysed.  There is measurable change.  Is it statistically significant change or simply random?  More will follow …
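
One simple way to ask ‘significant or simply random?’ is a permutation test on the trend. The sketch below is an assumption of mine and not the method the group used: it fits a straight-line slope to a series of readings, then checks how often a random reordering of the same readings gives a slope at least as steep. The readings are made-up numbers for illustration only. Real daily air quality data is seasonal and autocorrelated, so it would need more care than this.

```python
import random
from statistics import mean


def slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def permutation_p_value(values, trials=2000, seed=0):
    """Share of random orderings whose slope is at least as steep as the observed one."""
    rng = random.Random(seed)
    observed = abs(slope(values))
    shuffled = list(values)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(slope(shuffled)) >= observed:
            hits += 1
    return hits / trials


# Made-up yearly average readings, for illustration only (not real data).
readings = [52, 50, 51, 48, 47, 45, 46, 43, 42, 40]
print(slope(readings), permutation_p_value(readings))
```

A small p-value says the downward trend is unlikely to come from a random ordering of the same readings. It says nothing about what caused the change.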

(4) Hong Kong is a city with millions of people living and passing through every year.  Some people go missing.  Could using crowd-sourcing along with the HK Police missing persons site help family and friends find the missing people?

Ans. Mixing and matching data scraped from public websites was done.  The HK Police were claiming ‘personal privacy’ concerns when a member of this group enquired a few days before Open Data Day.  What will be the reaction to scraped data combined with details supplied by family and friends?  More will follow …

(5) David Webb, an investor activist in Hong Kong, provided ‘Deception behind the Companies Registry paywall’ for the Open Data Day.  David writes on his useful, free, open and transparent Webb-site Reports,

“On International Open Data Day, we reveal a network of knock-off companies using the CIBC, Credit Suisse and BNP brands, based in HK with subsidiaries in the UK and New Zealand. If those registries were not free and open, the deception would remain undiscovered. We call on HK Registrar Ada Chung to tear down this paywall.”

Hong Kong with massive government reserves collects small amounts of money from its citizens.  Why go to such trouble for unneeded revenue?

The Hong Kong Government’s Office of the Government Chief Information Officer (OGCIO) hosts the Data.One site, DATA.GOV.HK.  The webpages do look much better than they did a few years ago.  The datasets seem about the same but a careful comparison may reveal improvements.  I note there are 15 applications ‘showcased’ as examples of:

“creative web and mobile applications and solutions developed by the Government and community with DATA.GOV.HK datasets. These examples will demonstrate the potentials of the public sector information provided in digital formats.” (from Applications)

All are interesting applications of open source public data.  However, fifteen applications seem rather paltry.  Why can’t the OGCIO provide a more comprehensive list of the applications which use these datasets?  Let’s hope for improvement by Open Data Day Hong Kong 2017.

Open Data Day Hong Kong 2016

March 5, Saturday, is Open Data Day around the world. There will be a hackathon from 10am until 7pm at Odd-e Hong Kong, 35-36 Connaught Road West, Hong Kong. Take a look at the Facebook page for the details. A maximum of 30 people can participate. As of the first of March, 8 people have registered.

Register for the Open Data Day Hong Kong 2016 at Eventbrite

The Hong Kong government has a website with datasets available to the public.

“You can download, distribute, reproduce, re-use or hyperlink to DATA.GOV.HK datasets for commercial or non-commercial purposes free-of-charge.”

Data.gov.hk – maintained by the HK Government OGCIO

Edward Snowden | Letter from Hong Kong

Dear Edward,

A few days before you revealed the NSA’s hacking of telecommunications and Internet traffic around the world I was giving a presentation here in Hong Kong at Open Data Hong Kong’s 3rd meetup on ‘What is Open Data’. I started with a quote from Rufus Pollock, co-founder of the Open Knowledge Foundation, saying in 2012, “Today we find ourselves in the midst of an open data revolution”. That revolution has ended up on Hong Kong’s doorstep with you fleeing here and stories now across the pages of most of the territory’s newspapers for the past several days.

The U.S. government is making decisions behind closed doors to manufacture for itself the claim that what is being done is legal, while at the same time choosing to hide this manufactured legal truth from its own citizens. Personally, I am not that worried about my privacy. By choosing to use free internet based services such as Google Gmail, Google+, Facebook, WordPress and many others I grant the right for these services to know what I’m doing online in return for the free services they provide. I’ve long suspected that Internet service companies and various governments are monitoring most anything I’m doing online. However, if the US government monitors all electronic and telecommunication networks I do want to know that it is being done and with sufficient detail to understand the breadth and scope of the monitoring. I do not want to be lied to on an almost daily basis starting with President Obama and going down his chain of command.

The people who approve and are nominally in charge of these monitoring programmes likely do not have the technical expertise to understand how these programmes work. These programmes are not automatic, not done by anonymous machines and, most simply, are not magic. I suspect that in any 24-hour period these monitoring programmes only work at best around 75% to 85% effectiveness and it may be much less some of the time. You have been one of the thousands of people writing code, monitoring routines, and making hourly, daily, weekly and longer adjustments to a wildly complex group of systems that at any moment may stop working. You and others who do make these monitoring programmes work do not share the same philosophy as the people in charge and no amount of signing confidentiality agreements is going to make you change your philosophy. I want to thank you for standing up and doing what is right at great personal risk. We are all better off that people like yourself, Daniel Ellsberg, Julian Assange, Bradley Manning and many others choose to stand up and tell the truth.

I hope you got to see some of the Dragon Boat Festival, 龍船節, racing yesterday. Qu Yuan’s, 屈原, story is both sad and uplifting. The people so loved him they wanted to keep him safe. Hong Kong is a wonderful city and I know it will fight for openness and transparency and I hope it will keep you safe.

Many thanks,

Bill

Data.One Analysis Summary & Report

As part of the Open Data HK Make.01 Hackathon I worked with a team on reviewing the HK Government’s Data.One site. We produced this report. The team’s effort was quite remarkable and I’ve described the hackathon in an earlier post on this blog. The report is made up of files posted on Google Drive and some public links. This very simple infograph may help you navigate around the main parts of the report. I recommend starting in the middle with Data.One Analysis Summary & Report.

Data.One Analysis Project – Report & Summary Graph
Data.One Analysis Summary & Report
UX/UI & Instruction/Education Parts 1 & 4
World Bank Assessment Tool Part 2 Content Relevance
PSI Datasets Parts 2 & 3 Content Relevance & Format of Files
Global PSI Data Catalogues
Data.One website

This report has also been made available to the HK Government’s OGCIO PSI (Office of the Government Chief Information Officer – Public Sector Information) team, which maintains the Data.One website, and to Charles Mok, the IT Sector Legislative Councillor. This report is a jumping-off point for further discussion on how to improve open data awareness and use in Hong Kong.

Summary
A report on the HK Government Data.One site was prepared as part of the hackathon organized by Open Data HK on 14, 18-19 May, 2013. The report targets user experience and interface, content relevance and usefulness for citizens, usability and format of the datasets made available and the instructions and education available for the general public and potential developers. The report gives examples from 19 public sector information open data websites around the world. The report makes recommendations on how to improve the Data.One site specifically as well as awareness and knowledge of open data in Hong Kong. An assessment tool from The World Bank was used to judge the completeness of the datasets being made available in Hong Kong. The formats of the datasets were evaluated and suggestions made on how to make them more useful for developers.

Open Data Hong Kong ~ Make.01 ~ Hackathon

I was a bit apprehensive about getting involved with anything called a hackathon. For me, and I suspect for a lot of others, the word has connotations of electronically sneaking into an organization’s computer system and stealing data. However, ‘hack’ means something just done well as a verb and done playfully as a noun, at least according to that great resource of modern English usage the Urban Dictionary. So I went along this past Saturday wondering if any of the people from the Catalyst night would be there and what we were going to get up to for the day. Would we do something well and playful?

I ran into someone at the Cheung Sha Wan MTR station and, after loading up on tuna buns, soy milk and coffee, headed out for The Good Lab. Arriving around 1pm, I found a few people in a large, bright and varied work space with a kitchen, work-tables, work-benches and chairs of various shapes and sizes. It quickly filled up with about 40 people. I was involved in two projects. I found my fellow team-member, we got onto the wifi and had a few conversations with people wandering around looking for possible projects. We then set to work. I was working on reviewing open data public sector information websites around the world, Data.One Analysis Project. My team-member was working on a form for crowd-sourcing potential datasets around the HK government websites, Opening Data. Most of the time people were heads-down working with some small group meetings. It was possible to eavesdrop on some conversations. This was a good way of knowing what skills people had and maybe asking them a question. Around 6:30pm the group reported on progress and asked for help if required. Pizza was delivered and we ate and chatted. We kept working until 11pm.

[Photo: Bill and Haggen So, May 2013 Hackathon. Photo credit: Yolanda Jinxin Ma]

Up around 8am on Sunday and made my way back to Cheung Sha Wan by 10am. Most of the same people were there plus some more. Yesterday’s team-member was joined by 2 others. We figured out what we needed to do and worked until lunchtime. There was a feeling of anxiety in the crowd. Downstairs for a good Chinese lunch and we talked about what was wrong with Hong Kong with a recent arrival from Spain. Back to work until a bit after 6pm and we started giving presentations on the results. There were some truly amazing results and knowledge sharing on how it was done. People were very interested in anything dealing with maps and how to use the not-so-friendly mapping CSV files available from the Data.One site. The list of projects is here. Hopefully, they will be updated in the coming weeks. Here are three that I believe deserve a special mention (but they were all really good):

Legco Meeting Log Parser ~ extract the Hong Kong Legislative Council meeting transcript and voting record from PDF and make it available. It raises the question of why this isn’t made available in document format with audio and video transcripts.

Reporting Tool for Request for Access to Information ~ a centralized form with sharable tracking of requests for information to the appropriate HK government bureau or department. Hopefully this will motivate our government to have true Freedom of Information legislation in the coming years.

Hong Kong food security and mainland’s two standard on food quality ~ a way of putting on a map where food is coming from out of China into Hong Kong. Food security and safety is a huge issue in China and Hong Kong. The HK government should be sharing as much information as possible about where our food comes from and what the past problems have been.

A member of the OGCIO PSI team and Charles Mok, the IT Sector Legislative Councillor, came over around 6pm. We had some Raspberry Pi prizes donated by Pindar Wong. The winner, Legco Meeting Log Parser, was chosen by popular acclamation and Charles Mok gave out the prize. An interesting RTHK video of an interview with Pindar and Charles is here.

Did we do something well and was it playful? The work-products from our projects were excellent. The energy level was high. People were working really well and collaboratively and the atmosphere was a lot more playful than I’ve experienced in the dreaded corporate cubicle world a la Dilbert. So now I know what a hackathon is about.

Open Data Hong Kong ~ Catalyst Night ~

Open Data Hong Kong is a group that was formed out of some talks at the Hong Kong Barcamp held at HK Polytechnic University this past February. A community of over 100 people has formed quickly based first on a Google+ group and a couple of meetups with presentations and chatter on the 2nd floor of Delaney’s in Wan Chai and other gatherings around town. A Facebook page and the OpenDataHK website were set up recently. Establishing a dialogue between the users of open data in Hong Kong and the HK government is one of the goals for the group.

It’s impressive that there is so much interest in Open Data. What is it? The best resource I’ve found is the Open Data Handbook. You can listen to me going on about it recently on a local public radio show here. The Open Data Hong Kong website has useful information on events and links to other sources on open data in and around Hong Kong that will keep growing. The Hong Kong government has had an open data initiative since 2011 called Data.One. The Hong Kong University Journalism & Media Study Centre ~ Data Journalism Lab ~ is a hotbed of activity on the data journalism side of open data in Hong Kong.

The Catalyst Night on 14 May is the kickoff for HKOpenDataMake.01, a hackathon event that will bring together developers, programmers, designers, thinkers and just the plain hangers-on to do and think about open data in Hong Kong. More than 50 people have signed up. The HK Government Office of the Government Chief Information Officer, Public Sector Information team will be attending the catalyst and talking about their plans for the Data.One initiative. At the Catalyst night the goal is to figure out what to do over the next weekend. Teams will form and project goals will be set. Potential projects can be seen here. The teams will work, think and play around with datasets, tools and ideas and come up with results by Sunday afternoon. Presentations will be made on Sunday afternoon and Raspberry Pi prizes for the best results will be given out by Charles Mok, IT sector Legislative Council member.

Digital Storytelling + Knowledge Conference + JAL kimono-class

Telling a story is at the centre of my life. I start more than half of my conversations with ‘let me tell you this good story’. I’m always thinking, “How does this story relate to my story?” I feel like I’m filling up my personal story bank. In the past few days I attended a workshop, ‘Digital Storytelling on the Web’, a conference, ‘Beyond KM: Delivering Value’, and a lecture, ‘Flying with Madame Butterfly: Early Japan Airlines Advertising in the US and Hong Kong’.

The workshop was a pre-conference event. Alan Levine is a web educationalist, which sounds clumsy but gives a sense of his work and interests. Alan took a small group through many websites and a few tools that could help us facilitate a digital story. All of them can be found on 50+ Web 2.0 Ways to Tell a Story. A story is the most effective communication device humans have developed. ‘When you go outside this circle of fire something is likely to eat you’, remains one of the best reasons to remain close to the tribe. A digital story is not fundamentally different from a traditional spoken or visual story. Getting the audience’s attention and holding it is still the key challenge for the storyteller. The classic Freytag story arc still applies. However, a digital story may be more mixed up, more spontaneous or more complex, connecting images, sound and text and giving the audience the possibility to dynamically manipulate the story.

These tools from the workshop will be useful:
• The closed wi-fi internet-like environment using ‘The StoryBox’ could be used to share text, images, audio and video without having to have it all up on the web.
• Pechaflickr and Five Card Flickr could help to get people talking and exploring how a story unfolds. They are good for storytelling practice in a second language.

The Hong Kong Knowledge Management Society’s conference, ‘Beyond KM: Delivering Value’, brought together about 40 people to listen and share some stories on the slippery topic of ‘knowledge value’. Knowledge without value seems like a contradiction. One of the more valuable lessons coming out of knowledge management and into organizational practice is the importance of storytelling as a communication tool. Many knowledge managers like to emphasize that conversation is the key enabler for knowledge transfer. With that thought, we had lunch first and sat around tables talking. This was much more useful than sitting through a morning of presentations full of coffee and sugary buns wondering what was for lunch.

Four presenters, all highly experienced in the practical application of knowledge management, talked about how or what to do to reveal the value in knowledge:
• The emic/etic distinction of what people think vs. what they say must always be forefront when collecting information from the customer.
• Innovation comes out of conflict so finding that point of conflict leads to innovation.
• Perfection is not that important. Good-enough works most of the time.
• Combining machines with humans is likely going to be more effective than only one or the other approach.
• Tagging started with Assyrian clay tablets 3,000 years ago and not much has changed.
• Access, security, governance, mitigation and standardization make it highly problematic to replicate Facebook-like social media inside an organization.
• Worry about knowledge creation before worrying about knowledge management. This will solve a whole host of potential issues.
• Nothing will ever replace experience.
• Managing for the few big, important or calamitous events will always be prohibitively difficult and will likely fail.
• People are pattern recognizers not information processors.
• People blend the patterns they recognize to make a conceptual whole that has immediately useful meaning.
• 5 is the number of words we will remember, 15 is the number of people we trust and 150 is the number of people we can recognize.
• Big data must have people at the centre to make it useful.

I dashed back across the Hong Kong harbour to the Museum of History for the Anthropological Society monthly gathering. Yoshiko Nakano told the story of how two ex-GI ad-men out of San Francisco developed the geisha service for JAL (Japan Airlines) in the 1950s that continued up to 1970. The story of ‘Flying with Madame Butterfly: Early Japan Airlines Advertising in the US and Hong Kong’ was about more than just American GIs’ fascination with exotic Japanese women; it was also about a real need to accept the Japanese as useful allies in the looming cold war. It was easier to accept a beautiful, gracious, charming and compliant geisha-clad woman over that man in the army cap and buckteeth America had been fighting only a few years previously. With an advertising budget many times smaller than Pan Am’s or Northwest’s, these American ad-men hit on a sure winner: geishas in the air serving exhausted western businessmen. It worked perfectly and the concept of aircrew in national costumes has become a mainstay of the airline industry to this day. That Japanese women working as flight attendants didn’t enjoy the experience of wearing kimono and weren’t much use in an emergency situation was ignored. The stereotypical compliant Asian woman is still with us today and owes quite a lot to these images promoted by JAL’s kimono service.

Stories help us understand the world we experience and give us a view into a world that is not our own. The digital world requires we actively manage our digital personality. Knowing how to tell a digital story will help keep control of our digital personality. The digital line between inside and outside the organization remains a dilemma for anyone using social media. Linking our digital personality to its context may help delineate the line for how to use social media in our digital lives. Images are a useful marketing tool but some images promote stereotypes that are difficult to stop once entrenched. Should we control how images are used in the digital world? Telling and listening to stories, blending and reflecting on them may change what we believe is valuable and worthwhile.

Digital Pen Technology

Back in April 2011 I went to a Birds of a Feather event sponsored by BrightIdea; see here for my posting on that event. At the event one of the examples of an innovation was the use of digital pen technology to collect information on forms by workers in the field. I didn’t know what they meant by digital pens but soon learned that it was a combination of a pen and some special paper. The pen and paper let normal handwriting be captured and transferred to a digital format without any scanning.

I was intrigued because this is a particular pain point in many explicit knowledge management/document management projects I’ve worked on over the years. Trying to convince people to change from handwritten forms to online input forms is fraught with hardship and resistance. The arguments pro and con usually go something like this:

– The handwritten form is easy to use, can be signed for authorization and can then be retained as an authenticated final copy.

– The online form is difficult to use, requires a computer, a printer and a scanner and a whole lot of training.

– The handwritten form is difficult to share. We have to scan it and input the data manually and then verify the input. This slows down the workflow.

– We must have a printed and signed form for compliance purposes. We want a true, unchangeable authenticated copy so we will have to print the form, sign it and then scan it.

The core digital pen technology comes from the Swedish company Anoto. The technology has been licensed to other companies, for example the PaperIQ Digital Pen for Blackberry solution and LiveScribe, which produces digital pens combined with digital audio recorders.

In June 2011, I got in touch with Big Prairie Limited, a local Hong Kong company that provides consulting services for Anoto-based digital pen technology. Big Prairie has been working in the digital pen area in Asia for about ten years.

In exchange for helping out at the Big Prairie booth at the Greater China eHealth Forum 2011 on 7-8 October I was given a LiveScribe Echo 4GB digital pen. I didn’t agree to blog about the digital pen technology and no one from Big Prairie has reviewed this blog. Just to be clear, I got 1 pen, 3 notebooks and some software with a retail value of about HKD1,700 (about USD220) in exchange for 2 days of work at the forum. Clearly, I’m really a cheap date.

I prepared some demos and for two days I demonstrated the LiveScribe Echo digital pen. Subsequently, I’ve been using the pen almost every day and showing it around to my network. People are very impressed with the technology. The pen is a normal ball-point pen with a small camera and a digital recorder and player. The pen is lightweight and a bit big but still comfortable for writing. The pen can be used as just a normal ball-point pen or used with the Anoto dot paper to record what you are writing. The notebook is made from normal notebook paper which has been printed with thousands of very small blue-colored dots. These dots are locational markers; when combined with the camera your writing is recorded precisely and accurately. The digital recorder can be used stand-alone or it synchronizes with your handwritten notes. The notebook has printed commands at the bottom of the page that let you turn on, pause, turn off the audio recording, jump ahead in 10 second intervals, bookmark, jump to a % position, set the playback speed and adjust the volume of the playback. The small speaker on the pen is good enough for listening or you can buy some headphones that plug into the pen for both better listening quality and recording quality. You can play back an audio recording by simply touching any of the synchronized handwritten text.

I don’t want to go over all of the features of the LiveScribe pen because that information is easily available and better explained on the LiveScribe website.

The handwritten notes and recording can be easily exported to Adobe pdf format. This is called a pencast. I’ve made some pencast examples that illustrate what I think are the important points. Download these pencasts and open with Acrobat Reader version 9.3.2 or higher.

1. A pencast with handwritten notes and audio recording by listening to the BBC One-minute news broadcast. Pencast no 1 ….
– The black text has no synchronized recording.
– The green text has synchronized recording.
– Click on the green text to listen to the recording. The remaining green text will go light grey. You can jump around by clicking on the green or light grey text.

2. A pencast with handwritten notes, audio and a quick sketch. A few days later I added handwritten notes and an audio recording from a BBC One-minute news broadcast. Pencast no 2 ….
– It captures the sketch easily and accurately.
– Notes and recordings can be added at different times.

3. A pencast of “Double Ninth: Missing my Shandong Brothers”, with handwritten Chinese characters, Putonghua pinyin and English translation. There is no audio recording. This is a poem that almost all Chinese children learn in school. Pencast no 3 ….
九月九日憶山東兄弟
獨在異鄉為異客,
每逢佳節倍思親.
遙知兄弟登高處,
遍插茱萸少一人.

As a lonely stranger in a strange land,
At every holiday my homesickness increases.
Far away, I know my brothers have reached the peak;
They are planting flowers, but one is not present.

“Double Ninth, Missing My Shandong Brothers”
— Wang Wei (王維), Tang Dynasty

4. A pencast with handwritten Japanese kana and kanji. Pencast no 4 …
– Kana means katakana and hiragana writing. Kanji means Chinese characters.

Handwritten notes can be converted to text using the MyScriptforLiveScribe tool from a company named VisionObjects. It costs about USD30.00. It is very easy to use from within the LiveScribe Desktop.

Here are the results.
1. A pencast with handwritten notes and audio recording by listening to the BBC One-minute news broadcast converted to text and then saved as a JPG. Pencast no 5 …
– As a JPG file it retains the layout and format of the handwritten notes.

2. A pencast with handwritten notes and audio recording by listening to the BBC One-minute news broadcast converted to text and then saved as a DOCX. Pencast no 6 ….
– It loses the layout and the format.
– The text can be edited.
– The text has 401 written letters and 10 errors marked in red so the error rate was 2.49%. This is remarkable for conversion of handwritten text to type text. I did a project last year where scanning handwriting and then converting it to type text gave error rates between 25% and 50%. See here for information on ICR, intelligent character recognition.

3. A pencast of “Double Ninth: Missing my Shandong Brothers”, Chinese characters, Putonghua pinyin and English translation. I’ve converted the Chinese characters to text, the Putonghua pinyin to text and the English translation to text and saved as a DOCX. Pencast no 7 ….
– There are 37 characters in the original text plus 2 characters of the author’s name. There are 6 errors marked in red. The error rate is 15.38%. Once again, this is remarkable for handwritten Chinese character to type text conversion.
– There are 199 handwritten letters and diacritic marks in the Putonghua pinyin. There are 29 errors marked in red. The error rate is 19.33%. It is high because MyScriptforLiveScribe doesn’t have a dictionary for Putonghua pinyin.
– There are 188 handwritten letters and punctuation marks in the English translation. There are 15 errors marked in red. The error rate is 7.98%.

4. A pencast of Japanese kana and kanji. This text only has hiragana and kanji characters converted to characters and then saved as a DOCX. Pencast no 8 ….
– 65 handwritten hiragana and kanji.
– 3 errors marked in red. The error rate is 4.62%. Once again, this is remarkable. Projects in Japan I’ve worked on with handwritten Japanese to text conversion are lucky to get the error rate down to 25%.
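
The error rates quoted above are simply errors divided by the total number of handwritten marks. A minimal sketch of that arithmetic follows, using only the counts given in the text; the function name `error_rate` and the labels are made up for illustration. Run on these counts it agrees with the stated rates except for the pinyin line, where 29 errors in 199 marks works out to about 14.57%. The 19.33% quoted would correspond to 150 marks, so one of those two figures is probably a slip.

```python
# Counts are taken from the results above: (label, total marks, errors in red).
results = [
    ("English notes (Pencast no 6)", 401, 10),
    ("Chinese characters (Pencast no 7)", 39, 6),   # 37 characters + 2 for the author's name
    ("Putonghua pinyin (Pencast no 7)", 199, 29),
    ("English translation (Pencast no 7)", 188, 15),
    ("Japanese hiragana and kanji (Pencast no 8)", 65, 3),
]


def error_rate(total, errors):
    """Return the error rate as a percentage, rounded to 2 decimal places."""
    return round(errors / total * 100, 2)


for label, total, errors in results:
    print(f"{label}: {errors}/{total} = {error_rate(total, errors):.2f}%")
```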

The LiveScribe solution is for personal note-taking. It is most useful for students, business people, professionals such as lawyers, doctors and advisors, and anyone who takes notes, which is just about anybody. Being able to record the audio along with the notes can significantly increase the value of the notes and can improve your ability to add to the notes later. Being able to create a PDF pencast makes the notes easy to share with others. Being able to convert them into typed text makes it much easier to produce formal reports from handwritten notes. I recommend it highly as an effective and inexpensive solution for taking notes and keeping them digitally.

In Hong Kong, information on ordering a LiveScribe digital pen can be found on the Smartpen Asia website. The LiveScribe website has an international store locator. LiveScribe pens and accessories can also be ordered through Amazon.

There are enterprise solutions, built on the Anoto digital pen technology, that enable forms to be created and specialized workflows to be integrated with enterprise backend systems. You should contact Big Prairie Limited or Anoto for more information.

Chinese Character Normalization – Finding People in Greater China

For most of this past year I’ve been working on a project that involved searching for Chinese people in various online databases using their Romanized names or their Chinese character names. When you are searching for someone’s Chinese name inside a database there are some quite thorny issues. With the rise of China as the world’s 2nd largest economy, and Chinese people traveling and spending more and more around the world, these issues about identifying Chinese people by their names are going to become a part of many knowledge workers’ day-to-day tasks. Here is the definition of Greater China from Wikipedia.

Most of the time, trying to find a Chinese person among many other Chinese people in a database by name is not very successful. Most of the problems are around ‘Romanization’ and ‘Simplified and Traditional Chinese characters’. If you are interested in ‘Romanization’ see this Wikipedia entry. The ‘Romanization’ problem is that there are simply too many methods and no real standard.

In mainland China, people are by law required to use ‘simplified’ characters for their names. This assumes that there is a ‘simplified’ character for that name. In Hong Kong, Macau and Taiwan people use ‘traditional’ characters for their names. If you are interested in the difference refer to this Wikipedia entry. In any event, ‘simplification’ is a master stroke of censorship and knowledge control by the mainland Chinese government. Mainland Chinese have difficulty reading books, pamphlets and newspapers from outside of China. What better way could there be of controlling knowledge than by changing the writing system people use every day? Conversely, people from Hong Kong, Macau and Taiwan have a difficult time reading ‘simplified’ characters. Some claim it is harder to go from ‘Simplified’ to ‘Traditional’ than from ‘Traditional’ to ‘Simplified’ but I’m not sure. Reading Chinese is always hard for me and I’ve learned both character sets, sort of, up to the 1,000 character mark.

However, since there are different character sets, a problem arises when someone from mainland China comes to Hong Kong, Macau or Taiwan and starts to use their written character name to open accounts at banks, shops, hotels and so on. The same happens when people from Hong Kong, Macau and Taiwan go to mainland China. Simply put, people can’t easily read this person’s name. The solution is to ‘transform’ the name into the ‘correct’ character set: ‘Simplified’ character to ‘Traditional’ character or ‘Traditional’ character to ‘Simplified’ character. It happens all the time when a person opens an account and their details are input into a database. They write down their name in the character set they are comfortable using, and either the person collecting the names or the data-input person ‘transforms’ the name.

Interestingly, all Hong Kong and Macau Chinese people may apply for a ‘home return permit’ card that lets them cross the border into China easily, and also lets the Chinese government know they have arrived. Their names are always ‘transformed’ into simplified characters when there is a ‘simplified’ character corresponding to the ‘traditional’ character. I assume these transformations are more accurate than some of the others. I know some of the transformations between ‘simplified’ and ‘traditional’ are not accurate. This is due to imperfect knowledge of the mapping rules between the character sets. Sometimes people are in too much of a hurry, so they simply guess. All Chinese names have at least 2 characters and many, maybe the majority, have 3 characters. Sometimes the transformer will transform 1 or 2 characters and leave 1 or 2 characters unchanged.

The end result is that even if you have a Chinese person’s correct name, you may not be able to find it in a database because someone has ‘transformed’ the name. Sometimes you can’t find a Chinese person in a database because you believe their name is written with character ‘X’ but in fact they write it with character ‘Y’. The only way to solve this problem is for the database’s search engine to ‘normalize’ the search. Here is an excellent summary of ‘normalization’ prepared by Michael CY Chan.
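
To make the idea of ‘normalizing’ a search concrete, here is a minimal sketch, not the method from the summary mentioned above. It assumes a tiny hand-made mapping table (`TRAD_TO_SIMP`) and two hypothetical helper functions; a real system would need a table covering thousands of characters. Both the query and the stored names are folded to simplified characters before comparing, so a name matches whether it was stored in traditional characters, simplified characters, or a half-transformed mix. Folding toward simplified is chosen here because one simplified character can correspond to several traditional ones (for example 发 for both 發 and 髮), so the reverse direction is ambiguous.

```python
# Illustrative subset only: traditional character -> simplified character.
TRAD_TO_SIMP = {
    "張": "张", "陳": "陈", "劉": "刘", "國": "国",
    "偉": "伟", "華": "华", "維": "维",
}


def normalize_name(name):
    """Fold each character to its simplified form when a mapping is known."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in name.strip())


def search(query, records):
    """Return every stored name that matches the query after normalization."""
    key = normalize_name(query)
    return [r for r in records if normalize_name(r) == key]


# Hypothetical database contents: the same person stored three different ways,
# including a partially transformed name, plus two unrelated names.
records = ["張偉", "张伟", "张偉", "陳國華", "王維"]

print(search("張偉", records))
print(search("王维", records))
```

Searching for either 張偉 or 张伟 returns all three stored variants, which a plain exact-match search would miss.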

Web 2.0 … The Machine is Us/ing Us

It still resonates ….
