Update 07 Oct 2011: Google announced the launch of single-language labels in Google Maps, making it possible to view all objects (countries, cities, and so on) in English only. At the same time they fixed the Bulgarian name of Bulgaria, which now displays correctly once again! It took about a month.
About a week ago I discovered an interesting fact: the ZIP codes for the state of Arizona, USA, were all outdated on Google Maps. I searched for more information on the matter and found Steven Christensen of IRON-ARTz, who has dedicated the last several months to researching the issue. I am publishing the main text of an email he sent me:
I’ve spent the last six-plus months working on resolving accuracy and deficiency issues with our address verification and location systems, in order to eliminate as much as possible of the discrepancy overhead we are currently generating. Over this period I’ve seen a lot of issues that have accumulated through the chain of sources, on the user side as well as on the source side. While working toward corrective measures I’ve developed a number of conclusions based on the results I have accumulated. To summarize, here is what I have found to be the trends in the chain from source to end user on this particular topic. I think we can safely infer that these are issues in all forms of data delivery systems.
The US Post Office currently issues notices for the changes it makes in its address system. Data providers on the web have not instituted these changes even though the notifications are freely available. If you review the Arizona ZIP code changes and check them on the web, I think you’ll find that most data providers have not incorporated them. Very few seem to realize that this information can change at all. The search on the US Post Office’s own website does reflect the changes. Since these updates were given a year-long transition period, the changes should have been incorporated into every system by now; the pre-change ZIP codes officially expired as of July 2010, I believe.
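To make this concrete, here is a minimal sketch (in Python) of how a developer could apply such change notices: keep a local table of retired-to-current codes, built from the freely available USPS notifications, and normalize every incoming ZIP against it. The specific codes below are placeholders for illustration, not the actual Arizona changes.

```python
# Hypothetical mapping built from USPS change notices.
# The codes here are placeholders, not the real Arizona updates.
RETIRED_ZIPS = {
    "85200": "85201",  # placeholder: retired code -> current code
    "85300": "85301",
}

def normalize_zip(zip_code: str) -> str:
    """Return the current ZIP code for a possibly-retired one."""
    zip_code = zip_code.strip()
    return RETIRED_ZIPS.get(zip_code, zip_code)

print(normalize_zip("85200"))  # prints 85201
print(normalize_zip("85001"))  # unknown to the table, passes through unchanged
```

A table like this is cheap to maintain precisely because the notices are published in advance, which is the author's point: the data providers simply never apply them.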
The US Post Office also has guidelines in place that define specific formats for addresses. Since the Post Office essentially manages its data on a local basis, user input at the local level is not regulated, and the organization has internally violated its own guidelines. This presents a number of challenges. How do you handle data input by an organization that repeatedly violates its own guidelines? If the organization cannot be trusted to resolve its internal issues, how can we expect other companies to compensate for this? How can we NOT expect those companies to exacerbate the problem by introducing their own internal issues?
At this point I’ve found that virtually all data is at least 10% inaccurate in very obvious ways that would take very little effort to correct. There are companies that have based their whole existence on correcting these problems from the user side, and they, at the very least, have no incentive to correct them at the source; it would make them obsolete. The companies receiving this information are focused on their core missions. They don’t have the time or resources to dedicate to the problem, primarily because they don’t realize how much money is lost in direct and indirect costs.
Without naming any particular companies, I can tell you that these problems are at the core of almost every business that utilizes this information. When you consider the costs of things such as shipping corrections and rerouting, tax rate resolution, geolocation lookup services for businesses, and bulk mailing services and returns, a discrepancy rate approaching 15% leaves a tremendous amount of money on the table from an efficiency standpoint.
Below is my take on this situation as it currently stands.
Programmers design based primarily for computer system specifications.
This fails to account for human behavior. There is a serious lack of understanding in the development process that designs are meant to be used by people; interface programming projects should ALWAYS be designed with the perspective of the end user as the primary concern.
With the addressing system there is a prevailing assumption in the industry that accuracy has been well resolved. In reality, users input data with misspellings and transposed letters and numbers, make assumptions about addresses based on non-authoritative sources, believe the systems should know every slang term, abbreviation, or shorthand that has ever been used or will ever be used, and finally expect every system to have 100% of the most current data at the same level of accuracy.
With this in mind, the obvious solution to this part of the problem is to develop with the expectation that user input will always be suspect; resolving requests with built-in flexibility should be a primary concern.
Third-party deliverables are prone to inaccuracies, and contractual and pricing negotiations should take this into account.
The best source of information will always be the originator of that information. With each level of intermediary introduced, there should be a percentage-based expectation of anomalies and inaccuracies inherited with the handling of the information. Quality control in electronic information distribution channels rarely seems to be as high a priority as features and functions. Over time this has led to an ever-increasing discrepancy rate, while accountability appears stagnant. From my perspective, these third-party companies decrease the valuation of their offerings with every update and reduce the overall value of the market segment they serve.
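The compounding described here can be made concrete: if each intermediary independently corrupts some fraction of records, end-to-end accuracy decays multiplicatively with the number of hops. The per-hop rate below is an assumed figure for illustration, not a measured one.

```python
def end_to_end_accuracy(per_hop_error_rate: float, hops: int) -> float:
    """Fraction of records still accurate after passing through `hops`
    intermediaries, assuming each hop introduces errors independently."""
    return (1.0 - per_hop_error_rate) ** hops

# Assumed rate for illustration: 3% of records corrupted per hop.
for hops in (1, 2, 3, 4):
    print(hops, round(end_to_end_accuracy(0.03, hops), 3))
# At 4 hops this is roughly 0.885, i.e. a discrepancy rate near the
# 10-15% range cited above.
```

This is why going closer to the originating source, as described in the next paragraph, pays off: each intermediary removed recovers a multiplicative factor of accuracy.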
After the last round of updates I received proved to be of very poor quality, I shifted to updating internally, using resources close to the originating sources of the data. At this point I see little value in third-party sources, given the quality issues currently in place. Over the long term there must be a move toward pushing quality control up the chain to make these business models sustainable.
Online services need to be internally managed to allow for better control.
The Google issue with updates for the Arizona area leaves that area no option other than to use inaccurate data or to map to the historical data that was in place before the changes took effect. I would caution anyone who is developing against relying primarily on Google’s ability to resolve physical locations. Instead, resolve locations to generally accepted latitude and longitude values yourself and present those coordinates to the Google API, rather than relying on its address resolution. This also gives your company a much more flexible system when conflicts arise with a particular vendor or service.
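The coordinates-first approach above can be sketched in a few lines: resolve each address to internally verified coordinates once, then hand only latitude and longitude to the mapping vendor. The lookup table and the sample entry below are hypothetical; the `?q=lat,lng` query form is a commonly used Google Maps URL pattern.

```python
# Hypothetical table of addresses we have verified ourselves,
# mapped to accepted latitude/longitude values.
VERIFIED_LOCATIONS = {
    "123 Main St, Mesa, AZ": (33.4152, -111.8315),  # illustrative entry
}

def map_url(address: str) -> str:
    """Prefer our own verified coordinates; fall back to the raw
    address string only when nothing verified is on file."""
    coords = VERIFIED_LOCATIONS.get(address)
    if coords:
        lat, lng = coords
        return f"https://maps.google.com/?q={lat},{lng}"
    return f"https://maps.google.com/?q={address.replace(' ', '+')}"
```

Because the vendor only ever receives coordinates for verified records, stale address data on their side (like the Arizona ZIP changes) cannot silently move your pins, and switching map vendors later touches only this one function.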
I notified Google on that matter, but up to now (more than a week later) I have not received any reply. It seems the problem with inaccurate data on Google Maps for a whole state is not important enough.
And if that is not major enough, a country name spelled incorrectly should already ring some bells. I take this one personally, because the country is Bulgaria. The correct spelling is “България”, but the one currently on Google Maps is “Болгария” – the Russian spelling. To all the US readers: imagine that one day you open Google Maps and instead of “United States” you read “Соединенные Штаты” – quite insulting, is it not?