Last week Bill Slawski covered a patent that has just been granted to Google, named “Determining spam in information collected by a source.” As is immediately obvious from its name, the patent discusses methods for discovering spammy information coming from third-party sources. It seems to cover mostly (but apparently is not limited to) information about business entities. Therefore, it would be safe to say that the patent discusses local citations and how a search engine might determine whether a citation carries deliberately incorrect information (spam). The two main ways, according to the patent, are:
- By measuring the “frequency of occurrence” of each phrase/element of the citation
- By measuring the “trustworthiness” of each source
I will not delve further into how each of these two factors is calculated or how the whole system works; I’d rather focus on the practical implications.
What It Tackles
Google obviously would like to present the most accurate and complete information to its users. This is possible only if it obtains this information from as many sources as possible. However, some of these sources might sometimes provide inaccurate or even spammy information, so Google needs methods to find it. Some obvious examples of information Google would rather disregard completely are a telephone number in the business name and words such as “discount”, “sales”, etc. In other cases it might be more difficult for Google to tell whether particular content is correct or not. An example would be “[Business Name] in [City]”, or even worse – “[City] [Business Name]” (as in Miami Printing, which is the actual name of a real business).
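To make the “obvious” cases above concrete, here is a minimal Python sketch of rule-based checks on a business name. It is only an illustration, not the method from the patent or anything Google is known to run: the function name `name_spam_signals`, the word list, and the phone pattern are all assumptions made up for this example.

```python
import re

# Hypothetical list of promotional words; not taken from the patent or from Google.
PROMO_WORDS = {"discount", "sale", "sales"}

# Loose pattern for North American style phone numbers (an assumption for this sketch).
PHONE_PATTERN = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


def name_spam_signals(business_name):
    """Return a list of simple red flags found in a business name."""
    signals = []
    # Flag names that contain something shaped like a phone number.
    if PHONE_PATTERN.search(business_name):
        signals.append("phone_number")
    # Flag names that contain any of the promotional words.
    words = set(re.findall(r"[a-z]+", business_name.lower()))
    if words & PROMO_WORDS:
        signals.append("promo_word")
    return signals


if __name__ == "__main__":
    for name in ["Joe's Plumbing 305-555-0123", "Discount Tires Sales", "Miami Printing"]:
        print(name, "->", name_spam_signals(name))
```

Simple rules like these catch the easy cases, but they have nothing to say about a name such as Miami Printing, where the city is a legitimate part of the name. That is the harder problem the rest of this post is concerned with.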
The patent gives examples mostly related to the category of the business, but I believe the practical implications are mostly related to the business name. It is a known and widely accepted fact that the business name plays a role in how Google ranks the local search results (see factors #15 and #22 here). That is why over the years many have adopted the bad practice (intentionally or not) of adding extraneous keywords to the business name in their Google local listings. When Google started getting stricter, the “practitioners” (predominantly black hat SEOs) got smarter and started creating citations in which the business name included the keywords. That way the third-party data would support the information the “business owner” submits via Google Places.
Reading through the patent, two major concerns come to mind:
1) The main one is that Google seems to rely a lot (probably too much) on information coming from “trusted sources” (according to the patent, a source can be designated “as a trusted source based on, for example, a reputation of the source or previous dealings with the source or combinations of them”). In theory, if a source is trusted enough, Google might take its information for granted and never disregard it or check its accuracy. Examples of such sources would be LocalEze and Infogroup/CityGrid in the USA, and YellowPages.ca in Canada. Translated into local SEO language, this means that if a listing is added to LocalEze (for example) and the same business information is not found anywhere else on the web, Google might still create a new listing from it. This opens up a big hole in the described system, because Google would be very dependent on such trusted third-party sources. It is also important to mention that many of these potentially trusted sources have almost no mechanisms for checking the authenticity of the business information added to their databases (other than phone verification, which is not a sufficiently reliable method).
2) My other concern is related to businesses that actually do have such words (regarded as spam) in their business names, websites, or even physical addresses. Historically, Google has dealt with such situations by withholding the activation of a listing that contains such words (the biggest publicly available list of these is here) and manually verifying the accuracy of the information. However, according to the patent, even words such as city names could be considered spammy, which leaves a lot of room for false positives.
What This Means from a Local SEO Point of View
As mentioned above, there are two main factors taken into account – trustworthiness and frequency. While the patent doesn’t discuss these factors with regard to organic search rankings, it could be assumed that a similar methodology is used when determining the value of citations and how business listings are ranked. This means that we could distinguish between two types of local citation sources:
1. Qualitatively-important – such as the aforementioned LocalEze, Infogroup, CityGrid, Yellowpages, etc.
2. Quantitatively-important – sources that are either less authoritative or less likely to serve as citation sources.
To have a strong “citation profile”, you must cover the first type of sources, and only after that should you proceed to look for further volume and opportunities. At the same time, while the second type could be useless without the first, the first would (in many cases) be insufficient without the second.
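One rough way to picture how the two factors could interact is a trust-weighted count of how often each business name variant appears across sources. The Python sketch below is purely illustrative: the patent does not spell out this computation, and the source names, trust weights, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical trust weights between 0 and 1; real values, if any exist, are not public.
SOURCE_TRUST = {
    "trusted_aggregator": 0.9,
    "small_directory_a": 0.3,
    "small_directory_b": 0.3,
    "small_directory_c": 0.3,
}
DEFAULT_TRUST = 0.1  # assumed weight for sources not listed above


def score_name_variants(citations):
    """Sum the trust weight of every source that reports each name variant.

    `citations` is a list of (source, business_name) pairs.
    """
    scores = defaultdict(float)
    for source, name in citations:
        scores[name] += SOURCE_TRUST.get(source, DEFAULT_TRUST)
    return dict(scores)


def most_supported_name(citations):
    """Return the name variant with the highest combined score."""
    scores = score_name_variants(citations)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    sample = [
        ("trusted_aggregator", "Miami Printing"),
        ("small_directory_a", "Miami Printing Cheap Flyers"),
        ("small_directory_b", "Miami Printing Cheap Flyers"),
    ]
    print(score_name_variants(sample))
    print(most_supported_name(sample))
```

With these made-up weights, a single listing on the trusted source outweighs two listings on low-trust directories, but enough low-trust mentions would eventually tip the balance. That is one way to read why both types of sources matter for a citation profile.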
I believe Google has been using this (or a similar) method to detect and disregard spammy information coming from third-party sources for some time already. Nevertheless, the patent sheds further light on the systems Google uses to find, compile, and process business information.