Chapter 6
In This Chapter
When you hear the word spam, what comes to mind? You probably think of all those annoying emails with their poorly worded and often obscene messages that clutter your inbox daily. That’s spam, all right, but there’s another kind of spam that’s directed at search engines.
In this chapter, you find out about spam techniques that some websites use to fool or trick the search engines into delivering a higher listing on the results page. Any time you think you can achieve higher rankings by deceiving the search engines, you’d better think again! Google and the other engines get better all the time at sniffing out spam, and the penalties can be harsh. Even inadvertent spam can get a website in trouble, so in this chapter we go over some of the more popular and dangerous methods that have been used. Then we delve into the guidelines search engines use to define what they consider spam, as well as our search engine optimization (SEO) code of ethics to help keep you and your site in the clear.
When you normally think of spam, the first thing that comes to mind is either the canned meat product or the junk email that’s clogging up your inbox. (Or the Monty Python skit … “Spam, spam, spam, spam” … ahem.) When we here in SEO-land talk about spam, however, we mean something a little different than meat by-products, unwanted emails, or British comedy troupes. Search engine spam (also sometimes known as spamdexing) is any tactic or web page that is used to deceive the search engine into a false understanding of what the whole website is about or its importance. It can be external or internal to your website, it may violate the search engines’ policies directly, or it may be a little bit sneakier about its misdirection. How spam is defined depends on the intent and extent. What is the intent of the tactic being used, and to what extent is it being used?
If you stuff all your metadata (text added to the HTML of a page to describe it for the search engine) full of keywords (words or phrases relating to your site content that search engines use to determine whether it’s relevant) with the sole intent of tricking the search engine into ranking your page higher on the results page, that’s spam. And if you do that all over your website, in your Alt attribute text (text used to describe an image for the search engine to read), your links, and your keywords, trying to trick the search engine spiders (the little programs that search engines use to read and rank websites) into giving you the highest ranking possible, it’s a little harder to claim to the search engine that it was simply an accident done out of ignorance.
Most technologies that are used in the creation, rendering, and design of websites can be used to trick the search engines. When a website tries to pull a fast one, or the search engines even so much as perceive it did, the search engines consider that website spam. Search engine companies do not like spam. Spam damages the reputation of the search engine. They’re working their hardest to bring you the most relevant results possible, and spam-filled pages are not what they want to give you. Users might not use the search engine again if they get spammy results, for starters. So if someone’s caught spamming, that person’s site could be penalized or removed entirely from the search engine’s index (the list of websites that the search engine pulls from to create its results pages).
You can report spam if you run across it by contacting the search engines (see the “Reporting Spam” section near the end of the chapter).
In the following sections, we talk a little about what types of spam there are in SEO-land and what not to do in order to keep your site from getting penalized or even pulled out of the engines by accident.
Spam is any attempt to deceive the search engines into ranking a page when it does not deserve to be ranked. In the following sections, we describe spam that is known to be detected and punished by the search engines.
One of the more obvious ways to spam a site is to insert hidden text and links in the content of the web page (the content of a site being anything that the user can see). All text has to be visible to the user on the site. Hidden content is text that appears in the rendered HTML code but is not visible to the user on the page without some user interaction. Hidden text can simply be a long list of keywords, and hidden links are meant to inflate the site’s link popularity. A common example is text set in the same color as its background.
Using invisible or hidden text is a surefire way to get your site banned so that it no longer shows up in the engines. The reasoning behind this is that you would want all your content visible to the user, and any hidden text is being used for nefarious purposes.
Figure 6-1 shows what we mean by hidden text on a background. Usually, you find this as white text on a white background, but it could be any color as long as it's not visible to a user (black on black, gray on gray, and so on). This is spam and will get your site banned.
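If you want to audit your own pages for accidental hidden text, the same-color case described above can be checked with a short script. The following Python sketch is only an illustration, not how any search engine does it: the names parse_inline_style, HiddenTextFinder, and find_hidden_text are our own, and it assumes the colors are set in inline style attributes and written the same way (it does not know that #fff, #ffffff, and white are the same color, and it ignores external stylesheets).

```python
from html.parser import HTMLParser


def parse_inline_style(style):
    """Turn 'color: #fff; background-color: #fff' into a dict of declarations."""
    declarations = {}
    for part in style.split(";"):
        if ":" in part:
            name, value = part.split(":", 1)
            declarations[name.strip().lower()] = value.strip().lower()
    return declarations


class HiddenTextFinder(HTMLParser):
    """Collects tags whose inline text color equals their inline background color."""

    def __init__(self):
        super().__init__()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        # Assumption: only inline styles are inspected, not stylesheets.
        style = dict(attrs).get("style") or ""
        declarations = parse_inline_style(style)
        color = declarations.get("color")
        background = declarations.get("background-color")
        if color is not None and color == background:
            self.suspects.append((tag, color))


def find_hidden_text(html):
    """Return a list of (tag, color) pairs that look like same-color text."""
    finder = HiddenTextFinder()
    finder.feed(html)
    return finder.suspects
```

Anything this flags deserves a look by a human; a match is a hint that text may be invisible, not proof of spam.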
A doorway page is a web page submitted to search engine spiders that has been designed to satisfy the specific algorithms for various search engines but is not intended to be viewed by visitors. Basically they do not earn the rankings but instead deceive the search engines into rankings by design and keyword-stuffing tricks that you'd never want to put on a page for a user to see. Doorway pages are there to spam the search engine index (the database of information from which search engines draw their primary results) by cramming it full of relevant keywords and phrases so that it appears high on the results page for a particular keyword, but when the user clicks it, he or she is automatically redirected to another site or page within the same site that doesn't rank on its own.
Doorway pages are there only for the purpose of being indexed, and there is no intention for anyone to use those pages. Sometimes more sophisticated spammers build a doorway page with viewable, relevant content in order to avoid being caught by the search engine, but most of the time a doorway page is made to be viewed only by a spider. Doorway pages are often used in tandem with deceptive redirection, which we discuss in the following section.
Has this ever happened to you? You do a search for a cartoon you used to love as a kid, and you click one of the links on the results page. But instead of the page you were expecting, you get an entirely different website, with some very questionable content. What just happened? Behold the headache that is deceptive redirection. Deceptive redirection is a type of coded command that redirects the user to a different location than what was expected via the link that was clicked.
Spammers create shadow pages or domains whose content ranks for a particular search query (the words or phrase you type into the search text box), yet when you attempt to access that content, you are redirected to a shady site (often having to do with porn, gambling, or drugs) that has nothing to do with your original query.
Doorway pages, themselves a spam method, are the most common perpetrators of deceptive redirects. Most doorway pages redirect through a Meta refresh command (a method of instructing a web browser to automatically refresh the current web page, or load a different one, after a given time interval). Search engines are now issuing penalties for using Meta refresh commands, so other sites trick you into clicking a link or use JavaScript (a computer programming language) to redirect you. Google now considers any website that uses a Meta refresh command or any other sneaky redirect (such as through JavaScript) to be spam.
Not all redirects are evil. The intent of the redirect has to be determined before a spam determination can be made. If the page that you are redirected to is nothing like the page expected, it is probably spam. If you get exactly what you expect after a redirect, it probably isn't spam. We discuss a lot more about redirects in Book VII, Chapter 3.
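Because Meta refresh commands can draw a penalty, it is worth knowing whether any of your own pages still contain one. Here is a minimal Python sketch, assuming you already have the page's HTML as a string; MetaRefreshFinder and find_meta_refreshes are hypothetical names we made up for the example. It only reports what it finds and leaves the question of intent to you.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collects (delay, target) pairs from <meta http-equiv="refresh"> tags."""

    def __init__(self):
        super().__init__()
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "5; url=http://example.com/".
        content = attributes.get("content") or ""
        delay, _, rest = content.partition(";")
        rest = rest.strip()
        target = None
        if rest.lower().startswith("url="):
            target = rest[4:].strip().strip("'\"")
        self.refreshes.append((delay.strip(), target))


def find_meta_refreshes(html):
    """Return every Meta refresh found in the HTML, in document order."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.refreshes
```

A target of None means the tag simply reloads the same page; a target on another site with a delay of 0 is the pattern that most resembles a sneaky redirect.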
Another nefarious form of spam is a method called cloaking. Cloaking is a technique in which the content presented to the search engine spider is different from that presented to the user’s browser, meaning that the spiders see one page while you see something entirely different. Spammers can cloak by delivering content based on the IP addresses (information used to tell where your computer or server is located) or the User-Agent HTTP header (information describing whether you’re a person or a search engine robot) of the user requesting the page. When a user is identified as a search engine spider, a server-side script delivers a different version of the web page, one that contains content different from the visible page. The purpose of cloaking is to deceive search engines so they display the page when it would not otherwise be displayed.
Like redirects, cloaking is a matter of intent rather than always being evil. There are many appropriate uses for this technique. News sites use cloaking to allow search engines to spider their content while users are presented with a registration page. Sites selling alcohol require users to verify their age before allowing them to view the rest of the content, while search engines pass unchallenged.
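One rough way to see whether a page shows spiders something different is to save the HTML it returns to a normal browser and the HTML it returns to a request that identifies itself as a spider, and then compare the words in the two. The Python sketch below assumes you have already fetched both versions as strings; page_words and content_overlap are our own illustrative names, and the word-overlap score is a simple stand-in, not a method the search engines have published.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Gathers the text between tags (script and style text included)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def page_words(html):
    """Return the set of lowercase words found in the page text."""
    extractor = TextExtractor()
    extractor.feed(html)
    return set(re.findall(r"[a-z0-9]+", " ".join(extractor.chunks).lower()))


def content_overlap(html_for_browser, html_for_spider):
    """Share of words common to both versions: 1.0 is identical, 0.0 is disjoint."""
    browser_words = page_words(html_for_browser)
    spider_words = page_words(html_for_spider)
    union = browser_words | spider_words
    if not union:
        return 1.0  # two empty pages are trivially the same
    return len(browser_words & spider_words) / len(union)
```

A low score says only that the two versions differ. As the registration and age-check examples show, you still have to judge the intent.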
Unrelated keywords are a form of spam that involves using a keyword that is not related to the image, video, or other content that it is supposed to be describing in the hopes of driving up traffic. Examples include putting unrelated keywords into the Alt attribute text of an image, placing them in the metadata of a video, or placing them in the Meta tags of a page. Not only is it useless, but it also gets your site pulled if you try it.
Keyword stuffing occurs when people overuse keywords on a page in the hopes of making the page seem more relevant for a term through a higher keyword frequency or density. Keyword stuffing can happen in the metadata, Alt attribute text, and within the content of the page itself. Basically, going to your Alt attribute text and typing porsche porsche porsche porsche over and over again is not going to increase your ranking, and the page will likely be yanked due to spam.
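Keyword density is simply the number of times a keyword appears divided by the total number of words. As a minimal Python sketch (keyword_density is a name we chose, it handles single-word keywords only, and nothing here implies a particular threshold that the search engines use):

```python
import re


def keyword_density(text, keyword):
    """Fraction of the words in text that are exactly the given keyword."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)
```

For the Alt attribute text porsche porsche porsche porsche, this returns 1.0, meaning every word is the keyword. For a sentence such as Buy a used Porsche today, it returns 0.2.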
You might envision a “link farm” as a pastoral retreat where docile links graze in rolling green pastures, but alas, you would be wrong. A link farm is any group of websites in which every site hyperlinks (links) to all the other sites in the group. Remember how Google loves links and uses them in its algorithm to figure out a website’s popularity? Link farms try to exploit that by manufacturing links in bulk. Most link farms are created through automated programs and services. Search engines have combated link farms by identifying specific attributes that link farms use and filtering them from the index and search results, including removing entire domains to keep them from influencing the results page.
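The definition of a link farm, a group in which every site links to every other site, can be expressed as a simple check on a map of outbound links. This Python sketch is only an illustration of that definition: is_fully_interlinked and the outbound_links dictionary (site name mapped to the set of sites it links to) are our own assumptions, and real link farm detection relies on attributes the search engines do not disclose.

```python
from itertools import permutations


def is_fully_interlinked(outbound_links, group):
    """True if every site in group links to every other site in group."""
    sites = list(group)
    if len(sites) < 2:
        return False  # one site alone cannot form a farm
    return all(
        target in outbound_links.get(source, set())
        for source, target in permutations(sites, 2)
    )
```

Keep in mind that a handful of related sites linking to one another can be perfectly legitimate. The pattern becomes suspicious when the group is large, automated, and exists only to trade links.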
Fighting spam is a top priority for the search engines. Google alone has a squadron of PhDs who do nothing but identify and combat spammers and their techniques. Fighting spam is important to Google because its business depends on presenting reliable, relevant results when you search. This is why its spam filters are getting better all the time.
The major search engines have posted quality guidelines to spell out what webmasters should and shouldn’t do — stuff like avoiding hidden text or hidden links, not loading pages with irrelevant keywords, and so forth. The search engines also encourage people to submit a spam report about sites that violate their quality guidelines and cross the line into spam. You should report spam when you see it. Eliminating search engine spam makes the world of SEO a fairer place, and searchers around the world get better results.
Google has two ways to submit a spam report:
www.google.com/webmasters/tools/spamreport?pli=1. Google promises to investigate every spam report submitted by a registered Search Console user.
www.google.com/contact/spamreport.html. Google reportedly assesses every unauthenticated report in terms of its potential impact and investigates “a large fraction” of these reports, as well.
Figure 6-2 shows the many categories of spam report forms that are available in Google.
Bing doesn’t have a spam report form at a specific URL, but there is a way to report spam nonetheless. Click the Feedback link either in the lower-right corner of Bing.com, or in the footer of any Bing results page. You can type your complaint in a simple text box, as shown in Figure 6-3. Be sure to mention “spam” in your message, click Dislike, and provide the essential details, like the URL and the query you used, so that Bing can research the issue.
We didn’t spend this chapter describing spam just so that unscrupulous users could run out and use it. Sure, the spam might bump their ranking for a little while, but they will be caught, and their site will be penalized or pulled from the index entirely. So why use it?
For too long, many SEO practitioners were involved in an arms race of sorts, inventing technology and techniques in order to achieve the best rankings and get the most clients. Unfortunately, some developed more and more devious technology to trick the search engines and beat the competition. Thus we have two types of techniques used in SEO:
Generally, the search engines all adhere to a code of conduct. Little things do vary from search engine to search engine, but the general principle is the same:
You can get back into a search engine’s good graces after getting caught spamming and penalized or yanked out of the index. It involves going through your site and cleaning it up, removing all the spam issues that caused it to get yanked in the first place, and resubmitting your pages for placement into the index. Don’t expect immediate reinclusion, though. You have to wait in line with everyone else.
The search engines tweak their algorithms all the time in a continuous effort to improve the quality of search results. Google has said it makes more than 500 changes a year — that’s more than once per day! Many changes are minor, but others aggressively attack one form of spam or another, causing major consequences for websites trying to rank. When the dust settles, both winners and losers emerge.
Within the SEO industry, we call any sudden and noticeable demotion in search engine ranking a penalty. Penalties can be assigned either manually or as the result of an algorithm change, but the resulting drop in traffic and revenue feels the same to the website owner. Search engines have human quality raters who can review a website and assign a manual action if they find that the site violates their quality guidelines. That’s what search engines call a “penalty.” But sites can also get hit with an algorithmic penalty when an algorithm change redefines what’s okay to do and they are suddenly caught outside the new stricter boundaries. Just as in musical chairs when the music stops, a site that has been happily playing the game can suddenly find itself without a place to sit in the SERPs.
Google’s major algorithm updates have resulted in massive algorithmic penalties (as well as an arguably much cleaner SERP). For some reason, the updates are usually named after cute black-and-white animals. You find out more about how to avoid these penalties individually in their appropriate topic sections throughout this book. Table 6-1 lists the whole menagerie and explains the types of spam tactics or low-quality content each update targets.
Table 6-1 Major Google Penalty-Related Updates
Update | Release Dates | Purpose
Panda | February 2011; several subsequent updates | Reward quality content and penalize sites with thin or shallow content
Penguin | May 2012; periodic updates every 6–12 months | Penalize sites that have link spam or too many low-quality links to a site
Page Layout | January 2012 | Penalize sites with too much advertising above the fold
Payday Loans | June 2013; two updates in 2014 | Target spammy sites and queries, such as [payday loans], [casinos], [viagra]
Pigeon | July 2014 | Improve local search results in Google Maps and Web searches
Say that you know that you won't use spam in order to increase your ranking in the search engines. You understand that it’s unethical and is more trouble than it’s worth. But at the same time, you need to increase your ranking. The simple solution is to hire an SEO organization to do the optimizing for you. But beware: Although you might not use spam, there’s a chance that an unscrupulous SEO practitioner will.
A code of ethics applies to people in the search engine optimization industry. Beware of those who promise or guarantee results to their clients, allege a special relationship with a search engine, or advertise the ability to get priority consideration when they submit to a search engine. People who make these claims are usually lying. Remember, there is no way to pay your way into the top of the search results page. Yahoo does have a program called Search Submit Pro where, for a fee, you can submit your page and be guaranteed that the Yahoo spider will crawl your site frequently, but Yahoo doesn’t guarantee rankings, and it’s the only large engine with this sort of program (see Chapter 2 of this minibook for more details). Also avoid those that promise link popularity schemes or promise to submit your site to thousands of search engines. These do not increase your ranking, and even if they do, it’s not in a way that would be considered positive, and the benefits, if any, are usually short-lived.
The discussion of any SEO code of ethics is like a discussion on politics or religion: There are more than two sides, all sides are strongly opinionated, and seldom do they choose the same path to the same end. Most search engine optimization (SEO) practitioners understand this code of ethics, but not all practitioners practice safe SEO. Too many SEO practitioners claim a bias toward surfers, or the search engines, or their clients (all are appropriate in the correct balance), and it is common for the SEO pros to use the “whatever it takes” excuse to bend some of the ethical rules to fit their needs. This does not pass judgment; it simply states the obvious.
Although the industry as a whole hasn’t adopted an official code of ethics, the authors of this book have drafted a specific code that we pledge to adhere to with respect to our clients. We have paraphrased this code here, but you can read the original at www.bruceclay.com/web_ethics.htm:
In a nutshell? Don’t be evil. Spammers never win, and winners never spam. What works in the short term won't work forever, and living in fear of getting caught is no way to run a business.