Chapter 6

Spam Issues: When Search Engines Get Fooled

In This Chapter

  • Finding out about the different types of search engine spam
  • Understanding the consequences of using spam
  • Avoiding search engine penalties
  • Being wary of guaranteed results and other false promises
  • Understanding ethical SEO

When you hear the word spam, what comes to mind? You probably think of all those annoying emails with their poorly worded and often obscene messages that clutter your inbox daily. That’s spam, all right, but there’s another kind of spam that’s directed at search engines.

In this chapter, you find out about spam techniques that some websites use to fool or trick the search engines into delivering a higher listing on the results page. Any time you think you can achieve higher rankings by deceiving the search engines, you’d better think again! Google and the other engines get better all the time at sniffing out spam, and the penalties can be harsh. Even inadvertent spam can get a website in trouble, so in this chapter we go over some of the more popular and dangerous methods that have been used. Then we delve into the guidelines search engines use to define what they consider spam, as well as our search engine optimization (SEO) code of ethics to help keep you and your site in the clear.

Understanding What Spam Is

When you think of spam, the first thing that comes to mind is probably either the canned meat product or the junk email that’s clogging up your inbox. (Or the Monty Python skit … “Spam, spam, spam, spam” … ahem.) When we here in SEO-land talk about spam, however, we mean something a little different from canned meat, unwanted emails, or British comedy troupes. Search engine spam (also sometimes known as spamdexing) is any tactic or web page that is used to deceive the search engine into a false understanding of what a website is about or how important it is. It can be external or internal to your website; it may violate the search engines’ policies directly, or it may be a little sneakier about its misdirection. Whether something counts as spam depends on intent and extent: What is the intent of the tactic being used, and to what extent is it being used?

If you stuff all your metadata (text added into the HTML of a page describing it for the search engine) full of keywords (words or phrases relating to your site content that search engines use to determine whether it’s relevant) with the sole intent of tricking the search engine into ranking your page higher on the results page, that’s spam. And if you do that all over your website, in your Alt attribute text (text used to describe an image for the search engine to read), your links, and your keywords, trying to trick the search engine spiders (the little programs that search engines use to read and rank websites) into giving you the highest rank possible, it becomes much harder to claim that it was simply an accident done out of ignorance.

Most technologies that are used in the creation, rendering, and design of websites can be used to trick the search engines. When a website tries to pull a fast one, or the search engines even so much as perceive it did, the search engines consider that website spam. Search engine companies do not like spam. Spam damages the reputation of the search engine. They’re working their hardest to bring you the most relevant results possible, and spam-filled pages are not what they want to give you. Users might not use the search engine again if they get spammy results, for starters. So if someone’s caught spamming, that person’s site could be penalized or removed entirely from the search engine’s index (the list of websites that the search engine pulls from to create its results pages).

You can report spam if you run across it by contacting the search engines (see the “Reporting Spam” section near the end of the chapter).

Discovering the Types of Spam

In the following sections, we talk a little about what types of spam there are in SEO-land and what not to do in order to keep your site from getting penalized or even pulled out of the engines by accident.

Spam is any attempt to deceive the search engines into ranking a page when it does not deserve to be ranked. In the following sections, we describe spam that is known to be detected and punished by the search engines.

Warning: Do not attempt any of the methods discussed here, because they will result in your site being branded as spam. This chapter is not meant to cover every type of spam out there on the web. It’s just meant to give you the knowledge you need to recognize when a tactic might be venturing down the wrong path. Spammers use other advanced techniques that may also be detectable by the search engines, so avoid any attempt to deceive the search engines.

Hidden text/links

One of the more obvious ways to spam a site is to insert hidden text and links in the content of the web page (the content of a site being anything that the user can see). All text on the site has to be visible to the user. Hidden content is text that appears in the rendered HTML code but that users can’t see on the page unless they take some extra action. Hidden text can simply be a long list of keywords, and hidden links are meant to inflate the site’s link popularity. Examples of hidden text and links include

  • White text/links on a white background: Putting white text and links on a white background renders the text invisible to the user unless the text is highlighted by clicking and dragging the mouse over it. Spammers can then insert keywords or hyperlinks that the spiders read and count as relevant.
  • Text, links, or content that is hidden by covering it with a layer so that it is not visible: This is a trick that people use with CSS. They hide spiderable content under a layer where it can’t be seen with the naked eye or by highlighting the page.
  • Positioning content off the page's view with CSS: Spammers place content far outside the visible area of the page, so spiders still read it in the code but users never see it.
  • Links that are not clickable by the user: This means creating a link that has only a single 1-x-1 pixel as its anchor, that uses the period at the end of a sentence as the anchor, or that has no anchor at all. There's nothing for a user to click, but the engine can still follow the link.

Using invisible or hidden text is a surefire way to get your site banned so that it no longer shows up in the engines. The reasoning is simple: If content is meant for users, you would want them to see it, so the engines assume that any hidden text is there for nefarious purposes.

Figure 6-1 shows what we mean by hidden text on a background. Usually, you find this as white text on a white background, but it could be any color as long as it's not visible to a user (black on black, gray on gray, and so on). This is spam and will get your site banned.

Figure 6-1: An example of white text on a white background.
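To audit your own pages for accidental hidden text, a rough check along the following lines may help. This is only a sketch: the names find_hidden_elements, looks_hidden, and HIDING_PATTERNS are made up for illustration, it inspects inline style attributes only (not external style sheets), and it does not reflect how any search engine actually detects hidden content.

```python
import re
from html.parser import HTMLParser

# Hypothetical heuristics for inline styles that commonly hide content.
HIDING_PATTERNS = [
    re.compile(r"display\s*:\s*none"),
    re.compile(r"visibility\s*:\s*hidden"),
    re.compile(r"(left|top|text-indent)\s*:\s*-\d{3,}"),  # pushed far off-screen
]


def parse_style(style):
    """Turn 'color: #fff; background: #fff' into a dict of lowercase properties."""
    props = {}
    for declaration in style.lower().split(";"):
        if ":" in declaration:
            name, value = declaration.split(":", 1)
            props[name.strip()] = value.strip()
    return props


def looks_hidden(style):
    """Return True if an inline style matches one of the hiding heuristics."""
    lowered = style.lower()
    if any(pattern.search(lowered) for pattern in HIDING_PATTERNS):
        return True
    props = parse_style(style)
    color = props.get("color")
    background = props.get("background-color") or props.get("background")
    # Text color identical to the element's own background (white on white).
    return color is not None and color == background


class HiddenTextFinder(HTMLParser):
    """Collects start tags whose inline style suggests hidden content."""

    def __init__(self):
        super().__init__()
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        if style and looks_hidden(style):
            self.flagged.append((tag, style))


def find_hidden_elements(html):
    """Return a list of (tag, style) pairs worth a manual look."""
    finder = HiddenTextFinder()
    finder.feed(html)
    return finder.flagged
```

A flagged element is a prompt for a human review, not proof of spam: menus, tabs, and accessibility helpers hide content for legitimate reasons that users can reveal by interacting with the page.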

Doorway pages

A doorway page is a web page submitted to search engine spiders that has been designed to satisfy the specific algorithms of various search engines but is not intended to be viewed by visitors. Doorway pages don’t earn their rankings; they deceive the search engines into ranking them through design and keyword-stuffing tricks that you’d never want to put on a page for a user to see. They spam the search engine index (the database of information from which search engines draw their primary results) with pages crammed full of keywords and phrases so that a page appears high on the results page for a particular keyword. When the user clicks the listing, however, he or she is automatically redirected to another site, or to another page within the same site, that doesn’t rank on its own.

Doorway pages are there only for the purpose of being indexed, and there is no intention for anyone to use those pages. Sometimes more sophisticated spammers build a doorway page with viewable, relevant content in order to avoid being caught by the search engine, but most of the time a doorway page is made to be viewed only by a spider. Doorway pages are often used in tandem with deceptive redirection, which we discuss in the following section.

Deceptive redirection

Has this ever happened to you? You do a search for a cartoon you used to love as a kid, and you click one of the links on the results page. But instead of the page you were expecting, you get an entirely different website, with some very questionable content. What just happened? Behold the headache that is deceptive redirection. Deceptive redirection is a type of coded command that redirects the user to a different location than what was expected via the link that was clicked.

Spammers create shadow pages or domains that have content that ranks for a particular search query (the words or phrase you type into the search text box), yet when you attempt to access the content on the domain, you are redirected to a shady site (often having to do with porn, gambling, or drugs) that has nothing to do with your original query.

The most common vehicles for deceptive redirects are themselves a spam method: doorway pages. Most doorway pages redirect through a Meta refresh command (a method of instructing a web browser to automatically refresh the current web page, or load a different one, after a given time interval). Search engines are now issuing penalties for using Meta refresh commands, so other sites trick you into clicking a link or use JavaScript (a computer programming language) to redirect you. Google now considers any website that uses a Meta refresh command or any other sneaky redirect (such as through JavaScript) to be spam.

Not all redirects are evil. The intent of the redirect has to be determined before a spam determination can be made. If the page that you are redirected to is nothing like the page expected, it is probably spam. If you get exactly what you expect after a redirect, it probably isn't spam. We discuss a lot more about redirects in Book VII, Chapter 3.
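If you want to find out whether any of your own pages still carry a Meta refresh, a small scan like the following can list them. It is a minimal sketch using Python's standard html.parser module; the names MetaRefreshFinder and find_meta_refreshes are invented for this example, and the scan says nothing about intent, which is what actually separates a harmless redirect from a deceptive one.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Records the delay and target of each <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.refreshes = []  # list of (delay_seconds, target_url or None)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "5; url=https://example.com/".
        delay_part, _, rest = attributes.get("content", "").partition(";")
        try:
            delay = float(delay_part.strip())
        except ValueError:
            return  # malformed content; skip rather than guess
        target = None
        rest = rest.strip()
        if rest.lower().startswith("url="):
            target = rest[4:].strip().strip("'\"")
        self.refreshes.append((delay, target))


def find_meta_refreshes(html):
    """Return every (delay, target) pair declared in the page's Meta tags."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.refreshes
```

A refresh with a delay of zero and a target on a different domain is the pattern most worth reviewing by hand. JavaScript redirects are not covered by this sketch.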

Cloaking

Another nefarious form of spam is a method called cloaking. Cloaking is a technique in which the content presented to the search engine spider is different from that presented to the user’s browser, meaning that the spiders see one page while you see something entirely different. Spammers can cloak by delivering content based on the IP address (the number that identifies a computer or server on the Internet) or the User-Agent HTTP header (information identifying the browser or program making the request, which reveals whether the visitor is a person’s browser or a search engine robot) of the user requesting the page. When a user is identified as a search engine spider, a server-side script delivers a different version of the web page, one that contains content different from the visible page. The purpose of cloaking is to deceive search engines so they display the page when it would not otherwise be displayed.

Like redirects, cloaking is a matter of intent rather than always being evil. There are many appropriate uses for this technique. News sites use cloaking to allow search engines to spider their content while users are presented with a registration page. Sites selling alcohol require users to verify their age before allowing them to view the rest of the content, while search engines pass unchallenged.
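One way to check a site you are responsible for (for example, after hiring an outside SEO company) is to request the same URL with two different User-Agent values and compare the visible text that comes back. The sketch below assumes you supply your own fetch function taking a URL and a User-Agent string and returning HTML; that function, along with the names visible_text and content_similarity, is hypothetical. It cannot detect cloaking based on IP address.

```python
import re
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def visible_text(html):
    """Return the page text, lowercased, with whitespace collapsed."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return re.sub(r"\s+", " ", " ".join(extractor.parts)).strip().lower()


def content_similarity(fetch, url, browser_agent, spider_agent):
    """Fetch url with two User-Agent values and return a 0..1 similarity ratio.

    fetch is a caller-supplied function (url, user_agent) -> HTML string.
    """
    as_browser = visible_text(fetch(url, browser_agent))
    as_spider = visible_text(fetch(url, spider_agent))
    return SequenceMatcher(None, as_browser, as_spider).ratio()
```

A ratio near 1.0 means both visitors received essentially the same text. A low ratio only tells you that the responses differ; as noted above, whether that difference is deceptive is a matter of intent.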

Unrelated keywords

Unrelated keywords are a form of spam that involves using a keyword that is not related to the image, video, or other content that it is supposed to be describing in the hopes of driving up traffic. Examples include putting unrelated keywords into the Alt attribute text of an image, placing them in the metadata of a video, or placing them in the Meta tags of a page. Not only is this useless, but it can also get your site pulled from the index.

Keyword stuffing

Keyword stuffing occurs when people overuse keywords on a page in the hopes of making the page seem more relevant for a term through a higher keyword frequency or density. Keyword stuffing can happen in the metadata, Alt attribute text, and within the content of the page itself. Basically, going to your Alt attribute text and typing porsche porsche porsche porsche over and over again is not going to increase your ranking, and the page will likely be yanked due to spam.

Tip: There are also much sneakier methods of keyword stuffing: using hidden text in the page, hiding large groups of repeated keywords on the page (usually at the bottom, far below the view of the average visitor), or using HTML commands that cause blocks of text to be hidden from user sight.
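The keyword frequency or density mentioned above is easy to measure on your own copy. The following sketch counts each word's share of all the words in a block of text. The function names and the 25 percent threshold are arbitrary choices for illustration; no search engine publishes a density limit, and this simple version does not ignore common words such as "the."

```python
import re
from collections import Counter


def keyword_density(text, top_n=3):
    """Return the top_n (word, share_of_all_words) pairs for a block of text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return []
    counts = Counter(words)
    total = len(words)
    return [(word, count / total) for word, count in counts.most_common(top_n)]


def stuffed_keywords(text, threshold=0.25, min_words=4):
    """List words whose share of the text exceeds threshold.

    threshold and min_words are illustration values only, not limits
    published by any search engine.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < min_words:
        return []
    shares = keyword_density(text, top_n=len(set(words)))
    return [word for word, share in shares if share > threshold]
```

Running stuffed_keywords on Alt attribute text such as "porsche porsche porsche porsche" flags the repeated word, while a natural description of the same image does not.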

Link farms

You might envision a “link farm” as a pastoral retreat where docile links graze in rolling green pastures, but alas, you would be wrong. A link farm is any group of websites in which every site hyperlinks (links to a page on another site) to all the other sites in the group. Remember how Google loves links and uses them in its algorithm to figure out a website’s popularity? Link farms try to exploit that by manufacturing links in bulk. Most link farms are created through automated programs and services. Search engines have combated link farms by identifying specific attributes that link farms use and filtering them from the index and search results, including removing entire domains to keep them from influencing the results page.

Remember: Not all link exchange programs are considered spam, however. Link exchange programs that allow individual websites to selectively exchange links with other relevant websites are not considered spam. The difference between these link exchange programs and link farms is the fact that the site is selecting links relevant to its content, rather than just getting as many links as it can get to itself.
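The "everyone links to everyone" pattern that defines a link farm can be expressed as a simple ratio. The sketch below is an illustration only: the function name interlink_ratio and the input format (a dictionary mapping each site to the set of sites it links to) are assumptions for this example, and the search engines' actual link farm signals are not public.

```python
from itertools import permutations


def interlink_ratio(links, group):
    """Share of possible site-to-site links within group that actually exist.

    links maps each site to the set of sites it links out to.
    A ratio of 1.0 means every site in the group links to every other one.
    """
    sites = list(group)
    if len(sites) < 2:
        return 0.0
    pairs = list(permutations(sites, 2))  # ordered (source, target) pairs
    present = sum(1 for source, target in pairs
                  if target in links.get(source, set()))
    return present / len(pairs)
```

A group of sites that selectively exchange a few relevant links scores well below 1.0, whereas a farm in which every member links to every other member scores exactly 1.0.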

Reporting Spam

Fighting spam is a top priority for the search engines. Google alone has a squadron of PhDs who do nothing but identify and combat spammers and their techniques. Fighting spam is important to Google because its business depends on presenting reliable, relevant results when you search. This is why its spam filters are getting better all the time.

The major search engines have posted quality guidelines to spell out what webmasters should and shouldn’t do — stuff like avoiding hidden text or hidden links, not loading pages with irrelevant keywords, and so forth. The search engines also encourage people to submit a spam report about sites that violate their quality guidelines and cross the line into spam. You should report spam when you see it. Eliminating search engine spam makes the world of SEO a fairer place, and searchers around the world get better results.

Google

Google has two ways to submit a spam report:

  • Registered Search Console users can submit an authenticated spam report form at www.google.com/webmasters/tools/spamreport?pli=1. Google promises to investigate every spam report submitted by a registered Search Console user.
  • Anyone can fill out an unauthenticated spam report form located at www.google.com/contact/spamreport.html. Google reportedly assesses every unauthenticated report in terms of its potential impact and investigates “a large fraction” of these reports, as well.

Figure 6-2 shows the many categories of spam report forms that are available in Google.

Figure 6-2: Google provides report forms for many categories of spam.

Bing

Bing doesn’t have a spam report form at a specific URL, but there is a way to report spam nonetheless. Click the Feedback link either in the lower-right corner of Bing.com, or in the footer of any Bing results page. You can type your complaint in a simple text box, as shown in Figure 6-3. Be sure to mention “spam” in your message, click Dislike, and provide the essential details, like the URL and the query you used, so that Bing can research the issue.

Figure 6-3: You can report spam using the feedback form in Bing.

Avoiding Being Evil: Ethical Search Marketing

We didn’t spend this chapter describing spam just so that unscrupulous users could run out and use it. Sure, the spam might bump their rankings for a little while, but they will be caught, and their site will be penalized or pulled from the index entirely. So why use it?

For too long, many SEO practitioners were involved in an arms race of sorts, inventing technology and techniques in order to achieve the best rankings and get the most clients. Unfortunately, some developed more and more devious technology to trick the search engines and beat the competition. Thus we have two types of techniques used in SEO:

  • White hat: This includes all SEO techniques that fall into the ethical realm. White-hat techniques involve using relevant keywords, descriptive Alt attribute text, simple and clear metadata, and so on. White-hat techniques clearly comply with the published intent of the various search engine quality guidelines.
  • Black hat: These are the SEO techniques we describe in this chapter (among others that we haven't covered). Black-hat techniques are sneaky and devious, and they attempt to game the engines to promote content not relevant to the user. These techniques are deceptive and generally break (or at least stretch) the search guidelines, commonly leading to spam penalties that are painful at best and devastating at worst.

Remember: With the search engines implementing aggressive antispam programs, the news is out: If you want to get rankings, you have to play well within the rules. And those rules are absolutely “No deception or tricks allowed.” Simply put, honest relevancy wins at the end of the day. All other approaches fade away.

Generally, the search engines all expect websites to adhere to a similar code of conduct. Little things do vary from search engine to search engine, but the general principles are the same:

  • Keywords should be relevant, applicable, and clearly associated with page body content.
  • Keywords should be used as allowed and accepted by the search engines (placement, color, and so on).
  • Keywords should not be utilized too many times on a page (frequency, density, distribution, and so on). The use should be natural for the subject.
  • Redirection technology (if used) should facilitate and improve the user experience. But understand that this is almost always considered a trick and is frequently a cause for penalties or removal from an index.
  • Redirection technology (if used) should always display a page where the body content covers the expected topic and contains the appropriate keywords (no bait and switch).

You can get back into a search engine’s good graces after getting caught spamming and penalized or yanked out of the index. It involves going through your site and cleaning it up, removing all the spam issues that caused it to get yanked in the first place, and resubmitting your pages for placement into the index. Don’t expect immediate reinclusion, though. You have to wait in line with everyone else.

Search engine penalties

The search engines tweak their algorithms all the time in a continuous effort to improve the quality of search results. Google has said it makes more than 500 changes a year — that’s more than once per day! Many changes are minor, but others aggressively attack one form of spam or another, causing major consequences for websites trying to rank. When the dust settles, both winners and losers emerge.

Within the SEO industry, we call any sudden and noticeable demotion in search engine ranking a penalty. Penalties can be assigned either manually or as the result of an algorithm change, but the resulting drop in traffic and revenue feels the same to the website owner. Search engines have human quality raters who can review a website and assign a manual action if they find that the site violates their quality guidelines. That’s what search engines call a “penalty.” But sites can also get hit with an algorithmic penalty when an algorithm change redefines what’s okay to do and they are suddenly caught outside the new stricter boundaries. Just as in musical chairs when the music stops, a site that has been happily playing the game can suddenly find itself without a place to sit in the SERPs.

Google’s major algorithm updates have resulted in massive algorithmic penalties (as well as an arguably much cleaner SERP). For some reason, the updates are usually named after cute black-and-white animals. You find out more about how to avoid these penalties individually in their appropriate topic sections throughout this book. Table 6-1 lists the whole menagerie and explains the types of spam tactics or low-quality content each update targets.

Table 6-1 Major Google Penalty-Related Updates

Update        Release Dates                                 Purpose
Panda         February 2011; several subsequent updates     Reward quality content and penalize sites with thin or shallow content
Penguin       May 2012; periodic updates every 6–12 months  Penalize sites that have link spam or too many low-quality links to a site
Page Layout   January 2012                                  Penalize sites with too much advertising above the fold
Payday Loans  June 2013; two updates in 2014                Target spammy sites and queries, such as [payday loans], [casinos], [viagra]
Pigeon        July 2014                                     Improve local search results in Google Maps and Web searches

Realizing That There Are No Promises or Guarantees

Say that you know that you won’t use spam in order to increase your ranking in the search engines. You understand that it’s unethical and is more trouble than it’s worth. But at the same time, you need to increase your ranking. The simple solution is to hire an SEO organization to do the optimizing for you. But beware: Although you might not use spam, there’s a chance that an unscrupulous SEO practitioner will.

A code of ethics applies to people in the search engine optimization industry. Beware of those who promise or guarantee results to their clients, allege a special relationship with a search engine, or advertise the ability to get priority consideration when they submit to a search engine. People who make these claims are usually lying. Remember, there is no way to pay your way into the top of the search results page. Yahoo does have a program called Search Submit Pro where, for a fee, you can submit your page and be guaranteed that the Yahoo spider will crawl your site frequently, but Yahoo doesn’t guarantee rankings, and it’s the only large engine with this sort of program (see Chapter 2 of this minibook for more details). Also avoid those who promise link popularity schemes or promise to submit your site to thousands of search engines. These tactics rarely increase your ranking, and even when they do, it’s not in a way that would be considered positive, and the benefits, if any, are usually short-lived.

Warning: Unfortunately, you are responsible for the actions of any company you hire. If an SEO company creates a web page for you using black-hat tactics, you are responsible, and your site could be pulled entirely from the search engine’s index. If you’re not sure about what your SEO company is doing, ask for clarification. And remember, as in all things, caveat emptor: buyer beware.

Following the SEO Code of Ethics

The discussion of any SEO code of ethics is like a discussion on politics or religion: There are more than two sides, all sides are strongly opinionated, and seldom do they choose the same path to the same end. Most search engine optimization (SEO) practitioners understand this code of ethics, but not all practitioners practice safe SEO. Too many SEO practitioners claim a bias toward surfers, or the search engines, or their clients (all are appropriate in the correct balance), and it is common for the SEO pros to use the “whatever it takes” excuse to bend some of the ethical rules to fit their needs. This does not pass judgment; it simply states the obvious.

Although the industry as a whole hasn’t adopted an official code of ethics, the authors of this book have drafted a specific code that we pledge to adhere to with respect to our clients. We have paraphrased this code here, but you can read the original at www.bruceclay.com/web_ethics.htm:

  • Do not intentionally do harm to a client. Be honest with the client and do not willfully use technologies and methods that are known to cause a website’s removal from a search engine index.
  • Do not intentionally violate any specifically published and enforced rules of search engines or directories. This also means keeping track of when policies change and checking with the search engine if you’re unsure of whether the method or technology is acceptable.
  • Protect the user visiting the site. The content must not mislead, no “bait and switch” tactics (where the content does not match the search phrase) should be used, and the content should not be offensive to the targeted visitors.
  • Do not engage in the continued violation of copyright, trademark, or servicemark, or of laws related to spamming, as they may exist at the state, federal, or international level.
  • All pages presented to the search engine must match the visible content of the page.
  • Don’t steal other people’s work and present it as your own.
  • Don’t present false qualifications or deliberately lie about your skills. Also, don’t make guarantees or claim special relationships with the search engine.
  • Treat all clients equally and don’t play favorites.
  • Don’t make false promises or guarantees. There is no such thing as a guaranteed method of reaching the top of the results page.
  • Always offer ways for your clients to settle disputes. There will be competition among your clients’ websites. Make sure there’s a way to mediate conflict if it ever comes up.
  • Protect the confidentiality and anonymity of your clients with regard to privileged information and any testimonials they supply.
  • Work to the best of your ability to honestly increase and retain the rankings of your client sites.

In a nutshell? Don’t be evil. Spammers never win, and winners never spam. What works in the short term won't work forever, and living in fear of getting caught is no way to run a business.
