July 22nd, 2011

Google Panda - Why Brands Aren’t Always Best and the Decline of the Web Ecosystem

Full Disclosure: I am the founder of TravBuddy.com, a large, independent travel community. Our site was affected by Google Panda, and I will highlight some content from my niche (because those are the easiest examples available to me), but this post isn’t about my site vs Google. This is about my personal opinion on the terrible state of Google’s search results, and why it’s bad for everyone.

Google Panda, a large update to Google’s search algorithm released 5 months ago, is supposed to be about surfacing “quality” and “original” websites. In my humble opinion, it has utterly failed in the travel niche, to the great cost of the consumer.

Why should you or anyone else care? Because if you ever want to research your travels online (and almost all travel searches still start at Google), your results are going to be littered almost entirely with homogenous, duplicate content or low quality websites, and it will be harder for you to find information and advice from independent sources.

HOMOGENOUS, DUPLICATE CONTENT

One of Google’s stated goals with Panda was to reduce “duplicate” content and promote sites that “provide original content or information, original reporting, original research, or original analysis”. Five long months later, here is the kind of diversity and original content we can find on the first page of the results:

TripAdvisor is a good website with tons of original content, but there is no reason why Google should rank 3 of their domains that have the EXACT same content - (tripadvisor.co.uk, tripadvisor.ca, tripadvisor.com.au) and two of their domains (holidaywatchdog.com, virtualtourist.com - both owned by TripAdvisor) that have almost exactly the same content - all on the front page.

What this means to the consumer is 4.5 / 10 results on the first page of Google effectively point to the same exact content. Users are getting less useful information than ever before.

LOW QUALITY WEBSITES

Also, consider the fact that two of the remaining results are occupied by Google-owned properties that provide no additional value to the user. They are exactly the type of websites that Panda professes to destroy.

Result #2: The Google Places page - a mish-mash of ads, booking links, and scraped photos. Plus a short, unhelpful review from a Google User named “Marketing Concept” which is most likely spam.

Result #5: A link to a video on YouTube (which is owned by Google, by the way) that is nothing more than a slideshow of low resolution, scraped images found on hundreds of other sites. Hardly “original” or “providing substantial value”.

The end result for the user, for the traveler looking for useful or helpful information about this hotel, is that 75% of the content / links on the front page are spam. Unfortunately, this is not a unique example and can be repeated across many hotel/travel related queries on Google.

WHY BRANDS AREN’T ALWAYS BEST

Google has been trumpeting “Brands, Brands, Brands” for the last year as a solution against declining search quality. There is nothing fundamentally wrong with this approach, but taken to its extreme it results in less relevant, spammy results.

Taking the query above for “Riad Dar Najat Hotel Marrakech”:

The first result is a link to the official hotel website. Here is an example of the brand emphasis working. I think most people would agree that a brand should rank first for its name.

Next, nearly half of the results are from the TripAdvisor brand. TripAdvisor is a good site and a strong brand, but here the emphasis on brands has gone incredibly wrong. We have 4-5 links to the exact same content, providing no additional value to the user. Looks spammy to me.

Then, two of the results are for Google’s own properties. No brand is stronger than Google on the Internet but, just looking at things from the consumer perspective - how does linking to a Google “Places” page with no helpful or original content, or a YouTube video with no helpful or original content, ever help the end user? Looks spammy to me.

THE SLOW DECLINE OF THE WEB ECOSYSTEM

When Google launched Panda, they openly stated that:

"Google depends on the high-quality content created by wonderful websites around the world, and we do have a responsibility to encourage a healthy web ecosystem. Therefore, it is important for high-quality sites to be rewarded, and that’s exactly what this change does.”

In fact, I would argue that Panda, combined with Google’s insistence on brand hegemony, is causing tremendous damage to the web ecosystem.

The search results are now dominated by powerful brands, often with low quality or duplicate pages, often at the expense of search quality and user experience.

Independent sites and personal blogs with unique information are much more difficult to find now. The first page of results now has 3-4 “real” results, when before it used to have 10.

The travel searcher loses out because 75% of the results they are getting are basically spam or regurgitate the same information.

The web loses out because thousands of independent websites honestly attempting to provide unique, high-quality information are not getting any feedback on why their sites were penalized. They are going out of business while Google turns a deaf, hypocritical ear, and brand spam continues to pollute the results.

THE FUTURE

Google faces a tremendously difficult problem with spam, and I don’t claim to have any easy answers. I wish I could say “Just use Bing”, but their results often suffer from the same brand blindness (10/20 of the first results link to the same content).

What I do know, though, is that brands are not always the answer, especially when they crowd out a diversity of independent, higher-quality information.

The message that Google is sending is that “duplicate”, “unoriginal”, and “low quality” content will get you penalized - unless you are a large brand or Google itself.

I also know that the web ecosystem is suffering tremendously as a result, because, like it or not, for most people the web is Google.

June 17th, 2011

Website Feedback

I’ve gotten a lot of good feedback from HN about the first post. Here are some additional tips that people mentioned:

USE HTTP CODE 410 GONE INSTEAD OF 404 FOR PAGE REMOVAL

HTTP status code 410 lets Google know that your page is permanently gone. 404 only tells Google that the page cannot be found. Apparently, a 410 status code is better for getting Google to remove pages from its index, so we’re starting to use this to de-index our “thin” content.
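Here is a minimal sketch of the 410 vs 404 distinction, written as a tiny Python WSGI app. It is not our actual code: the path sets and the names `REMOVED_PATHS`, `LIVE_PATHS`, `status_for`, and `app` are all made up for illustration, and a real site would look pages up in its database instead.

```python
from http import HTTPStatus

# Hypothetical paths that were deliberately and permanently removed.
REMOVED_PATHS = {"/locations/empty-stub-1", "/photos/12345"}

# Hypothetical paths that currently exist on the site.
LIVE_PATHS = {"/", "/destinations"}


def status_for(path: str) -> HTTPStatus:
    """Pick the response status for a requested path."""
    if path in LIVE_PATHS:
        return HTTPStatus.OK
    if path in REMOVED_PATHS:
        # 410 Gone: the page existed and was removed on purpose.
        return HTTPStatus.GONE
    # 404 Not Found: we know nothing about this path.
    return HTTPStatus.NOT_FOUND


def app(environ, start_response):
    """Minimal WSGI app that applies status_for to each request."""
    status = status_for(environ.get("PATH_INFO", "/"))
    body = status.phrase.encode("utf-8")
    start_response(
        f"{status.value} {status.phrase}",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

The point of keeping a separate list of removed paths is that a mistyped URL still gets a 404, while only pages we removed on purpose answer with 410.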

RETHINKING FOCUS OF SITE

One thing that many people who viewed the site focused on was our “Hotel Search” landing page. There was a lot of confusion about what our site was actually about, and some people felt that our sitemap links looked kind of spammy. Many people focused on this tab even though our site had a wealth of other content. So we’ve decided to remove our “Hotels” tab and instead provide a better experience / canonical hierarchy by consolidating our navigation to the “Destinations” tab.

Our feeling is that you only get one chance to make an impression on most visitors. This should provide a better experience for users by better highlighting the content we do have (tons of other travel related content in addition to hotel reviews), and hopefully give Google a clearer idea on how to categorize our site (by location first and foremost).

Thanks all for your feedback!

June 16th, 2011

How We Have Attempted to Recover from Google Panda

BACKGROUND

My main site, TravBuddy, was hit in February by Google’s latest “Panda” algorithm. This post isn’t really meant to complain about the issues we’ve noticed - we’ll save that for a later post ;) - but to openly talk about some of the things we’ve done to try to recover.

First off, a little background about TravBuddy. We are a large online travel community with approximately 1.7MM members, 4MM travel photos, and hundreds of thousands of travel reviews and blogs. In addition to the travel information sharing component, people use the site to find travel buddies and meet other travelers, a fact that has even led to quite a few happy marriages around the world. We have a very healthy, active community which we are quite proud of.

We’ve also been a Webby Honoree for the last 4 years in a row and featured in publications like the BBC World News, NBC Nightly News, and Budget Travel Magazine. In other words, we’ve had a lot of external signals that we are running a high quality site, and were very unexpectedly affected by Panda.

In terms of SEO we’ve always tried to stay within the Webmaster Guidelines. We’ve never purchased any links (extremely rare in online travel) or done anything shady. We hired an SEO firm once, but it didn’t help at all, and we’ve decided to stay focused on simply creating a good experience for our users and hoping the traffic would follow. That strategy worked fine for the last 5 years, until Panda.

WHAT WE’VE DONE

Enough looking back - let’s go over what we’ve done since Feb. to try to get back in Google’s good graces. 

  • Removed Many Thin Content Pages - By thin content I mean any pages that had little or no useful information. For instance, our locations database on our site has about 1.3 million different locations, but probably less than 100,000 of those are ever traveled to. We’ve noindexed pages that had very little content, and 404’d pages that had no content at all. You may ask why we didn’t do this a long time ago - the fact is, we never got much traffic to these pages anyway, didn’t link much to them internally, and Google seemed to ignore them anyway, so it didn’t seem like a problem.

    We also basically noindexed nearly 4 MILLION of our photo pages. Our users upload many, many travel photos, and these pages seemed to take the biggest hit after Panda. Basically, they were just a photo + a caption. Did Google think that they were thin content, even though they were useful for our users? Did they simply make up too large a portion of our site, overwhelming all the other, textual content? Who knows, but the fact of the matter is that they didn’t bring us much search engine value anyway, so we decided not to risk it.

  • Noindexed All Duplicate Content - We have a database of about 200,000 hotels in our site. Maybe 10% of them have actual user reviews, while the other 90% are “stub” pages that have generic descriptions from hotel owners. This is pretty common in the online travel industry, except that most travel sites have 100% duplicate content. It wouldn’t make sense to stop linking to these pages entirely, since visitors already on our site want to see all the hotel options available for a given location.

    With that said, I can certainly see how a search user would not want to see the same hotel description from hundreds of sites. Even though we have a large amount of unique content (20,000+ hotel reviews), perhaps the duplicate content was outweighing it in Google’s eyes, so it made sense to focus on where we can provide unique value.

  • Improved Internal Navigation - This goes along with the points above. Rather than linking to every page we have in our site, we’ve tightened our linking structure to link to and highlight the original, quality content we do have.

  • Removed Many Advertisements From The Site - This one was kind of a shot in the dark, but it’s clear from looking at search results in the travel niche that Google is heavily favoring travel sites with no advertisements, even if they have no content otherwise.

  • Tried to Remove External Duplicate Content - We have thousands of sites that scrape our content, including many who steal it only to rank higher than us (and, to add insult to injury, monetize their sites with AdSense). It’s impossible to keep up with them all, and after years of submitting multiple spam reports to Google for the same domains, with no response and no action, you begin to lose faith in any manual system of trying to stop scrapers.

    What we have done is to try to limit the amount of external duplicate content that we can control. For example, we used to syndicate snippets of our hotel reviews to Kayak. After noticing that they ranked higher than us for our own reviews, even when they linked directly to us after each snippet, we asked them to take the reviews down. This wasn’t something we had to worry about even a few months ago, and it’s unfortunate that our reviews aren’t getting the exposure they could be getting.
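The thin-content and duplicate-content rules from the list above can be sketched roughly like this. It is an illustration only, not our production code: the `Page` class, the `THIN_WORD_LIMIT` threshold of 50 words, and the function names are all assumptions. The only real-world piece is the standard robots meta tag with a `noindex` value.

```python
from dataclasses import dataclass

# Hypothetical threshold: pages with fewer words than this count as "thin".
THIN_WORD_LIMIT = 50


@dataclass
class Page:
    path: str
    text: str                   # content shown on the page
    is_duplicate: bool = False  # True for stub text also published elsewhere


def indexing_action(page: Page) -> str:
    """Return "404", "noindex", or "index" for a page."""
    words = len(page.text.split())
    if words == 0:
        # No content at all: do not serve the page.
        return "404"
    if page.is_duplicate or words < THIN_WORD_LIMIT:
        # Still useful to visitors on the site, but kept out of the index.
        return "noindex"
    return "index"


def robots_meta(page: Page) -> str:
    """Build the robots meta tag for a page that is still served."""
    action = indexing_action(page)
    if action == "404":
        raise ValueError(f"{page.path} has no content and should not be served")
    if action == "noindex":
        # "follow" keeps links on the page crawlable while hiding the page itself.
        return '<meta name="robots" content="noindex, follow">'
    return '<meta name="robots" content="index, follow">'
```

A noindexed page stays reachable for visitors browsing the site, which matches the reason we kept linking to hotel stub pages internally.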

RESULTS AND ANALYSIS

Many of these changes seem like no-brainers in hindsight, and I’d agree with anyone who says we should have done them a long time ago. However, they were never really a problem before - many “thin” pages didn’t get much traffic anyway, and Google would always do a great job of finding our best pages and returning the best results from our site regardless. And the pages certainly weren’t malicious in intent. But, after Panda, the mere fact of having them exist was apparently enough to destroy the rest of the quality content we did have.

It’s like a library deciding not to carry ALL books by a particular author, because he wrote one bad book in his youth that nobody ever read anyway.

That’s not to say our site is perfect. It’s not, and like any user generated content site, there are going to be high quality submissions and low quality ones. But I know for a fact that we do have a lot of high quality, unique information that is better than anything else we see out there (more on that in a later post).

We’ve been rolling out these changes over the last 3 months, and have yet to see ANY changes in our SE traffic. This is despite basically a massive overhaul of our site’s internal linking, noindexing/404’ing many millions of pages, and significant layout changes. In fact, our traffic has been remarkably consistent, almost as if Google has decided we should get X number of visitors every week, regardless of what we do. This is especially odd given the seasonality of online travel, which fluctuates greatly even without algorithmic changes. I’ve never seen anything like it in 6 years of working on the site.

So what next? What else can we do to improve our website? How long do we have to wait in order to see any changes? Since we used to have millions of pages indexed, I can only hope that it’s simply a matter of Google needing enough time to sort out all the changes we’ve made.

Stay tuned for more posts on this topic.

Introduction

I’m not much for blogging, but I’ve recently had some thoughts that I wanted to get out there (mainly around business). While the first few posts will likely center around that theme, hopefully I’ll continue to blog about things I find interesting.