June 16th, 2011

How We Have Attempted to Recover from Google Panda

BACKGROUND

My main site, TravBuddy, was hit in February by Google’s latest “Panda” algorithm. This post isn’t really meant to complain about the issues we’ve noticed - we’ll save that for a later post ;) - but to openly talk about some of the things we’ve done to try to recover.

First off, a little background about TravBuddy. We are a large online travel community with approximately 1.7MM members, 4MM travel photos, and hundreds of thousands of travel reviews and blogs. In addition to the travel information sharing component, people use the site to find travel buddies and meet other travelers, a fact that has even led to quite a few happy marriages around the world. We have a very healthy, active community which we are quite proud of.

We’ve also been a Webby Honoree for the last 4 years in a row and been featured by outlets like BBC World News, NBC Nightly News, and Budget Travel Magazine. In other words, we’ve had a lot of external signals that we are running a high-quality site, and were very unexpectedly affected by Panda.

In terms of SEO we’ve always tried to stay within the Webmaster Guidelines. We’ve never purchased any links (extremely rare in online travel) or done anything shady. We hired an SEO firm once, but it didn’t help at all, so we decided to stay focused on simply creating a good experience for our users and hoping the traffic would follow. That strategy worked fine for the last 5 years, until Panda.

WHAT WE’VE DONE

Enough looking back - let’s go over what we’ve done since February to try to get back in Google’s good graces.

  • Removed Many Thin Content Pages - By thin content I mean any pages that had little or no useful information. For instance, the locations database on our site has about 1.3 million different locations, but probably fewer than 100,000 of those are ever traveled to. We’ve noindexed pages that had very little content, and 404’d pages that had no content at all (there’s a rough sketch of this kind of decision after this list). You may ask why we didn’t do this a long time ago - the fact is, we never got much traffic to these pages anyway, didn’t link much to them internally, and Google seemed to ignore them, so it didn’t seem like a problem.

    We also noindexed nearly 4 MILLION of our photo pages. Our users upload many, many travel photos, and these pages seemed to take the biggest hit after Panda. Basically, they were just a photo + a caption. Did Google think that they were thin content, even though they were useful for our users? Did they simply make up too large a portion of our site, overwhelming all the other, textual content? Who knows, but the fact of the matter is that they didn’t bring us much search engine value anyway, so we decided not to risk it.

  • Noindexed All Duplicate Content - We have a database of about 200,000 hotels on our site. Maybe 10% of them have actual user reviews, while the other 90% are “stub” pages that have generic descriptions from hotel owners. This is pretty common in the online travel industry, except that most travel sites have 100% duplicate content. It wouldn’t make sense to stop linking to these pages entirely, since visitors already on our site want to see all the hotel options available for a given location.

    With that said, I can certainly see how a search user would not want to see the same hotel description from hundreds of different sites. Even though we have a large amount of unique content (20,000+ hotel reviews), perhaps the duplicate content was outweighing it in Google’s eyes, so it made sense to focus on where we can provide unique value.

  • Improved Internal Navigation - This goes along with the points above. Rather than linking to every page we have on the site, we’ve tightened our linking structure to link to and highlight the original, quality content we do have.

  • Removed Many Advertisements From The Site - This one was kind of a shot in the dark, but it’s clear from looking at search results in the travel niche that Google is heavily favoring travel sites with no advertisements, even if they have no content otherwise.

  • Tried to Remove External Duplicate Content - We have thousands of sites that scrape our content, including many who steal it only to rank higher than us (and, to add insult to injury, monetize their sites with AdSense). It’s impossible to keep up with them all, and after you’ve submitted multiple spam reports to Google for the same domains over the years and received no response and no action, you begin to lose faith in any manual system of trying to stop scrapers.

    What we have done is to try to limit the amount of external duplicate content that we can control. For example, we used to syndicate snippets of our hotel reviews to Kayak. After noticing that they ranked higher than us for our own reviews, even when they linked directly to us after each snippet, we asked them to take the reviews down. This wasn’t something we had to worry about even a few months ago, and it’s unfortunate that our reviews aren’t getting the exposure they could be getting.
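
To make the first two items above concrete, here’s a rough sketch of the kind of per-page decision we ended up making. The field names and rules below are purely illustrative - they are not our actual schema or production logic - but the idea is the same: pages with real user-generated content stay indexable, pages with only stub or thin content get a noindex meta tag (keeping them for visitors while dropping them from the index), and pages with nothing at all return a 404.

```python
# Illustrative sketch only: the Page fields and rules below are hypothetical,
# not our actual schema or production code.
from dataclasses import dataclass

# Tag emitted in the <head> of pages we keep for visitors but don't want indexed.
# "noindex, follow" keeps the page out of the index while still letting
# crawlers follow its links.
NOINDEX_TAG = '<meta name="robots" content="noindex, follow">'


@dataclass
class Page:
    reviews: int            # user-written reviews attached to this page
    photos: int             # user photos attached to this page
    unique_text: bool       # any text written by our users/editors?
    stub_description: bool  # only a generic, owner-supplied description?


def indexing_decision(page: Page) -> str:
    """Decide how a location or hotel page should be served."""
    if page.reviews > 0 or page.unique_text:
        return "index"    # original user-generated content: leave it indexable
    if page.stub_description or page.photos > 0:
        return "noindex"  # thin or duplicate content: keep the page, emit NOINDEX_TAG
    return "404"          # nothing on the page at all: serve an HTTP 404


# A hotel "stub" page with only a generic owner description -> "noindex"
print(indexing_decision(Page(reviews=0, photos=0, unique_text=False, stub_description=True)))
# An empty location page nobody has ever traveled to -> "404"
print(indexing_decision(Page(reviews=0, photos=0, unique_text=False, stub_description=False)))
```

Nothing exotic - the point is simply that the same three-way split (index / noindex / 404) covers the location pages, the hotel stubs, and the photo pages described above.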

RESULTS AND ANALYSIS

Many of these changes seem like no-brainers in hindsight, and I’d agree with anyone who says we should have done them a long time ago. However, they were never really a problem before - many “thin” pages didn’t get much traffic anyway, and Google would always do a great job of finding our best pages and returning the best results from our site regardless. And the pages certainly weren’t malicious in intent. But after Panda, the mere fact that they existed was apparently enough to drag down the rest of the quality content we did have.

It’s like a library deciding not to carry ANY books by a particular author, because he wrote one bad book in his youth that nobody ever read anyway.

That’s not to say our site is perfect. It’s not, and like any user-generated content site, there are going to be high quality submissions and low quality ones. But I know for a fact that we do have a lot of high quality, unique information that is better than anything else we see out there (more on that in a later post).

We’ve been rolling out these changes over the last 3 months, and have yet to see ANY changes in our search engine traffic. This is despite a massive overhaul of our site’s internal linking, noindexing/404’ing many millions of pages, and significant layout changes. In fact, our traffic has been remarkably consistent, almost as if Google has decided we should get X number of visitors every week, regardless of what we do. This is especially odd given the seasonality of online travel, which fluctuates greatly even without algorithmic changes. I’ve never seen anything like it in 6 years of working on the site.

So what next? What else can we do to improve our website? How long do we have to wait in order to see any changes? Since we used to have millions of pages indexed, I can only hope that it’s simply a matter of Google needing enough time to sort out all the changes we’ve made.

Stay tuned for more posts on this topic...