martes, 24 de mayo de 2011

Stone Temple Consulting (STC) Articles and Interviews on SEO Topics



Rand Fishkin Interviewed by Eric Enge

Posted: 23 May 2011 10:03 AM PDT

An In Depth Look at Panda Signals

photo of Rand Fishkin

Rand Fishkin is the CEO & Co-Founder of the web's most popular SEO software provider, SEOmoz. Together with Eric Enge, he co-authored The Art of SEO from O'Reilly Media, and he has been named to the 40 Under 40 List and the 30 Best Young Tech Entrepreneurs Under 30. Rand has been written about in The Seattle Times, Newsweek and PC World among others, and has keynoted conferences on search around the world. He's particularly passionate about the SEOmoz blog, read by tens of thousands of search professionals each day. In his minuscule spare time, Rand enjoys the company of his amazing wife, whose serendipitous travel blog chronicles their journeys.

Interview Transcript

Eric Enge: The discussion topic for the day is Panda signals. What was Google trying to do with Panda?

Rand Fishkin: My opinion is that Panda is the start of something relatively new for Google. They are using the aggregated opinions of their quality raters, in combination with machine learning algorithms, to filter and reorder the results for a better user experience.

That's a mouthful, but essentially what it means is that Google has this huge cadre of human workers who search all the time and rate what they find. What they want to do is find ways to show things those raters like and suppress things they don't like. Google has previously been reticent to do this across the board and use it as a primary signal, and has historically used this data only as a quality control check on the algorithms they write.

Now, I think they are being more aggressive and trying this out on a certain type of site. Panda impacts more than 11% of queries, which is a robust change, although I don't think it is the largest change we have seen from Google's algorithm over the last few years. I like the direction they are going in, but I get the sense they don't know what is in the algorithm at this point.

Machine Learning Algorithms

Eric Enge: Because they have implemented a machine learning based algorithm.

Rand Fishkin: Yes, that's a problem anytime you implement a machine learning technique. Machine learning takes a bunch of predictive metrics and uses a neural network, or some other machine learning model, to try to come up with a best fit to the desired result. I think one reason machine learning has only slowly made its way into Google's algorithmic updates is that they are uncomfortable with not knowing what is in the algorithm.

It's not that they target specific sites like EzineArticles and eHow; rather, they target sites that the quality raters identified as fitting the eHow profile. The challenge is to find metrics that will push those sites down but keep deserving sites high.

The machine learning algorithm will search across all the data points it can, but it may use weird derivatives. For example, the number of times a page uses the letter x may have a super high correlation with pages people rated as low quality, so the algorithm pushes down pages that use the letter x. That's not an actual example, but you get my point.

You can no longer dig into the code and figure out which engineer coded into the algorithm that the letter x in pages means lower rankings. An engineer did not do it; the machine learning system did. So, you know they have to be careful with how they implement it.
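(Editor's note: here is a minimal Python sketch, assuming scikit-learn and NumPy are available, of the behavior Rand describes: a model trained against human quality labels will lean on any feature that happens to correlate with those labels, even an arbitrary one like a letter count, without any engineer deciding it should. All feature names and data below are invented for illustration and are not Google's actual signals.)

# Train a simple classifier on page features against synthetic "rater" labels,
# then inspect which features it leans on. The letter-x count is deliberately
# correlated with low quality here, so the model penalizes it on its own.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

ad_density     = rng.uniform(0, 1, n)         # share of page pixels that are ads
word_count     = rng.integers(100, 3000, n)   # article length
letter_x_count = rng.integers(0, 50, n)       # the "weird derivative" from the interview

# Synthetic rater labels: driven by ad density and, by construction, letter-x count.
quality = ((ad_density < 0.4) & (letter_x_count < 25)).astype(int)

X = np.column_stack([ad_density, word_count, letter_x_count])
model = LogisticRegression(max_iter=1000).fit(X, quality)

for name, coef in zip(["ad_density", "word_count", "letter_x_count"], model.coef_[0]):
    print(f"{name:15s} weight = {coef:+.3f}")
# The negative weight on letter_x_count appears even though no one coded that rule.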

I got the sense from the Wired interview and other writings that even Amit and Matt were a little nervous about how this works. I think they recognized that they hit some sites unintentionally. The most frustrating part for them is that they don't know why the algorithm hit sites they didn't want it to.

Eric Enge: You have to go back and try tuning some of the ratings of various parameters and see how it comes out.

Rand Fishkin: Yes, but it is so much harder to tune a parameter when you don't know what the parameters are. This means you have to retrain the model and test it rather than just change a particular parameter.

Eric Enge: Yes, the overhead of going through the whole process is much greater.

Rand Fishkin: You can't simply say "we would like to boost back up these five sites." The reality is unless you rewrite the whole system you can't individually boost up one site's ranking. What do you do? Turn up a "this site is good" knob? I don't think they have one of those.

Eric Enge: No, I bet they don't. You talked about their quality raters, but they also have other signals. For example, the Chrome Blocklist extension.

Chrome Blocklist


Rand Fishkin: Yes, originally they did not use the Blocklist in the Panda update. It didn't make the original cut, which was in late February, but they later announced they were using the data from the Blocklist, so it got into the later releases. It's not just the Blocklist; Google now has a ton of user and usage data, and a much better representative sample than they ever had before.

Eric Enge: We also now have in the search results themselves a direct way to block a result, so it looks like Google is expanding upon this initiative.

Block Sites in Google Results


Social Data and Google

Eric Enge: Let's dig a bit deeper into the social initiatives at Google.

Rand Fishkin: The newest initiative is the +1 buttons.

Eric Enge: The +1 is the beginning of their counter attack on Facebook.

Rand Fishkin: I disagree with that characterization because I don't think it is necessarily a direct attack on Facebook. I think +1 is a way Google hopes to learn more about what people enjoy, support and want to share, and learn what people want to see more of in their search results in addition to something like the Block List.

Eric Enge: Didn't Larry Page make the bonus of every Google employee dependent on the success of that feature?

Rand Fishkin: Dependent on Google's social success not specifically on only that feature, but yes, he did.

I think Google is clearly saying we need lots of social data, but I am hesitant to say +1 is about competing with Facebook. If you want to compete with Facebook you need a social network, you need something where people share photos of their kids and connect with each other and that kind of thing. Google is going a different route which is "tell us what you like in your search results and we will show you more of that and less of what you don't like."

Eric Enge: Yes, I agree that the +1 by itself isn't enough to counter Facebook, but it seems to be a piece of a larger puzzle that is emerging.

Rand Fishkin: The +1 button is definitely a competitor to the Like and Share data, which Google almost certainly is using. They publicly said they use data from Facebook, but they weren't specific about what data. Then our correlation data came out showing the Share as being massively well-correlated as an individual metric, even controlling for links. I think that strongly suggests Google is getting value out of leveraging that Facebook data, so they want to protect that source and make sure Facebook doesn't cut them off from it.
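(Editor's note: the correlation study Rand refers to is the kind of analysis anyone can reproduce in miniature. Here is a small Python sketch, assuming SciPy is available and using invented numbers, of comparing an external metric such as Facebook share counts against ranking position with Spearman rank correlation.)

from scipy.stats import spearmanr

# position 1 = best ranking; share counts are made-up illustration data
positions    = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
share_counts = [950, 700, 820, 400, 390, 120, 200, 80, 45, 30]

rho, p_value = spearmanr(positions, share_counts)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.4f})")
# A strongly negative rho means more shares go with better (lower-numbered) positions.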

Eric Enge: As a minor aside, I saw speculation that since the Like button does largely what the Share button does, they may be planning to retire Share and go with the Like.

Rand Fishkin: I heard rumors of that as well. I would be surprised if they did that because Like and Share do very different things. Even though Facebook's message is that Like is very similar to Share, it is not.

I am sure you noticed that inside Facebook they offer both Top News and Most Recent. By default they show you Top News rather than Most Recent, so you rarely see things that your friends Like on their Wall unless many people Liked it, but you will always see everything that your friends Share. (Note: if you enter a description when you Like something, it behaves exactly the same as a Share, but Rand is assuming here that many users will not do that.)

The Share button is much more like link behavior than the Like button is; Share is a much more robust action. As a Facebook administrator, you can look at Facebook Insights and see the percentage of Share impressions you get from your network, and a Share always carries more value. Another big difference, from Google's point of view, is that many people click the Like button all over the place, but the Share button is more intentional: not just "I Like this", but "I want everybody else to see this".

Eric Enge: Yes, so it will be interesting to see what happens with that.

Rand Fishkin: You have the new Send button too.

Eric Enge: Soon you will have no room for your content because your page will be covered with Facebook buttons.

Rand Fishkin: Please no.

Eric Enge: Let us talk about other signals Google could be using at this point.

Rand Fishkin: Google and Bing both have data deals with Facebook. The Facebook growth team, which is their marketing team, was at SEOmoz a couple of weeks ago and we were talking in depth. There was NDA stuff I can't go into, but one thing they noted, which is public, is that Google gets considerably less data about the social graph from Facebook than Bing does. However, they get more than what is just in the open graph API.

When you talk about signals, I think Google is able to see deeper into the social graph via Facebook data than any of us can test on our own. Many people have concerns about abuse; for example, what if I get ten thousand random people to go Like my page? There are probably very good signals about the authenticity of social sharing that Google is able to get through Facebook, and Bing maybe even more so.

User Interaction With Your Site

Eric Enge: What about other kinds of signals, like user interaction with the search results themselves?

Rand Fishkin: I thought Bill Slawski from SEOByTheSea had a great post about a Bing patent application. It looked at all sorts of user and usage data that I think Google is thinking about leveraging in some way.

Things like time on the site and whether people print the page. Pages that get printed tend to indicate the site is high quality, or that people interact heavily with that page. Also, if you scroll down, back up, and around a lot, that indicates some level of positive interaction with the page.

In terms of search activity: do you come back to the search results and perform different searches, do you come back to the search results and click on other results, or do you perform the same search at a different engine, which of course Bing and Google will both know through their clickstream data? These types of aggregate tracking metrics will help the search engines determine whether they are doing a good job, whether users are satisfied with what they are providing, and whether individual results and individual sites are delivering the goods.

Eric Enge: The more mundane things, such as bounce rates, time on site, page views per visitor and number of repeat visitors, are good basic metrics as well. All these can be fed into the kind of inputs that go into a machine learning engine.

Rand Fishkin: Yes, absolutely. You can feed all these metrics into a neural network, tell it what you would like it to produce, and get nice metrics back. Create an amalgamation of all the data pieces and then use that as a metric in your algorithm.
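(Editor's note: a minimal sketch of the "amalgamation" idea, assuming scikit-learn and NumPy are available. Several engagement metrics are fed into a small neural network trained against synthetic rater scores, and the network's output becomes one combined quality signal. The metric names and all data are invented for illustration.)

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 500
bounce_rate     = rng.uniform(0.1, 0.9, n)
time_on_site    = rng.uniform(10, 600, n)     # seconds
pages_per_visit = rng.uniform(1, 10, n)
repeat_visitors = rng.uniform(0, 0.6, n)      # fraction of visits that are repeats

# Synthetic "rater score" loosely driven by the metrics, plus noise.
rater_score = (0.5 * (1 - bounce_rate) + 0.3 * (time_on_site / 600)
               + 0.1 * (pages_per_visit / 10) + 0.1 * repeat_visitors
               + rng.normal(0, 0.05, n))

X = np.column_stack([bounce_rate, time_on_site, pages_per_visit, repeat_visitors])
model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0))
model.fit(X, rater_score)

# The fitted model collapses four metrics into one number per page.
example_page = [[0.35, 180.0, 3.2, 0.25]]
print("combined quality signal:", round(float(model.predict(example_page)[0]), 3))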

Our data scientist at SEOmoz, Matt Peters, thinks, based on our correlation data, the main Google algorithm doesn't have that many metrics. Matt thinks two hundred signals might actually be on the high end of what they are using.

I think they find a few signals in each sector they like and then concentrate on making those better from the services side. Rather than taking fifty million metrics about a link, build one great metric about links that takes into account many things.

I thought that was kind of fascinating to think about. Maybe that is good from a computation and processing standpoint, as well as for the speed of results and data usage, in terms of what the search engine has to do when it calculates rankings.

Eric Enge: Maybe they use other sets of metrics for quality verification rather than direct signals.

Rand Fishkin: That could be. That way you don't have to calculate it across every search performed.

Eric Enge: Right. You apply the greater set of metrics on your test set of some number of thousands, or tens of thousands, of sites or whatever you want it to be. Then, you do your curve fitting with the machine learning algorithm against that. It is fascinating to think about that whole process, about how they do these things.

Advertising and Panda

Eric Enge: Can you talk about advertising as a signal in Panda?

Rand Fishkin: The advertising thing is looming large. In our study of Google results recently, we looked at ad placement and ad size, number of ads on a page and total pixel coverage, and these had a prominent and obviously non-zero negative correlation with ranking. So, Google is essentially saying, "If you have big blocks of Google AdSense on your page, on average your rankings in the SERPs will be lower". I think it is good that Google is not reinforcing their own feature. I was presenting this in Australia and a Google search quality engineer in the room was quite happy to see that data reported.

It is a positive thing from a perception standpoint for Google, and it is also an indication that people who are extremely aggressive with advertising likely took a bit of a bath in this update.

Eric Enge: Yes, I tweeted an article earlier today that described an interaction between a marketer and Google about getting his AdWords account reinstated as he had AdSense campaigns on his sites that did not meet Google's criteria.

The detail that Google gave him on how they do their rating was astonishing. Basically, they did specific comparisons of the percentage of content versus the percentage of ads above the fold on a 1024x768 screen, as well as other metrics.

The guidance was that the percentage of ads could not be greater than the percentage of content. Then they detailed what they were defining as content, and it was fascinating reading. Of course, this was about his AdSense account getting banned, but you can expect that they would be doing a very similar thing in organic SEO evaluation. Another thing I heard they look at is the click through rate on the ads.
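(Editor's note: a rough Python sketch of the above-the-fold comparison Eric describes, given bounding boxes for ad and content blocks on a 1024x768 first screen. The box coordinates are invented, and this only illustrates the measurement, not Google's actual method.)

FOLD_WIDTH, FOLD_HEIGHT = 1024, 768

def above_fold_area(boxes):
    """Total pixel area of (x, y, width, height) boxes clipped to the first screen."""
    total = 0
    for x, y, w, h in boxes:
        visible_w = max(0, min(x + w, FOLD_WIDTH) - max(x, 0))
        visible_h = max(0, min(y + h, FOLD_HEIGHT) - max(y, 0))
        total += visible_w * visible_h
    return total

ad_boxes      = [(0, 90, 728, 90), (764, 180, 300, 250)]   # leaderboard plus a sidebar unit
content_boxes = [(20, 200, 700, 500)]                      # main article block

fold_pixels = FOLD_WIDTH * FOLD_HEIGHT
ad_pct      = above_fold_area(ad_boxes) / fold_pixels * 100
content_pct = above_fold_area(content_boxes) / fold_pixels * 100

print(f"ads above the fold: {ad_pct:.1f}% | content above the fold: {content_pct:.1f}%")
print("fails the 'ads no greater than content' guideline" if ad_pct > content_pct else "passes")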

Rand Fishkin: If it is high, that's an indication of two things: you either have high quality relevant advertising or, on the other side, you do not have worthwhile content on your page and all people can do is click your ads. I think they look for manipulation on the advertising, for example, pop overs or ads that blend in and fool you into thinking they are part of the links in the content, but they will also look to see if this is a good AdSense publisher.

Eric Enge: So a high CTR could be a good signal or a bad signal.

Rand Fishkin: Yes.

Eric Enge: That's clearly the case. But most publishers have to wonder: if they have AdSense well below the fold on their page and it is getting a two-tenths of a percent click through rate, is that a good thing?

Rand Fishkin: It depends.

Eric Enge: I saw another post (again, this is AdSense related, but it gives signals as to how Google thinks about quality) which said that when your click through rate on a page is low, your payout can vary dramatically. For example, imagine that your blended click through rate across four ads is low because one of them is at two percent and the rest are at 0.2%. It may be that if you remove three of the ads and lift your click through rate to two percent, you will actually make more money, even though the ad with a two percent click through rate still gets the same number of clicks. The other three ads were simply dragging down your payout per click.
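(Editor's note: a quick worked version of that scenario in Python, with made-up impression numbers: four ad slots, one at a 2% click through rate and three at 0.2%. Dropping the weak slots keeps the clicks on the strong ad but lifts the blended page CTR from well under 1% to 2%.)

page_views = 10_000
slot_ctrs  = [0.02, 0.002, 0.002, 0.002]   # clicks per page view, per slot

def blended_ctr(ctrs, views):
    impressions = views * len(ctrs)         # each slot shows once per page view
    clicks = sum(ctr * views for ctr in ctrs)
    return clicks, impressions, clicks / impressions

for label, ctrs in [("four ads", slot_ctrs), ("strong ad only", slot_ctrs[:1])]:
    clicks, impressions, ctr = blended_ctr(ctrs, page_views)
    print(f"{label:14s} clicks={clicks:.0f} impressions={impressions} blended CTR={ctr:.2%}")
# If a low page-level CTR depresses the payout per click, the single-ad page can
# earn more even though the clicks on the strong ad are unchanged (and the clicks
# from the three weak slots were worth little to begin with).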

What Makes a Good Quality Page?

Eric Enge: Can you talk about the makeup of a high quality page?

Rand Fishkin: One component is content block formatting. This is where they look at the usability and the user experience of how content is formatted on a page. For example, advertising or other things might be interrupting the content. Having that content be easily consumable in a friendly way seems to be a positive signal. I don't know if that is a feature of the machine learning and the quality rater stuff or something they independently grade, but it definitely seems to be a part of it.

Eric Enge: You could imagine that human quality raters would respond well to that. It may not explicitly be part of the algorithm but it may fall out of the algorithm.

Rand Fishkin: That's what is so interesting now and why SEO takes on a broader focus if Google is going to continue going in this direction. Essentially, everything that makes your page good for humans will help it to rank better and that is really exciting.

Eric Enge: It is exciting. Another thing is that affiliate links aren't inherently bad, but pages where a large percentage of the links are affiliate links seem to have a negative correlation.

Rand Fishkin: Unfortunately, that wasn't something we could measure, but I know there has been a lot of circumstantial evidence around it. In my opinion there is a high correlation between affiliate websites which often have generic, similar, low-quality, low-value ad content, and lower rankings in this update.

Eric Enge: There are definitely some issues for many people in those businesses. At one point Google said something about a weak set of pages on your site potentially dragging down the whole site. I think it was Amit that said that.

Rand Fishkin: We have seen that a lot. In fact, a couple of Seattle startups were on a thread in the SEOmoz Q&A recently and talked about how they lost a lot of Google traffic, but weirdly the traffic they lost was to their lowest converting, lowest value pages.

In one case the hit was due to display advertising. In the other case, which was more of a direct conversion path, they hadn't lost that much. Google might take thirty percent of your traffic, but for some it might be the thirty percent they didn't really need.

Eric Enge: That is certainly not harmful, but it suggests if you have a chunk of weak pages dragging down the site, you should NoIndex them and that may help you recover.

Rand Fishkin: I would be interested to see that because I am not sure if Google is sophisticated enough to do the separation between what is indexed and not, and to distinguish what is intentionally put on a site and not. They might be dragging down sites that have these pages even if they are not indexed.

You might have to block them with robots.txt, put password protection on them, or just find another way to keep the robot from getting to them. If I were running one of those businesses, I would be doing some testing.
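(Editor's note: a minimal sketch, using Python's standard urllib.robotparser, of checking that a weak section really is blocked before relying on robots.txt as the fix. The paths and rules are hypothetical examples, not a recommendation for any particular site; remember that robots.txt stops crawling, while a noindex tag stops indexing of a page that can still be crawled.)

from urllib.robotparser import RobotFileParser

robots_txt = """
User-agent: *
Disallow: /thin-articles/
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(robots_txt)

for url in ("https://example.com/thin-articles/page-1.html",
            "https://example.com/guides/solid-guide.html"):
    allowed = parser.can_fetch("Googlebot", url)
    print(f"{url} -> {'crawlable' if allowed else 'blocked'}")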

Eric Enge: I think that's a good idea. I am also concerned about this notion that you need to find some weak pages and fix them, because I think most of the people that were hit by this have a broader problem than a cluster of weak pages. They are not all going to be eHows where, as Aaron Wall suggests, it is a branding issue and the algorithm had to be tweaked to figure out how to get them.

The eHow business model was clearly rooted in a certain amount of manipulation. For example, you don't need twelve articles on the same topic with slight variations in the search phrases written by different people. You could see why that would be objectionable from a search quality perspective.

Rand Fishkin: Absolutely.

Does User Generated Content Help a Page Rank Better?

Eric Enge: You published an article on SEOmoz about user generated content sites faring reasonably well in Panda even in the absence of much of a link profile.

Rand Fishkin: That was not from our correlation data. I believe it came from an analysis that Tom Critchlow wrote about from personal experience. That is what I have been seeing and feeling for the most part, with a couple of exceptions. I have seen places where people have what I call thin UGC, and they seemed to get hit.

LinkedIn is a great example of a big winner in the Panda update with its relatively robust user-generated content. That has been a great way to do long tail for a long time, and I think it will continue going forward, but there is a quality bar that has to be met.

Eric Enge: I think one could try to simulate user-generated content, but you probably wouldn't get the desired effect, because I think the real-time stream of things taking place is important; that has to be part of the mix, don't you think? You can't just throw up ten user-generated content samples on a bunch of pages and say we have user-generated content, but then nothing else goes up for another six months.

Rand Fishkin: There needs to be some naturalness to those signals. Think about content that gets contributed on LinkedIn, Quora, Facebook or Flyshare. There is an authenticity that is connected to real people who have real profiles elsewhere. A number of sharing activities go on in these little communities, and some percentage of the content receives comments from other people, so there are many associated signals the engines can be looking at.

What's not in Panda

Eric Enge: What have we left out that they might be looking at that's worth discussing?

Rand Fishkin: One thing that is interesting and extremely frustrating is what Google left out of the update, which is any type of devaluation of manipulative linking.

Eric Enge: Yes, my last column on Search Engine Land was on that topic.

It's called Speculating on the Next Shift in Google Search Algorithms, and it was exactly about that.

Rand Fishkin: I think it infuriates both of us, the way black hat and grey hat manipulative link acquisition strategies work tremendously well in tons and tons of areas and are increasingly becoming bread and butter. I think after the aggressive spam policing and punishment of link buying that Google went through from '06 to '08, the last two to three years have been a wasteland in terms of addressing these paid links that claim to be directories or that sit in the sidebars of blogs.

Eric Enge: Exactly. In my article, I provided details of the kind of links that one site in the coupon space was using. They are prospering, and they have hit counters, they have blog posts with the classic three anchor-text-rich links, and I actually show a sample of a footer link where the anchor-text-rich link is literally six inches below the last usable text on the page.

Rand Fishkin: It is really disturbing. I think the most frustrating thing is that people who are new to SEO, or who have been doing SEO for a little while, might think: I don't want to do any black hat strategies; however, everyone in the top ten is doing it, the top three seem to be really effective at it, and maybe there are only a few brands here and there that do white hat and still succeed, but my clients are asking me for results, so I have to do this.

It is easy to find the link profiles of your competitors, see that those links are working, and test them out. It's horrifying because it creates a bunch of people who believe in the value and power of black and grey hat SEO, and a bunch of rich people who are selling those links. It creates this whole marketplace around this type of stuff which, in my opinion, tends to be run by the operators who are the least ethical, least trustworthy and least reliable.

This means everyone in the search and SEO spaces suffers from unreliable, negative operators being the norm in the industry. It creates a terrible perception among marketing managers that SEO is just a black hat wasteland. How can you build a great business, how can you build a great reputation, how can you build a great career in this field, if that's what is going to work?

If You Have Been Hit by Panda What Do You Do?

Eric Enge: With all these things in mind, what does someone who has been hit by Panda do?

Rand Fishkin: I think one of the best things you can do is determine if there are pages on your site that are performing very well versus ones that aren't and look at the difference between them. Almost always, you will find a significant difference.

If you have been hit, and your pages have been hit across the board, and you don't have anything that's ranking on your site, look at what is now ranking in your space. You should especially look at content that hasn't earned good links but is ranking anyway. Obviously, exclude anything that's been manipulated with black or grey hat tactics, and any pages that have been well linked to or socially promoted.

One thing you should look at is the long tail content in your industry that is performing well purely on the basis of its good content. Look at the formatting, look at how they use advertising, look at the content blocks, look at the layout and the UI and the UX, look at what the content is, look at the experience it creates for people, look at how it was generated. All of those things can give you the signals you need to do the right thing to get out of the Panda box.

Eric Enge: So the great insight you are pointing out is to find something that ranks even though it isn't well linked. That's a great approach, because the lack of inbound links to the content says that all the other signals are doing the right things.

Rand Fishkin: Yes.

Eric Enge: Of course, if you have bad user engagement historically, you may have to wait some time for those signals to be seen by Google and acted upon.

Rand Fishkin: This is one of the things that is frustrating for a lot of people: this is going to take time to recover from.

Eric Enge: You may be waiting six months before you come out from under this problem. I just made up the six months, but you shouldn't be thinking six weeks.

Rand Fishkin: Agreed. I know people whose board is breathing down their neck, but you need to tell them, essentially: we were earning traffic we didn't quite deserve, or we were doing things in a way that made for a low quality user experience, we are going to have to take that up a bunch of notches, and over the next few months we can hope to recover.

The first thing I would do is start publishing things in a different subfolder. I would start publishing them in a different format, and I would start seeing whether I could get those pages indexed and ranking well; and if not, it might mean that there is something going on domain-wide. You then need to do what you spoke of earlier, which is consider removing many of your pages from the search indices so that your site can regain the domain authority and the ranking it deserves.

Eric Enge: For many people it may be easier to start over.

Rand Fishkin: Yes.

Eric Enge: Thanks Rand!

Have comments or want to discuss? You can comment on the Rand Fishkin interview here.


About the Author

Eric Enge is the President of Stone Temple Consulting. Eric is also a founder of Moving Traffic Incorporated, the publisher of Custom Search Guide, a directory of Google Custom Search Engines, and City Town Info, a site that provides information on 20,000 US cities and towns.

Stone Temple Consulting (STC) offers search engine optimization and search engine marketing services, and its web site can be found at: http://www.stonetemple.com.

For more information on Web Marketing Services, contact us at:

Stone Temple Consulting
(508) 485-7751 (phone)
(603) 676-0378 (fax)
info@stonetemple.com

martes, 17 de mayo de 2011

Stone Temple Consulting (STC) Articles and Interviews on SEO Topics



Stefan Weitz Interviewed by Eric Enge

Posted: 16 May 2011 10:14 AM PDT

A Web of Verbs (Not Nouns!)

Published: May 13, 2011

Stefan Weitz is a Director of Search at Microsoft and is charged with working with people and organizations across the industry to promote and improve Search technologies. While focused on Microsoft's product line, he works across the industry to understand searcher behavior and in his role as an evangelist for Search, gathers and distills feedback to drive product improvements. Prior to Search, Stefan led the strategy to develop the next generation MSN portal platform and developed Microsoft's muni WiFi strategy, leading the charge to blanket free WiFi access across metropolitan cities. A 13-year Microsoft veteran, he has worked in various groups including Windows Server, Security, and IT. Stefan is a huge gadget 'junkie' and can often be found in electronics shops across the world looking for the elusive perfect piece of tech. You can follow Stefan on Twitter.

Interview Transcript

Eric Enge: Let's talk about the near and longer term frontiers of search. One recent notable event was the Google Panda algorithm change, which caused big waves. I classify changes like Panda as an overt attempt to measure content quality or user engagement. Would you give us your thoughts on that?

Stefan Weitz: Google's Panda Update was an interesting event. I saw reports recently on DemandMedia showing they were down 40% on their traffic. What this speaks to is the necessity to look at page level quality. I think one of the things that started the work on Panda was the JC Penney paid link issue which called into question the quality of PageRank.

Google initially responded by blocking the entire JC Penney domain for a few days. We thought that hurt users because we did the same thing in a test. We blocked all of JC Penney internally and asked our human rankers, "does this result for the search phrase 'comforters' look better or worse after this change?" Everyone said it looked worse because they expected to see JC Penney there.

Page Level Classifiers

Stefan Weitz: What it told us was there are different ways to classify the quality of pages. We have page level classifiers that look at every page we index and attempt to discern a quality score. They look at things like reading levels, the number of ads versus content, length of words, length of the page, all those standard things, and some not so standard things as well.

It looks beyond the links coming into the page and beyond the easy things, like counting words on a page, and uses semantic technology to figure out what it is that page is discussing and if it is high quality. So, yes I think you see some similarity between what we are doing and what Google is doing.

Eric Enge: You mentioned that the ratio of ads to content on a page would be one metric. Another one might be how the ads appear, how prominent they are on the page. But doesn't that entail a lot of sophisticated CSS analysis to determine what objects are on a page?

Stefan Weitz: Yes, this is not an easy thing to do at the page level. It might be easy for someone to say that for a domain like eHow, we should apply a blanket kind of deprecation of their rank. That is certainly a way some folks have wanted to go about this. Blekko does this today with their engine; however, we think it is better to do it the costly way and look at page level analysis.

To your point, you can't always tell at the computer level and algorithm level what the page will look like at the end, especially if the page requires some input or is a dynamic page. Generally, we get a good idea by using domain level classifiers to say this page looks suspect and if we know the domain itself tends to be 84% suspect, those two factors alone give us hints about the quality of that particular page.

Lexical Analysis

Eric Enge: Is there any semantic or lexical level analysis you can do to get signals of that kind?

Stefan Weitz: Yes, absolutely. Lexical signals help us determine the reading level score, which we have used for years; for example, how advanced is the language, how complex are the phrasings and the noun and adjective agreements. Also, if we see a page title of Canon camera, but the body talks about vacations in Hawaii, then we know there is a mismatch which gives us hints as well.
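(Editor's note: a crude Python sketch of one such lexical signal, a Flesch reading-ease style score built from sentence length and a naive syllable count. Real reading-level classifiers are far more sophisticated; this only illustrates the kind of arithmetic involved.)

import re

def count_syllables(word):
    # Very rough heuristic: count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / max(1, len(sentences))
    syllables_per_word = syllables / max(1, len(words))
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

simple_text = "The cat sat on the mat. It was warm. The cat slept."
dense_text = "Photosynthetic organisms synthesize carbohydrates utilizing electromagnetic radiation."
print("simple text score:", round(flesch_reading_ease(simple_text), 1))  # higher = easier reading
print("dense text score: ", round(flesch_reading_ease(dense_text), 1))   # lower = harder reading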

Eric Enge: If someone uses advanced language and it looks like a sophisticated document, that would suggest that it is not a good match for some searchers and to others that it is, right? Is this a way of classifying and matching up the sophistication level of the document with the sophistication level of what the searcher is looking for?

Divergence from Google

Stefan Weitz: I don't believe we are quite there, but you are highlighting one area where I am beginning to see us and Google diverge. They are very focused on link level analysis and understanding which pages to return for a query, which is important for the kind of web we have talked about for many years. However, we are beginning to look at the web differently than they are.

We look at the web as a digital representation of the physical world. The web is becoming this rich canvas that represents all things you and I touch and interact with every day. When you start moving to that level of thinking, the notion of links to keywords is important but it doesn't serve us as well if you move into, what I like to call, a web of verbs versus the web of nouns that we have been living with for so many years. I think that's when search starts getting to things like social, services and geospatial which all become more prevalent as we begin to think of this web as a high definition proxy for the physical world.

Eric Enge: Let us expand on this notion of verbs. Give me a practical example or two of how that plays out differently as opposed to a noun based analysis.

A Web of Verbs

Stefan Weitz: Let's step back and examine how the web was. First, let us stipulate that the structure of search is really predicated on the structure of the web itself. What I mean is, if you think about Berners-Lee and the work he did, he codified HTML and HTTP as the underlying structure of the web as we know it, and that yielded a bunch of pages, a bunch of text pages and a bunch of links. Those links, in many cases, were anchor text that pointed to a particular page.

That allowed engines to say: even though I see this URL is iflyswa.com, which was Southwest's old URL, the fact is that I have eight million pages pointing to this URL with an anchor text of Southwest Airlines. That allowed the search engines to make that association, so when someone typed in "Southwest Airlines", they could conclude that the searcher was looking for the domain iflyswa.com. What that did was make searchers think about search as a tool to find something else, a noun based search.

If we thought "I have to check in to a flight on Southwest" we rarely, if ever, would type "check in Southwest Airlines flight 858" into a search engine because we knew it would fail. Even though our intents were action based, or verb based, we defaulted back to this navigational model, or noun based model, because we assumed the search would fail if we attempted to do something more advanced.

Eric Enge: Or if we tried something more advanced, it failed, so we backed off.

Stefan Weitz: Precisely, and most people assumed search is good for 2.2 keyword searches and that's all.

That's where we were, that's the web of nouns. We had this great web of nouns and the connective tissue among all the pages was links that defined this noun based web. Now we are getting more into this action or verb based web. If I type in "book a romantic table in Austin, Texas next week for two at 7 p.m.", the search engine can now understand that query at a semantic level, understand the nuance of what I am asking, and then because there are enough services opening up their protocols and their APIs, Bing can then broker out that request to a number of different services across the web and stitch that information back together to help me go from I want to do this to I have done it. That's the web of verbs which is this whole separate web, if you will, that's been evolving over the past couple of years.

Eric Enge: Another simple example would be "buy a digital camera" which is fairly straightforward and a noun based web handles that reasonably well, but there are many other queries such as "book a romantic table", or "learn about diabetes", which won't work well if you want to learn enough to write a paper.

Stefan Weitz: Correct, because the web, up to this point, has been about navigation, finding something to then do something with. The search of tomorrow is more about actions and decisions, not just about finding. You can do much more on the web than you could a decade ago when search was pioneered.

Social Signals

Eric Enge: What is the timeframe in which you see this unfolding?

Stefan Weitz: Well, a number of factors come into play. The first thing we see helping this trend along is the social infusion into search. Traditionally, if you were to make a complex decision using search, you would stumble around, look at a bunch of links, hope you find some information, and then probably end up giving up and asking a friend or calling a buddy. Humans have this primal behavior around the social experience where we almost always ask our friends and acquaintances for advice.

Part of what this new verb based web is turning into is the ability for us to connect up queries with people who could help answer those queries more effectively. We are doing this in Bing with Facebook and Twitter, so when I do a search for "parasailing Maui" in Bing, if any of my friends anywhere have liked any link across the entire world wide web, I am going to inject that link into my results page. That's a fairly primitive example.

Digital Cameras Search in Bing


From a Computer Science perspective it is actually a phenomenal example, but from a UX standpoint, you and I look at that and say it's just okay. But think of the power there: what we have done is removed all that time you spent looking through bad links, or trying to figure out which of your friends has been to Hawaii in the last year, and we literally take the digital traces people leave across the web and infuse those directly into the search experience. That's a pretty profound change in how search works.
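(Editor's note: a minimal Python sketch of the injection idea Stefan describes: take an ordinary ranked result list and boost any URL that someone in the searcher's network has liked. The URLs, scores and "likes" are invented; real systems obviously work at a vastly larger scale and with richer signals.)

def rerank_with_friend_likes(results, friend_liked_urls, boost=2.0):
    """results: list of (url, relevance_score); returns a re-sorted list."""
    rescored = []
    for url, score in results:
        liked = url in friend_liked_urls
        rescored.append((url, score * (boost if liked else 1.0), liked))
    return sorted(rescored, key=lambda item: item[1], reverse=True)

results = [
    ("https://travel.example/parasailing-maui-guide", 0.91),
    ("https://ads.example/cheap-maui-deals",          0.88),
    ("https://blog.example/friends-maui-trip-report", 0.70),
]
friend_liked = {"https://blog.example/friends-maui-trip-report"}

for url, score, liked in rerank_with_friend_likes(results, friend_liked):
    marker = "  <- liked by a friend" if liked else ""
    print(f"{score:.2f}  {url}{marker}")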

Wisdom of the Crowd

Eric Enge: That works as long as your social network has the answer somewhere in it. For example, if you don't have a friend who has been parasailing in Maui, they may not be able to help you answer that question.

Stefan Weitz: Right, so for those examples you get into things such as the aggregate wisdom of the crowd to help answer those questions. You start looking at Twitter updates to see if anyone has Tweeted about the experience of parasailing in Maui. We are trying to take that which you do today and get constant affirmation or decision help from all your acquaintances and inject that real time into your search. However, you are right, if no one you know has information on your particular search, then you are relying more on the wisdom of the crowd to help find that information.

Eric Enge: If I were to paint the picture at a high level, what you see happening is that social interactions, whether with your immediate friends or the wisdom of the crowd, are going to be one of the dominant forces on the web. The task then is to leverage those networks as an information source.

Stefan Weitz: That is correct. Also, you can think of it as the first implementation. As people leave more traces of their physical self in the digital realm, the ability for us to process those traces and do interesting things with them escalates.

Privacy

Eric Enge: The next logical question is will there be some privacy kick back which could be a threat to the accessibility of those digital traces?

Stefan Weitz: For us it comes down to three big principles: make sure it is transparent, make sure it is controllable, and make sure there is a value exchange. If we are using something to personalize your experience, to make your experience better, you should know that: here is what we know about your search history, and you can go see it easily on Bing. The control aspect is how we make sure you can shut it off and delete things, whatever those things might be.

The last thing, which is most important to me, is what are you getting in exchange for that information? That is the same argument used for a long time with Amazon. The fact that I buy 80% of my stuff on Amazon means they have a lot of information about me. I am okay with that because the value I get out of it is pretty high as I get recommendations and discounts.

If it were a third party aggregator who took that information and offered me zero value, I would be more reluctant to let them use my information. So, these are the kinds of principles we have. I like them, I think they are good, and that is how we design all our products.

Leveraging the Search History of Others

Eric Enge: Do you envision a situation where you will know that Joe has just searched on digital cameras on Facebook, no one in his direct social networks has actually Tweeted or done a Facebook update related to that, but you know his friend Suzie did a similar search two weeks ago. Do you picture that level of data being available?

Stefan Weitz: Certainly. We will look at second order networks, and since you know somebody who knows that person, present that information on top. What you are asking about is something I am a big fan of, expertise based matching. Forget whether or not you know Suzie. That might be less important than the fact that we know she is an expert in parasailing.

Because of all the traces she left, we can deduce that parasailing is one of her key interests. Also, she is quoted a lot and re-Tweeted a lot, which makes her an authority on that topic. You can begin to understand the power of leveraging those social traces. All the things she did across the web could influence the result page directly because we know that she is influential on that topic.

Eric Enge: So this makes for a very interesting computer science problem.

Stefan Weitz: Isn't it cool? You would appreciate this as a person behind some of the earliest computer science at Phoenix Technologies (editor's note: Eric worked at Phoenix Technologies earlier in his career). Think about the amount of computation that must happen for every search you do: you have to look at every single friend and their entire history, decide whether one of the hundreds or thousands or millions of URLs your friends have liked is a good match for the query terms you put into Bing, and then do that within a few milliseconds. It is one of those computer science problems that is amazing to solve at that scale.

The Bing Landscape Today

Eric Enge: This will likely happen in stages for exactly that reason. Would you talk about that part of the landscape today in terms of what you are doing?

Stefan Weitz: Since October, we have had the Twitter augmentation. That was interesting because it attempted to provide a layer on top of the Twitter fire hose. We tried to find the most reputable people who are Tweeting, we tried to de-dupe things, and take out spam and adult links and make it so even my mom, who doesn't have any idea how one can use Twitter, can go and type in "hair gel" and get back Tweets that make sense.

Then of course, we have the Facebook deal, inked and done. Since last October, it has been about how we leverage those billions and billions of data points that we see from Facebook into the results. We have recently launched Like annotation which means if there is any URL on any search page that any of your friends have ever Liked, we will show that they liked it.

We will actually carve out a big space on a page and say Paul Liked this link. Also we are getting into the people search arena where we find people you may know. For example, if you are at a party or a business event and you met someone but can't remember their last name or you don't know how to contact them, the people search, which itself is 4% of the queries we see on the web, is a helpful tool for social.

Over the next several months you will begin to see more use of Facebook data, a better people search, and you will see us leverage more than just the Likes. Information about the person will be taken into account in different ways across the Bing search engine that extends beyond what they liked.

Eric Enge: For example, today there is a volume of Likes that a page has and, particularly, if it is Liked by your friends, that carries weight in terms of returning the results for a particular user.

Stefan Weitz: Yes, if you have friends that Liked particular results, that will tend to show up higher on page 1. As an example, I never watch sports and I recently did a March Madness query. One of my friends, Kelby Johnson, Liked an add-in to Outlook which added all the March Madness games to an Outlook calendar.

That's interesting because the link that Kelby shared would have not normally appeared in the standard ten links for the query March Madness, but since he is a friend of mine and because he shared it on Facebook, it appeared on position 4 on the page.

I know Kelby is a sports fan. I trust what Kelby says about March Madness is going to be of high quality. It is much more like a one-to-one type scenario.

Wisdom of the Crowd

Eric Enge: I get the one-to-one theme, but if you don't have a friend that happens to be an expert, you can always fall back on the volume of Likes that represent the wisdom of the crowd, right?

Stefan Weitz: Yes you could (although we don't do that now), and we use the volume on Twitter because it helps us understand what should rank well for news. For example, if we are seeing many Tweets on Libya, that can help us understand that we should look at the news and make sure we index what is going on. There is magic that happens as we identify trending topics from the social networks.

Eric Enge: You mentioned Twitter and news being an obvious application to Twitter because of the recency effect, an example of which is finding out about earthquakes on Twitter long before you would find out about them anywhere else. I will call that a vertical application. Are there other types of vertical applications which Twitter and/or Facebook are particularly well suited to?

Stefan Weitz: That's a good question. You mentioned the news, and we started embedding Tweets into the news page itself for that very reason. If you start searching for North Korea, or some newsworthy event, you will see the actual real-time updates on North Korea on the right hand side, so that's obviously a big one.

Facebook check-ins

The other thing we are seeing is check-ins on Facebook. When you see many folks checking into a particular location, you can see whether there is an event or something else going on at that location. That helps us understand, from the local angle, which restaurants are popular right now. That is interesting information to look at.

Eric Enge: If you are looking for a place to eat while in Seattle, and you type in "Italian restaurants Seattle", or something like that, you can return in the results which of your friends are currently at Baluchis and you might want to go there because Joe and Susie are there now.

Stefan Weitz: Yes, or, similarly, that they liked this place in the past or checked in here in the past.

Another example: maybe it's Friday night, you are young, and you want to go out clubbing, so you could leverage that Facebook check-in power to see what looks like the hottest place right now based on check-ins.

Eric Enge: More check-ins is an indicator that it is hot.

Stefan Weitz: Precisely, but what you are identifying is there is all this data and the challenge is to write filters to parse or piece these things out and say what is the use case, what is the scenario where that piece of information or that set of information can help make a decision. The nano footprints and digital traces people are leaving are incredible, and we have never seen this level of data in all of human history in a way that a machine can read it. It is truly one of those things that I am literally awed by.

Eric Enge: One of the interesting challenges, though, is that only about 7% of the US population is on Twitter. That means there are many people who aren't in that environment.

Stefan Weitz: I think you see this in the example of my mom and Twitter. She has no interest in Twitter and doesn't know what she would say on Twitter, but she can benefit from the folks who are on Twitter. The second interesting thing is something I spoke about with David Kirkpatrick, who published the Facebook Effect book, a few weeks ago. He mentioned the fact that he doesn't think people know how much they are broadcasting online.

Computational Challenges!

From a computer science standpoint, you now have this ability to do amazing things with all this data and, in a very positive way, you can help people make better decisions and do things faster because we are able to predict or help with these things in a computationally efficient way.

I was supposed to be at Cray Computing today which is based in Seattle. They are having a symposium on parallel processing and very large graph data sets. When you get into this level of graph data, which is often reminiscent of semantic triples, traditional commodity computing starts to become less and less efficient. We need other ways of handling those graphy data sets.

Cray and Bing have some cool stuff around vector scaling, but I think the computational model is going to move away from the purely commoditized and more towards a system which can begin to make sense of data. This is similar to the way we built Farecast, which analyzes a billion and a half price points a day for airfares and attempts to synthesize that data into something humans can use. Those are very specialized filters on the web. We see more and more of those specialized filters being developed to preprocess the web for people in a way that helps them get things done. It is exciting.

The impact of apps

Eric Enge: A couple of years ago, we might have referred to some of these things as vertical search. Expanding into a services conversation, you have a growing family of these services which are evolving, to use your phraseology, toward being verb oriented: I want to do this, I want to do that. Can you expand on that a little bit?

Stefan Weitz: I think I go back and forth on this. A while ago I wrote something called App-ocalypse because I was lamenting the fact that we had half a million apps across the two big global platforms. I commented on how chaotic that was, the fact I needed to install an application on my device to figure out how to get from the Lower East side to 57th in Manhattan.

I had to understand what app to install, decide if I wanted to make the purchase, install it, and then wait for it to load and install while waiting in the cold. When all it was doing was brokering a request out to a particular data source.

Eric Enge: You also had to decide which version of the app would work on your platform.

Stefan Weitz: It has gotten better with the marketplaces, with iPhone and Android and Windows Phone 7, but it is still chaotic to me. There are many applications which are simply front ends to data sources. The problem is the average consumer doesn't want to spend time figuring out what they should install, so they are losing out on the richness of the web. The positive is that many of these application developers or sites are building these apps so they can be accessed via programmatic means, like an API.

In the early days we had the UDDI notion, Universal Description, Discovery, and Integration; the concept was that we would have this published schema on the web of all the web services that a developer could call. We are getting back to that. If I want to book a table, as an example, I know there are a few services I can call up, such as Urbanspoon or DinnerBroker. I can call their APIs on the web and actually broker that information into my search results.


If you type "Gennaro's in Boston" in Bing, we know it is a restaurant query, we know it is in Boston, we will say the likely intents for that query are to find reviews, to find photos and to book a table. Those are three likely intents we see for restaurant queries in Bing. Bing will check, do we have OpenTable? Yes we do, so we can broker that experience right into the Bing interface while you complete the action from Bing. You are going to see more of the application ingestion or the application exposure inside the search engine.

Eric Enge: You are virtualizing that whole experience and they don't have to figure out how to install the app, and configure it, and remove the platform issues. Those issues are getting better but will never be totally solved because every time a platform becomes standard it creates an incentive to differentiate.

Stefan Weitz: Exactly. OpenTable would say we also like folks coming to our site because it increases loyalty and they get points and that makes sense. But in many cases, there are opportunities for search engines to say I am not going to simply index pages any more. I am going to index services as well and I am going to index the real world, the geospatial world as well to build out this comprehensive model of the world in which we live. Then, once we understand the intent of the searcher, we can connect up those resources in an intelligent user experience. That's the holy grail.

For the last twenty years or so, we have been hearing about semantics, and services, and web services, and mapping. Remember when mobile coupons were a big thing a decade ago? You were going to walk by Starbucks and, boom, you were going to get a coupon for Starbucks on your phone. We have been hearing every year for the last ten years that this is the year of couponing; well, suddenly look at what has happened.

In the last eighteen months, you literally have things like Facebook Open Graph, which defines a pretty loose, but still usable, semantic model for objects across the web. Using Facebook Open Graph, you can literally tag your page as a movie page, so we know that it is a movie. That's a big deal from a semantics standpoint. You have Groupon or LivingSocial for mobile couponing. You have Facebook and LinkedIn and Twitter all helping us understand your social relationships or your comments on things, and blasting them out to the web in a way a machine can read them.
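(Editor's note: a small Python sketch, using only the standard library, of reading the Open Graph hint Stefan mentions: pull the og:* meta tags out of a page that has been tagged as a movie. The HTML below is an invented example page.)

from html.parser import HTMLParser

class OpenGraphParser(HTMLParser):
    """Collects og:* meta tags from a page."""
    def __init__(self):
        super().__init__()
        self.og = {}
    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            prop = attrs.get("property", "")
            if prop.startswith("og:"):
                self.og[prop] = attrs.get("content", "")

page_html = """
<html><head>
  <meta property="og:type" content="video.movie">
  <meta property="og:title" content="Casablanca">
  <meta property="og:url" content="https://movies.example/casablanca">
</head><body>...</body></html>
"""

parser = OpenGraphParser()
parser.feed(page_html)
print(parser.og)   # {'og:type': 'video.movie', 'og:title': 'Casablanca', ...}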

All, or most, of the things we have needed over the last two decades have begun to become tangible and real in the last eighteen months. I can finally see all the pieces coalescing and coming together. In the next twenty-four months, you are going to see from us an acceleration of this, the likes of which I don't think we have seen in the search space. I will give you an example. We did some work with movies with MS Research called Project Leibniz, which is named after the famed physicist and mathematician.

Digitizing the Real World

Stefan Weitz: Project Leibniz began by trying to understand how we could look at a movie as a physical object in the real world and not simply a page on the web. We know that some page about Casablanca could refer to the actual movie, and movies have characteristics, they have attributes, they are an object. We begin to ascribe all these traces back to that "Casablanca" object, all the characteristics that make it up in the real world: the packaging, the graphics about it, who directed it, when it was released, everything.

If you go to Bing and search for a movie, what you get back is not simply a page that points you to all the pages on Casablanca; we show you a page informing you where you can watch it and eighteen thousand reviews from across the last hundred years of cinema. This is possible because we now understand that movie as an object. That is going to be applied to more and more physical things over the next eighteen months or so. And given that we know the connections you have and what they are doing, you are going to see a huge amount of work there as well.

Eric Enge: We haven't really touched much on geospatial so can we dig into that a little bit?

Stefan Weitz: Sure. I use geospatial, and I probably should do a better job characterizing it. It is about digitizing all the objects in the real world. The Casablanca movie is good example. As you get into more of the core physical world of reconstruction, you see things like the new Panorama app for iPhone. I think it is not appreciated for all of its value. You have this free app which anyone with an iPhone 4 can install.

A person can walk into their business, hit a button, and create a three-dimensional view of their entire business in about two minutes on their device. They can hit a single button to upload the panorama, including its latitude and longitude, and suddenly they have created something which would've taken thousands of dollars and weeks of processing to do. The ability to crowdsource the world and get this data from a number of devices is something we have never seen at this scale.

Companies like INRIX are embedding sensors inside GPS devices that will pull back traffic data on side streets. You now almost have the problem of too much data coming in. I think it is interesting for query resolution, but also for living your life on a day-to-day basis.

As for your question, "what can be done to improve the data in local business search?", we recently launched the Bing Business Portal, which is a major upgrade to what we had. It allows you to upload photos; all those things are now built into the system. One of the most interesting things local businesses can do, which isn't talked about much, is make sure their social presence is treated as being as important as their standard digital presence. If you are a hotel, you can have people review you on TripAdvisor; if you are a restaurant, on Yelp; if you are a plumber, on Facebook or Twitter. The social signals people generate about your business will continue to amplify your importance online.

Businesses that are one- or two-person shops don't have time to think about a dashboard for monitoring their reputation. SEO firms must continue to help these clients get better traction and better visibility online. A comprehensive review of their social standing and how they are engaging that community is huge.

Eric Enge: That means the opportunity for savvy businesses, of course, is to provide lots of signals and make more things available, which will likely generate positive buzz online.

Stefan Weitz: Exactly. Imagine, especially as we begin to think about the world as this digital canvas, when we begin to apply things like Project Leibniz to physical locations. If I am looking up the California Pizza Kitchen across the street from my house, traditionally we would look at that as a latitude and longitude, as a White Pages listing, or as a webpage on its corporate site.

Now, because we think of it as an object, we understand that this particular CPK has OpenTable or Urbanspoon reservations available, that it has 48 seats, that it has fifty tweets in the last two days about how awesome the barbecue pizza was, that it has n number of Facebook check-ins, and that eight of your friends like it. All those pieces of data are being accrued back to that physical location. As a business owner, you need to make sure you have as many assets as you can that describe your location and allow people to push information back. This is huge from a search perspective.
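To make the "business as an object" idea concrete, here is a minimal sketch, in Python, of a place record that accrues signals from several sources. The field names and example values are illustrative only; they are not Bing's actual data model.

```python
# Minimal sketch: model a local business as an object that accrues signals
# (reservations, tweets, check-ins, friend likes) from different sources.
# Field names and values are illustrative, not any real search-engine schema.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PlaceObject:
    name: str
    latitude: float
    longitude: float
    seats: Optional[int] = None
    accepts_reservations: bool = False
    recent_tweets: int = 0
    facebook_checkins: int = 0
    friends_who_like_it: List[str] = field(default_factory=list)

    def summary(self) -> str:
        return (f"{self.name}: {self.recent_tweets} recent tweets, "
                f"{self.facebook_checkins} check-ins, "
                f"{len(self.friends_who_like_it)} friends like it")

cpk = PlaceObject(
    name="California Pizza Kitchen",
    latitude=47.61, longitude=-122.33,  # illustrative coordinates
    seats=48, accepts_reservations=True,
    recent_tweets=50, facebook_checkins=120,
    friends_who_like_it=["Ana", "Ben", "Chloe"],
)
print(cpk.summary())
```

The point of the sketch is simply that every signal attaches to the same object, so a search engine, or a business owner auditing their presence, can reason about the place as a whole rather than as scattered pages.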

The Future of Search

Eric Enge: For your final comments, would you offer some thoughts on where Bing is going with all this?

Stefan Weitz: It all comes back to the childhood vision many of us had of invisible, intelligent agents that could help us in many ways. I think back to the Apple Knowledge Navigator video I saw in 1986. A gentleman came into his office and interacted with a fully autonomous agent on a desk, asking that agent to do different things such as find the presentation he did last year, see if Kathy has availability next week, and check when he could make it to this conference.

The agent combined all the sources of information it had, everything it knew about that person and that person's history. All the agent was doing was searching. Granted it was searching calendars, and history of documents, and where this particular location was, but those were all searches.

If you speak to someone like Jan Pedersen, chief scientist at Bing, he will say the search box is the true universal interface. You should be able to ask it anything, and we should be able to disambiguate what it is you are asking and marry that up with the right resources to resolve what you are trying to get done. That's where I see it all going.

Eric Enge: You can learn that when Joe asks a question he tends to phrase it one way, whereas Susie will use entirely different phrases, perhaps based on her life experiences, where she lives, or what languages she speaks. It is about disambiguating all that data and saying, what you really want is this.

Stefan Weitz: Exactly. There is a user-specific model, and you can also think about things you are not necessarily asking anymore. I was in Sydney last week and had to get from Sydney to Canberra. That's all well documented in my calendar, where I have to be and when. Given that I had no car reservation, the agent could have alerted me: hey Weitz, you are in Sydney on Thursday, you are going to be in Canberra on Friday, you have no way to get there, what's going on?

The ability for the agent to process what it knows about you and your current situation and proactively push things to you is another thing we see happening. It is not simply about the user asking; it is also about being told.
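As a toy illustration of that kind of proactive agent, here is a minimal sketch, in Python, that scans a calendar for consecutive appointments in different cities with no travel booking in between. The data structures and function names are hypothetical; nothing here is a real Bing API.

```python
# Minimal sketch: flag calendar gaps where the next appointment is in another
# city and no travel booking covers the move. Purely illustrative.
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Tuple

@dataclass
class Appointment:
    day: date
    city: str

@dataclass
class TravelBooking:
    day: date
    origin: str
    destination: str

def missing_travel(appointments: List[Appointment],
                   bookings: List[TravelBooking]) -> Iterable[Tuple[str, str, date]]:
    """Yield (from_city, to_city, day) for city changes with no matching booking."""
    ordered = sorted(appointments, key=lambda a: a.day)
    for prev, nxt in zip(ordered, ordered[1:]):
        covered = any(
            b.origin == prev.city and b.destination == nxt.city
            and prev.day <= b.day <= nxt.day
            for b in bookings
        )
        if prev.city != nxt.city and not covered:
            yield prev.city, nxt.city, nxt.day

calendar = [Appointment(date(2011, 5, 19), "Sydney"),
            Appointment(date(2011, 5, 20), "Canberra")]
for origin, dest, day in missing_travel(calendar, bookings=[]):
    print(f"Hey, you are in {origin} and need to be in {dest} on {day}, "
          f"but you have no way to get there.")
```

The agent behavior Stefan describes is essentially this check run continuously against everything the system knows about you, with the alert pushed to you before you ever ask.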

Eric Enge: Or reminded as the case may be.

Stefan Weitz: Yes.

Eric Enge: It sounds as if it is an exciting time. I have maintained for a while that the rate of disruption in our world and industry has been accelerating for decades and the acceleration is only going to continue.

Stefan Weitz: Yes, I agree. People make the mistake of assuming a linear model of acceleration. Raymond Kurzweil and Michio Kaku have talked about how people tend to underestimate how fast things will happen because they assume a linear progression of technology, when in our business it is an exponential progression, not a linear one.

Eric Enge: Right, but it doesn't always happen when you expect. For example, for five years we kept hearing that this would be the year of mobile, and it didn't unfold until we had the iPhone to help it along.

Stefan Weitz: Yes.

Eric Enge: Other things happen in the meantime, change happens in unexpected directions, and that is part of the fun.

Stefan Weitz: I know. It is such a blast.

Eric Enge: Thanks Stefan!

Have comments or want to discuss? You can comment on the Stefan Weitz interview here.

About the Author

Eric Enge is the President of Stone Temple Consulting. Eric is also a founder of Moving Traffic Incorporated, the publisher of Custom Search Guide, a directory of Google Custom Search Engines, and City Town Info, a site that provides information on 20,000 US cities and towns.

Stone Temple Consulting (STC) offers search engine optimization and search engine marketing services, and its web site can be found at: http://www.stonetemple.com.

For more information on Web Marketing Services, contact us at:

Stone Temple Consulting
(508) 485-7751 (phone)
(603) 676-0378 (fax)
info@stonetemple.com
