Michael King presents The Technical SEO Renaissance at SEMpdx SearchFest, Portland, Oregon – March 10, 2016
All right. Thank you, Rob. Now, we’re up to the keynote speaker. First his bio though. Let me tell you a little about the closing keynote speaker, Michael King. Mike is a bit of a renaissance man, so his keynote title, the Technical SEO Renaissance, seems fitting. An artist and a technologist all rolled into one, Michael King recently founded the boutique digital marketing agency iPullRank. Mike consults with companies all over the world, including brands ranging from SAP, American Express, HSBC, SanDisk, General Mills and FTD, to a laundry list of promising startups and small businesses.
Mike has held previous roles as a marketing director, developer and tactical SEO at multinational agencies such as Publicis Modem and Razorfish. Effortlessly leaning on his background as an independent hip hop musician, Mike King is a dynamic speaker who is called upon to contribute to conferences and blogs all over the world. I think Rand Fishkin got it right when he said, “Mike is one of the marketing industry’s finest. He’s both deeply technical and highly creative, a very rare combination.” Please join me in welcoming Michael King.
Michael: We on? Make some noise. I love doing that. So I’m really excited to be here for a lot of reasons. I’m really excited to talk to you guys about technical SEO, but I’m also really excited because I’m having a baby. And so this kind of feels like…I feel like Kobe Bryant. This is like my farewell tour, you know? So what I wanna talk to you guys about today is, you know, where we’re at in SEO as far as the technical side of things. And I encourage you to grab the URL. I also tweeted it out. But let’s start with an abridged history of SEO, at least from my perspective.
So at first, there were webmasters like me. When I was 14, I worked at Microsoft as a high school intern, and I was a webmaster for the ActiveX team. And you did everything there was to do. Like there was no, like, front-end dev, back-end dev, and all these, like, separations of concerns. It was just webmaster. Then came the search engines. And back when there was just webmaster, you know, we didn’t know how to find things. And this just tells you guys how old I am. We didn’t know how to find things. And so, you know, people had resource pages where they linked to things or you knew people that emailed you links and things like that. But there were no search engines, right?
And then, the search engines popped up. You had Alta Vista, Lycos and all those other things. And then SEO came out of that webmaster information retrieval hybrid. So it was all these nerds who were just trying to get money, right? And they figured out, “Oh, maybe I can use this search engine thing to make some money.” And they did. And for a long time, it was just about hacks and tricks, like “Oh, if you bold the keyword 40 times, you’re gonna rank.” It was true. Like literally my first SEO job, that’s what I did – I bolded keywords.
Then Google Showed Up
And then Google showed up and everything changed. I mean, we all know what Google did. Like they made it all about the link graph rather than just the keywords on the page. But then, it was still about hacks, tips, tricks and testing things, right? We had this culture of like, “Oh, I figured this thing out. Why don’t you try it?” And we were like magicians, you know? We were like Steph Curry. Like, how did you even do that, right?
So then Google got better, and this was a bad day for a lot of people. At that point, I didn’t really care about SEO. I was still like, touring. But I would get a job just long enough to make money until my boss pissed me off and I would go back on tour. So it wasn’t that big of a deal to me. I was like, “Oh, Florida. That’s cool.” And then, I got more into it and Google got even better. So then Panda came out. And I remember I was working on LG at this point and they had a site that was like, you know, it wasn’t, like, separated by ccTLDs or even subdomains for the different countries. So all the shady shit people were doing in Japan was affecting us. And I’m like, “How is LG getting penalized? This is so crazy.” And then, they got even better and Penguin came out. And that’s when a lot of people just kind of threw in the towel. They’re like, “I’m not SEO anymore. I’m a content marketer.” I mean, y’all were there, right?
And so, the other thing that happened is a lot of people came into SEO from other disciplines, right? Like, for a lot of people in this room, if you do SEO, there’s no career track that got you here. You just kind of ended up in it, right? Unless you’re like Jon Cooper, who wanted to be an SEO when he grew up. I love Jon. He’s awesome. But anyway, then it all became about content. And I’m not going to sit here and act like I wasn’t a big proponent of that. Like, I remember 2012 at MozCon, I was like, “We need to be doing content, guys. It’s so important.” Right? But we’ve kind of started to act like, oh, you know, SEO isn’t technical anymore. You just make more content. You’ll be good, right? This is mostly marketers marketing to marketers about marketing.
The web is more technical than ever before
So finally, we actually got on board with what Google’s been saying all this time. Like, “Oh, just make great content and you’ll be fine.” But really, the web is more technical than it’s ever been before. In fact, there are more technologies in use on different websites than are in a lot of your Microsoft applications, which doesn’t really mean anything. But, we’re basically being dragged into this SEO renaissance, this technical SEO renaissance because of that. And luckily, I have a company that can work with you if you need help. Shameless plug.
So one of the key points that I wanna make is that View Source is dead. You know, it might be obvious to you, or maybe not. But the bottom line is you can’t just look at the raw source code anymore and expect to understand the page. What you need to do is look at Inspect Element. So this is the actual code from seamless.com, which uses AngularJS. And as you can see in this original screenshot, the rel-canonical has a variable in it. It doesn’t have the actual URL when you look at View Source. But once you go to Inspect Element, that’s filled in. So the important part about this is that this is how you can actually see the fully rendered page and actually do some proper understandings, and audits, and things of that nature.
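To make the View Source versus Inspect Element point concrete, here is a minimal Python sketch that was not part of the talk. It assumes the unrendered canonical shows up as a double-curly-brace template placeholder; the names `canonicalUrl`, `CanonicalFinder`, `unrendered_canonicals` and the example.com URL are all hypothetical.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> in an HTML string."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and (attr.get("rel") or "").lower() == "canonical":
            self.canonicals.append(attr.get("href") or "")


def unrendered_canonicals(html):
    """Return canonical hrefs that still contain a template placeholder."""
    finder = CanonicalFinder()
    finder.feed(html)
    return [href for href in finder.canonicals if "{{" in href]


# Hypothetical raw source, roughly what View Source shows before JavaScript runs.
raw = '<html><head><link rel="canonical" href="{{canonicalUrl}}"></head></html>'
# Hypothetical rendered DOM, roughly what Inspect Element shows afterwards.
rendered = '<html><head><link rel="canonical" href="https://www.example.com/menu"></head></html>'

print(unrendered_canonicals(raw))       # the placeholder is flagged
print(unrendered_canonicals(rendered))  # nothing is flagged
```

A crawler that only ever sees the first string would audit a canonical that no user or rendering bot actually gets.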
Now, one of the more important things that’s gonna be coming, or actually is here already is HTTP/2. And the main thing I want you to take away from that is it’s Google trying to make us even faster. One of the trends that you’ll see across this presentation is that Google is doing more and more things to make the web as fast as possible. HTTP/2, from a technical perspective, one of the key things that it does is it uses one HTTP connection. Right now, when a web page is downloaded, it’s going back and forth across numerous HTTP connections, which causes load times to be very slow. The thing is that it hasn’t really been adopted yet, but it will. This is where the web is going.
We need more from our SEO Tools
So the next point I wanna bring up is that our SEO tools are very much behind where we need them to be. And Justin and I were talking about this. You know, we were kind of thinking that maybe what happened was when we all got on the inbound marketing bandwagon for a second, we kind of just forgot about SEO. And a lot of our tools were like, “Oh, how do we get more of these content marketing budgets?” Or, “How do we get more of these inbound budgets?” And there’s a lot of features that are just not accounted for right now.
So what do rankings mean in 2016? I mean, where’s number one? And Dr. Pete talked about this earlier. So truthfully, in this SERP for car insurance, number one is actually number eight. And then when you think about rankings for smartphones, well, Cindy talked about this earlier. You’re seeing a lot of different things based on the combinations of the user’s configuration. Their device, versus their browser, versus their OS is showing you different features in the SERP for different users.
When you think about rankings, are you just thinking about vanity?
So when we’re thinking about rankings, are we thinking about context and actionability or are we thinking about vanity? Of course, you’re gonna show your clients, “You rank number one in organic,” but if that is really number eight, what does that mean? So, what I would love to see in ranking tools is tell me about position zero, like Dr. Pete was saying earlier. That, “Knowledge box, am I there?” Well, here’s the CSS selector that you can use to grab that for me. Like I said, number one is really number eight.
Okay, what about cloaking? Cloaking was a big deal in, like, 2006 and 2003 and all that, right? But what does that mean in 2016? So 304 response code, what this means, if you don’t know, it means not modified. And your browser, when it goes to a page, and if it says, “Hey I pulled this page on a certain date, has it changed since then?” And it says no, it’s gonna give you the 304. So that means it doesn’t download the page again. And so basically, it’s gonna show you the version that you already have.
Well, Google actually adheres to this. So if you were to say, you know, Google indexes a page, and then you give them a 304 response code, well, they’re not gonna download that page again. So that’s an opportunity for you to serve something up the first time that’s very different than what you want moving forward. And effectively, you’ve done cloaking. But this is allowed. What’s cloaking in the responsive and adaptive era? Like, there’s nothing that says I can’t have big blocks of text on one version. That may be the one that Google indexes, but the user sees something different because of their device. What does cloaking mean now? I don’t know.
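The 304 exchange described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not anything shown in the talk: `status_for` is a hypothetical helper, the dates are made up, and only the `If-Modified-Since` style of validation is modeled (not ETags).

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def status_for(last_modified, if_modified_since=None):
    """Return 304 when the client's cached copy is still current, else 200."""
    if if_modified_since is None:
        return 200  # first visit: there is no cached copy to validate
    try:
        client_copy = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: just serve the full page
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    return 304 if last_modified.replace(microsecond=0) <= client_copy else 200


# Hypothetical timestamp of the page's last change.
page_changed = datetime(2016, 3, 1, 12, 0, tzinfo=timezone.utc)
header = format_datetime(page_changed, usegmt=True)

print(status_for(page_changed))          # 200: nothing cached yet
print(status_for(page_changed, header))  # 304: the cached copy is reused
```

The second call is the case the talk is pointing at: once the server answers 304, the requester keeps whatever version it downloaded the first time.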
And then the other thing is that, with our SEO tools, Google actually allows us to specify directives in HTTP headers. So, for example, the X-Robots-Tag header, the same way you can use your meta robots tag, or your robots.txt, you can say, “Hey, block this page from the search engine.” But Screaming Frog does not crawl for this. I mean, some of the better tools do, but I know Screaming…well, I’m not saying Screaming Frog is a bad tool. I love Screaming Frog. Some other tools do that, but if you’re only using Screaming Frog, you may not know why your page is not indexed. Same thing with hreflang. So when you set up hreflang, you end up with like a block of code like this at the top of your page if you have a ton of pages that you wanna tie it to, which impacts site speed as well. A better way to do it is using the HTTP header. But again, our tools don’t account for this. They don’t know to look for this. And then, finally, same thing with rel-canonical.
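As a rough sketch of what a tool would have to do to catch these header-level directives (not from the talk), the Python below reads `X-Robots-Tag` and `Link` values out of a dictionary of response headers. The function name, the sample headers and the example.com URLs are hypothetical, and the parsing is simplified: it ignores user-agent prefixes in `X-Robots-Tag` and assumes well-formed `Link` entries.

```python
import re


def header_directives(headers):
    """Pull indexing directives out of a dict of HTTP response headers."""
    # Header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}

    robots = [d.strip().lower() for d in h.get("x-robots-tag", "").split(",") if d.strip()]

    canonical, hreflang = None, {}
    # A Link header is a comma-separated list of <url>; param="value" entries.
    for url, params in re.findall(r'<([^>]*)>([^,]*)', h.get("link", "")):
        rel = re.search(r'rel="?([^";]+)"?', params)
        lang = re.search(r'hreflang="?([^";]+)"?', params)
        if rel and rel.group(1).strip().lower() == "canonical":
            canonical = url
        elif rel and rel.group(1).strip().lower() == "alternate" and lang:
            hreflang[lang.group(1).strip()] = url
    return {"noindex": "noindex" in robots, "canonical": canonical, "hreflang": hreflang}


# Hypothetical response headers for one URL.
sample = {
    "X-Robots-Tag": "noindex, nofollow",
    "Link": '<https://example.com/page>; rel="canonical", '
            '<https://example.com/de/page>; rel="alternate"; hreflang="de"',
}
print(header_directives(sample))
```

A page like this one has no robots meta tag or hreflang markup in its HTML at all, which is why a crawler that only parses the body cannot explain why it is not indexed.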
So the other thing about our tools, with mobile, part of the specification allows us to use client-side redirects to connect the mobile version to the desktop version. But our tools don’t crawl that way, so there’s not really a good way for us at scale to know whether or not we set that up. Our SEO tools really need to step their game up. Chrome DevTools, the thing that comes for free in your browser actually does a lot of this stuff, not the ranking stuff of course. Use that for rankings. They’re the shit.
Another thing that you should know about with DevTools, because DevTools has a lot of things in it that natively, you know, the tools that you’re paying a lot of money for don’t do well, remote debugging is one of those things. So what you can do is actually plug your cell phone or whatever device into your computer, and then you can actually control it from your browser. So the cool thing about that is you can do actual analysis based on the actual speed of your device. And you can also run a lot of your extensions and things like that so you can get a proper analysis of those mobile contexts.
Google is crawling headless
Crawling. I love crawling stuff. This is fun. So Google, we long thought Google just crawled the web with text-based crawlers. And what that means is they download the code and then they analyze it, you know, as though it’s just copy that it’s looking through. It doesn’t properly execute the document object model like you’re seeing in Inspect Element. But Google’s actually stepped up its game pretty dramatically in the last few years. And they crawl with what’s called a headless browser. And that’s basically an invisible browser they can use to look at the page the same way that a user does.
And so Adam Audette did some testing around this. So you know it’s true. It actually works. You know, they’ve done a lot of tests around it. They show that Google can index some of these things that otherwise you wouldn’t think they can see. And an ex-Googler actually confirmed this as well. He said between 2006 and 2010, that’s all he worked on. So that means they’ve been able to do this for a long time. They’ve just gotten a lot better at determining what is on those pages as well. I actually said it in 2011, just saying.
But why does it matter, right? Why am I harping on this idea? Because that means a lot of things that you thought were hidden from search engines are actually not. Like they’re able to crawl the page and view it the same way as the user, but what you have to think about is how much of that activity are they actually gonna do? Like, a user is gonna click around all over the place. Is Googlebot actually gonna do that? Well, we’ve seen a lot of instances where they actually are interacting with the page, but we’re not seeing them, you know, click to see all the on-hover states and things like that.
Log files. So unlike Marshall’s haircut there, log files are still in style. I love Marshall Simmonds. But no, he always says you should look at your log files, and I completely agree with that. You find out so much more about what Google is seeing by doing that. And I typically will do it in a very nerdy way where we’ll just parse all the logs and throw it into MySQL and then, like, you know, play around with it till we find things. But there’s a number of tools you can use. In fact, Screaming Frog has a new log file analyzer coming out. I wanted to show it to you, but they told me I couldn’t.
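For readers who want to try the parse-the-logs approach, here is a minimal Python sketch. It is an assumption-laden stand-in, not the workflow from the talk: it assumes Combined Log Format, counts in memory with a `Counter` instead of loading MySQL, and trusts the user-agent string, which can be spoofed. The log lines and IP addresses are invented.

```python
import re
from collections import Counter

# Combined Log Format:
# ip - - [day/Mon/year:time zone] "METHOD path PROTO" status bytes "referer" "user-agent"
LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"\S+ (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Count requests per (day, top-level section) whose user agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        m = LINE.match(line)
        if not m or "googlebot" not in m.group("agent").lower():
            continue
        # "/blog/post-1" -> "/blog"; "/products/widget?ref=nav" -> "/products"; "/" -> "/"
        section = "/" + m.group("path").lstrip("/").split("/", 1)[0].split("?", 1)[0]
        hits[(m.group("day"), section)] += 1
    return hits


BOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
log = [
    '66.249.66.1 - - [10/Mar/2016:09:15:02 +0000] "GET /blog/post-1 HTTP/1.1" 200 5120 "-" "%s"' % BOT,
    '66.249.66.1 - - [10/Mar/2016:09:16:40 +0000] "GET /blog/post-2 HTTP/1.1" 200 4890 "-" "%s"' % BOT,
    '203.0.113.7 - - [10/Mar/2016:09:17:11 +0000] "GET /products/widget HTTP/1.1" 200 7300 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    '66.249.66.1 - - [11/Mar/2016:02:01:00 +0000] "GET /products/widget?ref=nav HTTP/1.1" 304 0 "-" "%s"' % BOT,
]

for (day, section), count in sorted(googlebot_hits(log).items()):
    print(day, section, count)
```

Grouping by day and section is the shape of data you would need to see a sudden drop in Googlebot activity or to compare crawl frequency across parts of a site.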
I feel like the search console crawl stats are completely useless to me, because they don’t give me any type of insight into what Google is doing. It just says a lot of stuff happened and then a little bit happened. But there’s a client that we brought on, and they were like, “We don’t know what happened to our site. Maybe it was Penguin. Maybe it was because we turned off our TV ads. Maybe because we ramped down paid search.” But I was able to definitely say, “Well, you have a lot of Googlebot activity and then no Googlebot activity on the day that Penguin came out. What do you think happened?” And we basically layered all that data on top of each other and they were like, “Oh, yeah. We should stop buying those links.” A guy actually got fired over that. I felt really bad. It wasn’t my fault. He bought the links.
So the other thing that we’ve noticed looking at log files: the standard SEO thinking is that, you know, the sections of your site that have the most links will be the sections that get crawled the most. But in fact, that’s not the case. What we’re seeing is that the sections that have the most social shares are the ones that get crawled the most. So it’s definitely worth looking at that in your log files to see, you know, what are the crawl patterns looking like so you can determine if you have any spider traps or things of that nature. There’s a great post on the Portent blog where they talk about spider traps and such.
Prerendering is the best practice but can cause problems
So this is actually from a client where they had set up a self-hosted version of Prerender, and the Prerender cache was actually misidentifying Googlebot as a human user. So what that means is they were actually serving the straight up AJAX version, or AngularJS version of the site to Googlebot. And then we looked on the other side to see if these pages were actually getting indexed and they were. But the other thing is that because of the fact that sometimes you have to prepare the cache, and it takes a few seconds, well, Googlebot may think that that’s an outage. They may see that it’s like a 503 or something like that. And then what happens is they will drop that page out of the index because they think it’s not there. But really it is there. It’s just taking a long time to serve it to Google. So that’s another thing you need to think about with prerender. Like, there’s a tradeoff in that sometimes this will happen no matter what you do.
And then finally, what we realized is after we were trying to get over all those outages and things and we couldn’t get it fixed, we said, why don’t we just take off prerender and see what happens. And everything got indexed just fine and their organic search traffic was up 20% year over year. So you don’t need prerender. It’s just nice to have. It’s the optimal way to serve it. But then when you think about it, if I’ve gotta use prerender to make it optimal, why do I even use AngularJS?
So Google actually deprecated their AJAX crawling specification, which is kind of an indication that they can actually crawl this stuff pretty effectively. And the thinking based on, you know, what I’ve seen and what some other SEOs have seen is that there’s multiple queues as far as how they’re crawling things. So some things get crawled regularly with the text-based crawler and then maybe put into a queue for the headless stuff later because of the fact they may not think they need it. And Google says, “Well, don’t just do prerender for us. Do it because it’s also a faster way to serve your pages.” So what I found is that when you do have Angular, they do crawl faster because of the fact that it’s less computationally expensive for them to crawl those pages.
All right, which brings me to scraping, one of our favorite things to do, right? You know, as SEOs, a lot of our tools are fundamentally built on top of the concept of scraping. And a lot of things that we, you know, analyze are basically based on scraping.
So going back to the idea of a headless browser versus a text-based browser, or crawler, rather, most of those are built with a library called cURL. And this is essentially a way to connect to things on the web. And you download your code for the pages and then you can store it and do whatever you want with it. The problem with those is that they don’t construct the page the same way that Google sees it, or the way that the user sees it. So that analysis is fundamentally flawed because they’re not looking at the page the way that it’s being served. So what you want to do instead is use what’s called a headless browser. PhantomJS is one of the go-tos, or what have you, that people use a lot. But because it’s really complicated and I don’t like spending too much time doing nerdy stuff, I actually use something called Horseman.js, which is kind of an abstraction of that, which is a lot easier to use.
Scrape the unscrapable
So I actually wrote a blog post on how you can scrape from within your browser itself. You can actually go into the console at the bottom and type in some code and scrape stuff off of there. And in fact, you can scrape what’s called the unscrapable stuff using that because effectively, you have access to everything that these other crawlers do not. And if you want to do multipage in-browser scraping, you can use something called artoo.js.
All right, so you guys know I have to talk about content, because, you know, that’s what we do, right? So news came out the other day from SMX West that Google looks at entities first when they’re trying to figure out how to rank things. And the interesting thing is that in our industry, we’ve talked a lot about entities, but I don’t see much of that coming out in case studies. I don’t see many people saying like, “Oh, yeah. Here’s how I worked in entities into this content I made.” Like, I don’t really believe that happens. So if you do actually wanna do some entity-based stuff, I would encourage you to check out AlchemyAPI, where you can just basically post your content. It’ll identify entities and then you can optimize for those.
And then also what you wanna think about is term relevance. So there’s a lot of tools for TF-IDF and proof terms and co-relevant terms and things like that. In fact, I would just say look at this post that Cyrus wrote where he’s talking about the six different things you can do for on-page SEO that’s more advanced as far as content. And Moz has a new tool that I actually helped with as well. So within the Moz Pro set up, it’ll tell you the co-relevant terms that you wanna go after to make your target terms rank better.
302’s do not pass PageRank
And then this one, John Mueller, good old John, Johnny Boy. He says that 302’s pass PageRank. So you don’t have to worry about switching those to 301’s. That’s absolute bullshit. So here’s a client that we worked on. They are a sports league. You can pick whichever one it is. And one of the few things that we were actually able to get implemented because of their bureaucracy was 301’s across 12 million pages. And voila, they were ranked better overnight. Their traffic, despite the fact seasonality shows that after their season is over it usually goes down, continued going up and to the right. So don’t let anybody tell you, like “Oh, yeah. 302’s are fine.” Nah, bro.
Internal linking structures, I think that this is a huge opportunity for us. And again, I don’t think there’s a good tool despite…I mean, aside from this, what Portent has put together. I don’t think there’s enough focus on the internal linking structure. I would love to see more visualizations like this that say, “Hey, these pages have a lot of links to them throughout your site. These don’t. Maybe this is an opportunity to build more links inside your site.” And Portent has a great report that they do with these force-directed graphs that you can actually purchase of your site so you can get a determination of what that looks like.
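The kind of report described above starts from a simple count of internal links pointing at each page. The Python below is a minimal sketch of that first step only, not Portent's method or a force-directed graph; it assumes you already have `(source, target)` pairs from a crawl, and the paths shown are hypothetical.

```python
from collections import Counter


def inlink_counts(links):
    """Count distinct internal pages linking to each URL from (source, target) pairs."""
    unique = {(src, dst) for src, dst in links if src != dst}  # ignore self-links and repeats
    counts = Counter(dst for _, dst in unique)
    # Make sure pages that only ever appear as sources show up with zero inlinks.
    for src, _ in unique:
        counts.setdefault(src, 0)
    return counts


# Hypothetical crawl output: one pair per internal link found.
links = [
    ("/", "/products"), ("/", "/blog"), ("/", "/products"),
    ("/blog", "/products"), ("/blog", "/blog/post-1"),
    ("/products", "/"), ("/blog/post-1", "/products"),
    ("/old-landing", "/products"),
]

counts = inlink_counts(links)
for page, n in sorted(counts.items(), key=lambda item: item[1]):
    print(n, page)
```

Pages at the top of that listing, with few or no inlinks, are the candidates for "build more links inside your site."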
So we talk a lot about how Google is becoming the presentation layer of the web, and everybody’s scared they’re gonna take your traffic, whatever, whatever. I disagree. I think that that’s a good thing, because every client I’ve seen get the answer box gets way more traffic than when they’re just number one. So I would say just continue to help them. And I think that JSON-LD actually makes schema.org a lot more realistic. You know, before with Schema.org, you’re just like, “Oh, yeah. Add this little part to this part of code.” And it’s just really tedious and no one wants to do it. Whereas, with JSON-LD, you just put it at the top of the page and it opens up a lot of opportunity because how can they check that that’s actually on the page? I don’t know. Let’s see.
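To show why JSON-LD is less tedious than inline markup, here is a small Python sketch that builds one self-contained script block from a dictionary. It is an illustration only: the helper name and the "Example Co" organization data are hypothetical, and which schema.org types and properties a page should use is outside its scope.

```python
import json


def json_ld_script(data):
    """Wrap a dict of schema.org properties in a JSON-LD <script> element."""
    # Escape "</" so the payload can never close the script element early;
    # "\/" is a valid JSON escape for "/".
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Hypothetical organization details.
org = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://www.example.com",
}

print(json_ld_script(org))
```

The whole description lives in one block that can be dropped into the page template, instead of attributes scattered through the visible HTML.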
Google expects ludicrously speedy
Okay. Now you guys might think everything I’ve talked about thus far has been nerdy. It’s about to get really nerdy. All right, page speed is, like I said, I think that Google is really pushing us in this direction pretty heavily. And they have been for a number of years, but they’ve gotten a lot more aggressive with the specs that they’ve put out recently. So with mobile, they expect that the page, the above the fold section of the page, will load within one second. And because of the latency that, I think, Justin talked about earlier, which is 600 milliseconds, you effectively have 400 milliseconds to serve a page. So it’s very difficult to hit this number.
The other thing that slows down the page is your external resources. So like I said before, the back and forth in the HTTP requests slows things down pretty considerably. And in this example, it’s Chartbeat, which you’ll always find slows down your site. You can speed it up by setting something called rel="dns-prefetch".
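As a sketch of how those hints could be generated (not something shown in the talk), the Python below takes the resource URLs a page loads and emits one `dns-prefetch` link per third-party host. The function name and every host in the sample are hypothetical.

```python
from urllib.parse import urlsplit


def dns_prefetch_tags(page_url, resource_urls):
    """Return one dns-prefetch <link> per third-party host, in first-seen order."""
    own_host = urlsplit(page_url).hostname
    hosts = []
    for url in resource_urls:
        host = urlsplit(url).hostname  # None for relative URLs, which are same-host anyway
        if host and host != own_host and host not in hosts:
            hosts.append(host)
    return ['<link rel="dns-prefetch" href="//%s">' % host for host in hosts]


# Hypothetical resources requested by one page.
resources = [
    "https://www.example.com/app.js",
    "/img/logo.png",
    "https://cdn.analytics.example/tracker.js",
    "https://fonts.example/css?family=Sans",
    "https://cdn.analytics.example/beacon.js",
]

for tag in dns_prefetch_tags("https://www.example.com/", resources):
    print(tag)
```

Each tag asks the browser to resolve that hostname early, so the lookup is already done by the time the external script or font is requested.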
So the critical rendering path is just like an overarching idea for how to do web design better. And then you have this thing, accelerated mobile pages that they came out with recently. And if you read what it’s supposed to do, it’s really just forcing you to follow the critical rendering path. I feel you. I’m choking out too, man.
Responsive sites are slow – try conditional loading
A lot of people have responsive sites these days, and the thing about responsive sites is that they’re so slow because they basically have to load every version of what the page is gonna look like. So one of the things that you wanna work with your developers on, specifically for this, aside from just the critical rendering path stuff, is also the conditional loading. So basically, you can say, “Okay, don’t load this image because this device will never need it.”
Now the really cool stuff. So there’s a lot of ways where you can speed up the page just by giving the browser a couple of directives. So all these different directives basically make things happen before they need to happen. So I mentioned rel="dns-prefetch". That’s the one where it says pre-resolve. The one that we’re gonna talk about today is called rel="prerender", and basically what that does is it loads a page in an invisible tab in the background before the user needs to go to that page. So you specify this and then, you know, you have to essentially guess what page they’re gonna go to and then it’ll say, “Okay, I’ll load this.” And then when they actually click on the link or the button to it, it loads instantly. Not to be confused with Prerender.io, which we talked about before. It’s just a completely different thing.
So Google actually uses this in the SERPs. If you search for CNN, they guess that more often than not, whatever the user is gonna click is going to be CNN.com. So they prerender that in the background so when you click that link, it shows up immediately. So what I did was a test to prove that this actually makes things faster. And I used a headless browser to send thousands of visits to a series of pages to see how that affected performance. And generally speaking, the rel-prerendered pages performed better than the pages that are not using prerender. So the way to make this work, because, like I said, you can only specify one URL that can be prerendered, you can use the Google Analytics API to determine what is the most likely next page that users go to, and then specify that as your rel="prerender". Here’s your code. You got it, right? So doing that with one line of code, we effectively sped up our website 68.35%. It’s pretty cool, right?
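The code from the slide is not in this transcript, so the Python below is only a sketch of the idea, under stated assumptions. It does not call the Google Analytics API; it assumes you have already exported page-to-page transitions as `(from_page, to_page)` pairs, and the function name and paths are hypothetical.

```python
from collections import Counter


def prerender_tag(current_page, transitions):
    """Pick the most common next page after current_page and return a prerender <link>.

    transitions is an iterable of (from_page, to_page) pairs.
    """
    next_pages = Counter(
        to for frm, to in transitions if frm == current_page and to != current_page
    )
    if not next_pages:
        return None  # no data: better to prerender nothing than to guess
    best, _ = next_pages.most_common(1)[0]
    return '<link rel="prerender" href="%s">' % best


# Hypothetical navigation data exported from an analytics package.
transitions = [
    ("/", "/pricing"), ("/", "/pricing"), ("/", "/blog"),
    ("/pricing", "/signup"), ("/", "/pricing"),
]

print(prerender_tag("/", transitions))         # <link rel="prerender" href="/pricing">
print(prerender_tag("/signup", transitions))   # None
```

Because only one URL gets prerendered, the whole technique depends on this guess being right often enough to pay for the extra background load.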
So caveats, don’t use it on mobile. If you’re not using Google Analytics, it’ll show fake sessions unless you account for that. So there’s something called the Page Visibility API, which basically will say, until this page is actually in front of the user, don’t fire this code. So you can basically set your Google Analytics to only fire, or whatever analytics package you’re using, to only fire once the user actually sees the page. Google Analytics automatically accounts for this. And then there’s also rel="preload", which I haven’t checked out yet, but I know the Portent team is using it on their site. And basically, you can preload things on the same page, whereas rel="prerender" allows you to preload the next page.
Moving forward with the technical SEO renaissance
So we talked about a lot of nerdy stuff. Here’s your takeaways, what you should actually think about doing. First thing you wanna do is understand the document object model, which is basically the building blocks of how the web actually works. Then you wanna also understand the critical rendering path. How are pages constructed? How can you make them faster? And then page speed, and again, I keep giving shout outs to the Portent team. They have a great guide on how to make your site faster. You know, it’s the ultimate guide to page speed, of course. And then, log file analysis. You guys should definitely bring that back in style, unlike Marshall’s haircut.
And then finally, get back to testing stuff. I feel like, as of late, we all just, you know, assume things are the way that people say they are. And I feel like SEO used to be so much built on this culture of like, testing everything. Like, no matter what I just said to you, “Oh, you got 70% improvement.” “Yeah, let me test that.” We need to get back to that, because that’s how we continue to accelerate our learning, because obviously Google isn’t gonna tell us anything for real. So let’s get back to testing. And content, cool. You should continue to do content. It’s really important, but let’s remember who we are as an industry, what we’re really, really good at, and that’s making things visible. And I just wanted to put Michelangelo, because I said Renaissance. That’s all I’ve got.