How to crawl same url using Scrapy?

When implementing a site-wide URL update, should on-site links be updated to the new format immediately, or should you wait a couple of weeks for Google to associate the new URLs via the canonical link element?

  • When just about every URL on a site has to change, and 301 redirects send traffic from the old-style URLs to the new version of each URL, is it best to update on-site links (like those in a top nav that spans the entire site) to the new-style URLs immediately? Or is it better to leave those links alone for a couple of weeks so that Google can crawl them, get redirected, see the new canonical link, and understand that a link that used to look like "abc" is now "123" but still points to the same page/content?

  • Answer:

    Having dealt with a few site migrations, I can sympathize with the desire to handle the transition with as few problems as possible. To that end, I'll offer some advice that covers both best practices and a testing approach.

    The 301 will be a far stronger signal to Google than the canonical tag on the pages, so either approach will result in a similarly timed and structured update. If nothing else, leaving your links pointing to the previous URL dilutes the link signal going to the target page.

    If feasible, testing with a variation of the second approach may be your best option. I'm not sure that your CMS/web app will allow you to do this, but here's a tester's approach to the question. Some of the inspiration for it comes from a Wil Reynolds (SEER) talk I attended a couple of years back.

    1. Pick a few URLs, from among those you wish to redirect, that rank decently on reasonably competitive terms, and record their ranking and last cache date in Google.
    2. Institute a 301 redirect and internal link update on just those pages, pointing to the new pages.
    3. Check back every few days and note whether the destination URL in the SERPs you started tracking in step 1 has changed.

    This approach will do two things for you. First, it will help you better predict what will happen to the rest of the site when you apply a similar treatment to it. There may be fluctuations in ranking as Google's many distributed indexes handle the update. Second, and possibly more importantly in these days of very complex web applications, it will allow you to determine whether you've made any mistakes in the redirect you intend to implement sitewide.
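The implementation check described above (confirming that each old URL really answers with a 301 pointing at the intended new URL) can be partly automated. The following is only a rough sketch under stated assumptions, not part of the original answer: the function names, the `mapping` dictionary, and the example.com URLs are all hypothetical, and it uses a single HEAD request per URL from Python's standard library without following redirects.

```python
import http.client
from urllib.parse import urlsplit, urljoin


def fetch_status_and_location(url, timeout=10):
    """Send one HEAD request and return (status, Location header) without following redirects."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def evaluate_redirect(old_url, expected_url, status, location):
    """Describe whether a response is the 301 we expect for old_url."""
    if status != 301:
        return f"unexpected status {status}"
    if location is None:
        return "301 without Location header"
    # Location may be relative, so resolve it against the old URL.
    target = urljoin(old_url, location)
    if target != expected_url:
        return f"redirects to {target}"
    return "ok"


def check_redirects(mapping, fetch=fetch_status_and_location):
    """mapping is a hypothetical {old_url: new_url} dict for the test pages."""
    return {old: evaluate_redirect(old, new, *fetch(old))
            for old, new in mapping.items()}


if __name__ == "__main__":
    # Placeholder URLs; replace with the handful of pages chosen for the test.
    sample = {"https://example.com/abc": "https://example.com/123"}
    for old, verdict in check_redirects(sample).items():
        print(old, "->", verdict)
```

The `fetch` parameter is injectable so the comparison logic can be exercised offline. Note that some servers answer HEAD differently from GET, so a result other than "ok" is a prompt to look closer rather than proof of a broken redirect.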

Sam Peck at Quora


Other answers

I don't see any benefit in leaving the internal links pointing to the old URLs. By switching over to the new URLs, you tell Google that the site has changed, and any external links pointing to old URLs will be 301'd. This approach is the most common. Google handles site infrastructure changes all the time. Once it finds the new homepage, it will crawl the new link structure and remap your site. If you leave the old links in place and rely on 301s, you simply lose a bit of your site's internal PageRank unnecessarily. In fact, Google recommends updating your site structure to correct canonicalization and duplicate content issues, in preference to relying on 301 redirects. So I think they would prefer that the architecture be clearly defined as soon as possible, not delayed for weeks.
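If the internal links are switched over immediately, it helps to confirm that none of the old-style URLs are left behind in templates such as the top nav. Here is a minimal sketch of one way to spot them, assuming you already have the old-to-new mapping as a dictionary and links that appear in `href` attributes exactly as written in that mapping; `LinkCollector`, `find_stale_links`, and the sample paths are hypothetical names used for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_stale_links(html, url_map):
    """Return (old_href, new_href) pairs for links that still use an old URL."""
    collector = LinkCollector()
    collector.feed(html)
    return [(href, url_map[href]) for href in collector.hrefs if href in url_map]


if __name__ == "__main__":
    # Hypothetical nav markup and mapping, echoing the "abc" -> "123" example.
    page = '<nav><a href="/abc">A</a><a href="/123">B</a></nav>'
    for old, new in find_stale_links(page, {"/abc": "/123"}):
        print(f"{old} should now link to {new}")
```

This only matches hrefs verbatim; absolute versus relative forms, trailing slashes, and query strings would need normalizing before comparison on a real site.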

Dan Cristo

Hi Dillon, an informal survey of the Kenshoo founders suggests the latter option, although the confidence level in the answer is only moderately high (70% or so).

Jason Pratt
