How would you disallow a dynamic URL parameter in robots.txt?

  • There is lots of documentation online about how to block /?q=, but nothing conclusive. My worry about using /*?q is that it would also catch other parameters that begin with the letter q which we don't want in the no-crawl list. Example URL (how would you disallow it?): example.com/reg/Products/WSLTDAS?q=::devicePath:/Device/SRSG

  • Answer:

    /*?q=* would be my first choice. I'm 99% sure the trailing * is me showing my age and is totally redundant.

Ian Lurie at Quora
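Put together as a minimal robots.txt sketch (the user-agent line and comments are illustrative; adjust them to your site):

```
User-agent: *
# Block any URL whose query string contains a q= parameter, e.g.
# /reg/Products/WSLTDAS?q=::devicePath:/Device/SRSG
Disallow: /*?q=
```

Because rules are prefix matches after wildcard expansion, the trailing * in /*?q=* is indeed redundant: /*?q= already matches everything /*?q=* does. And a parameter that merely starts with the letter q, such as ?query=, is not caught, because the literal characters ?q= must appear.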

Other answers

Disallow: /*?q=

Arjan Bakker

How about using Disallow: /*?q= ? This will block every URL that has the parameter ?q= in it. If you want to block URLs with ?q=::devicePath but allow others, such as ?q=product, then use Disallow: /*?q=::devicePath instead. Does this help?

Nikhil Raj
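To see how these two rules differ on the example URLs, here is a small sketch of the wildcard matching described in RFC 9309, where * matches any run of characters, $ anchors the end of the path, and everything else is a prefix match (the matcher and the sample paths are illustrative, not a crawler's actual implementation):

```python
import re

def rule_matches(rule: str, path: str) -> bool:
    """Return True if a robots.txt Disallow rule matches a URL path.

    Follows RFC 9309 semantics: '*' matches any run of characters,
    a trailing '$' anchors the end of the path, and the rule is
    otherwise a prefix match.
    """
    # Translate the rule into a regex, escaping literal characters.
    pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in rule)
    # A trailing '$' in the rule anchors the end of the path.
    if rule.endswith("$"):
        pattern = pattern[: -len(re.escape("$"))] + "$"
    # re.match anchors at the start, giving prefix-match behaviour.
    return re.match(pattern, path) is not None

# The broad rule blocks any URL containing '?q='...
print(rule_matches("/*?q=", "/reg/Products/WSLTDAS?q=::devicePath:/Device/SRSG"))  # True
# ...but not parameters that merely start with 'q', like '?query='.
print(rule_matches("/*?q=", "/reg/Products?query=shoes"))  # False

# The narrower rule blocks only the devicePath variant.
print(rule_matches("/*?q=::devicePath", "/reg/Products/WSLTDAS?q=::devicePath:/Device/SRSG"))  # True
print(rule_matches("/*?q=::devicePath", "/search?q=product"))  # False
```

Note that Python's built-in urllib.robotparser may not honour these wildcard extensions, so a sketch like this is also a handy way to sanity-check a rule before deploying it.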
