How to Get JSON from External URL?

Does Reddit's JSON API have undocumented artificial limits to prevent scraping?

  • It would appear that the JSON API returns very different results than the browser. Put this URL in your browser and look at the results, then try it with API Kitchen, curl, Mechanize, etc.: http://www.reddit.com/r/guitar/new/.json?limit=100 In the browser you get 100 results. Retrieving it by any of the non-browser methods gets you 1-2 results. Is this a bug, or an intentional design to limit what web crawlers gather from Reddit? On larger subreddits the results are incredibly inconsistent, and the "after" parameter is then inaccurate for paging, which produces a ton of duplicate results. Yet I can't find any documentation indicating that this is intentional and not a bug. If there are limits, that's cool; I just want to know what they are so I can respect them properly in my code.

  • Answer:

    Are you passing your cookie when you use the API? If not, your results will be different, because you're getting a different experience. Since I'm guessing you aren't passing your cookie, you're getting the "rising" page instead of the "new" page. Try this instead: http://www.reddit.com/r/guitar/new/.json?sort=new There is no hidden limit. The only limit is the published one: don't hit the same endpoint more than once per 30 seconds. Also, you'll get much better performance *not* passing your cookie, so I'd suggest you don't pass it unless you have to.

Jeremy Edberg at Quora
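To illustrate the answer's advice, here is a minimal Python sketch (not part of the original answer) that requests the listing with an explicit sort=new, sends no cookie, waits 30 seconds between requests, and pages with "after" while skipping duplicates. The function names, the User-Agent string, and the response layout (data.children, data.after, and a per-post "name" identifier) are assumptions made for illustration; check them against the responses you actually receive.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "http://www.reddit.com/r/{subreddit}/new/.json"


def build_url(subreddit, limit=100, after=None):
    """Build the listing URL with an explicit sort=new, as the answer suggests."""
    params = {"sort": "new", "limit": limit}
    if after:
        params["after"] = after
    return BASE_URL.format(subreddit=subreddit) + "?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Fetch a URL without any cookie and decode the JSON body."""
    # Placeholder User-Agent (an assumption); substitute one that identifies your script.
    request = urllib.request.Request(url, headers={"User-Agent": "example-script/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def iter_new_posts(subreddit, pages=2, delay=30, fetch=fetch_json, sleep=time.sleep):
    """Yield posts page by page, pausing between requests and dropping duplicates."""
    seen = set()
    after = None
    for page in range(pages):
        if page > 0:
            sleep(delay)  # stay within the one-request-per-30-seconds limit
        listing = fetch(build_url(subreddit, after=after))
        # Assumed layout: {"data": {"children": [{"data": {...}}], "after": ...}}
        for child in listing["data"]["children"]:
            post = child["data"]
            if post["name"] not in seen:
                seen.add(post["name"])
                yield post
        after = listing["data"]["after"]
        if not after:
            break  # no further pages
```

Calling iter_new_posts("guitar") and looping over the result would then walk the first two pages of the subreddit from the question. The fetch and sleep parameters exist only so the paging logic can be exercised without touching the network.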
