How do you scrape websites that use services like Brassring?

How do you scrape websites that use services like Brassring (e.g., GE Careers)? Also, how do you use the scraper to navigate to separate pages?

  • Brassring serves data from a database via dynamically generated pages, which makes it harder to scrape than static HTML fetched with readLines or the RCurl package in R. What methods (preferably in R or Python) would you use to scrape it, and could you please share your code?

  • Answer:

    I wrote a distributed crawler for just these cases; it is open source and written in Ruby: https://github.com/CalculatedContent/cloud-crawler

Charles H Martin at Quora


Other answers

I ran into a similar problem when I was writing my own program to search for jobs. One of the sites, Monster, changed its front-end code and made it much harder for me to parse information from pages fetched with Requests (I parsed with BeautifulSoup). I ended up having to scrap the project and start from scratch. That's when I found Selenium (http://selenium-python.readthedocs.org/). It drives your web browser, Firefox or Chrome, to visit a site:

    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys

    driver = webdriver.Firefox()
    driver.get("http://www.ge.com/")

...or you can run it headless (no browser visuals) with virtualization using xvfb, or with PhantomJS:

    driver = webdriver.PhantomJS()

You can program Selenium to click on various elements to navigate to where you want to be, and there are functions that let you scrape data; a short sketch of that pattern follows. Here's a primer that will let you hit the ground running: https://automatetheboringstuff.com/chapter11/

Likewise, you can also treat it much the way you would the Requests module and just request the link directly (at least with the GE website). For example:

    driver = webdriver.Firefox()
    driver.get('http://www.ge.com/careers/opportunities?keyword=&country=United+States&state=California&func=Asset+Management&business=TG_SEARCH_ALL&experience_level=Co-Op%2FIntern')

Now, Selenium isn't perfect. It uses a lot of resources on your computer, so if you're doing a massive project it might not be the best module. However, it is definitely worth checking out as a possible solution to your problem.

If you're interested to see how I've used Selenium with Monster.com, SimplyHired.com, and Indeed.com, here's my GitHub: https://github.com/michaelverano/AutomatedTools/tree/master/searchJobs2 . As a warning: I don't see myself as a computer programmer (yet), so don't expect the most efficient code, but perhaps it can give you an idea of how to approach your problem. Good luck!
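
A minimal sketch of that click-and-scrape pattern, using Selenium's explicit waits so dynamically loaded results have time to appear before you read them. The URL is the GE search link above, but the CSS selectors ('li.job', 'a.next-page') are hypothetical placeholders you would replace after inspecting the live page:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Firefox()
    driver.get('http://www.ge.com/careers/opportunities?keyword=&country=United+States'
               '&state=California&func=Asset+Management&business=TG_SEARCH_ALL'
               '&experience_level=Co-Op%2FIntern')

    # Wait up to 15 seconds for the dynamically loaded results to render.
    # 'li.job' is a hypothetical selector; inspect the page for the real one.
    wait = WebDriverWait(driver, 15)
    wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'li.job')))

    # Scrape the text of each result.
    for job in driver.find_elements(By.CSS_SELECTOR, 'li.job'):
        print(job.text)

    # Navigate to a separate page by clicking a link, then wait again.
    # 'a.next-page' is likewise a hypothetical selector.
    driver.find_element(By.CSS_SELECTOR, 'a.next-page').click()
    wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'li.job')))

    driver.quit()

The explicit wait is the key difference from a plain Requests fetch: it gives the JavaScript that pulls job listings out of Brassring's database time to populate the page before you scrape it.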

Michael Verano

Web scraping can retrieve both static and dynamic web pages, which can help in the long run. There are numerous scraping tools available online that provide excellent services; among the most popular is Easy Data Feed. Powered by ShoppingCartElite, Easy Data Feed is data extraction software designed to quickly download inventory, pricing, and product information from your drop-ship supplier's online portal into a usable spreadsheet, without relying on the drop shipper. It was built specifically for online retailers who are dissatisfied with their drop-ship supplier's digital data for inventory, pricing, and even universal product information. Disclosure: I am a specialist at an ecommerce platform for businesses.

Junior Johnson
