Scraping ascii file with redirected URL using Perl
-
I am trying to scrape a page that is generated on the fly by a webserver. Consistent data is submitted (by me) to the CGI on the target machine, then a report is generated on the fly and a redirect is issued. I need the script to obtain and follow this redirect, match an "ascii download" link on the target page and "click" this link to download the file. I was thinking of using WWW::Mechanize to achieve this. A full code example would be required.
-
Answer:
Hello grabby-ga,

Your question gives few exact details, so here is a generic Perl script that you should be able to modify into the solution you require.

This script will:
1) Go to the page and retrieve the redirect
2) Download the redirect page
3) Match the text link
4) Download the text link into a file

#BEGIN
#!/usr/bin/perl
use strict;
use warnings;

# modules to use
use LWP::UserAgent;

# this is the url of the page that gets redirected
my $url = "http://redirectedurl.com";

# get this redirected page and find out where it is redirected to
# (LWP follows redirects for HEAD and GET requests automatically)
my $ua = LWP::UserAgent->new;
my $response = $ua->head($url);
die "Cannot reach $url: " . $response->status_line . "\n" unless $response->is_success;
$url = $response->request->uri;

# download the redirected page
$response = $ua->get($url);
die "Cannot download $url: " . $response->status_line . "\n" unless $response->is_success;
my $page_content = $response->decoded_content;

# use a regular expression to match the text file
# this will need to be nailed down more to ensure the link is correct
# but this is difficult to do without seeing a copy of the page it is on
# (the match stops at whitespace, quotes and angle brackets so it cannot run past the link)
$page_content =~ m{(http://[^\s"'<>]+\.txt)} or die "No .txt link found on $url\n";
my $text_link = $1;

# download the text link page
$response = $ua->get($text_link);
die "Cannot download $text_link: " . $response->status_line . "\n" unless $response->is_success;

# save the output
open(my $out, '>', 'config.txt') or die $!;
print $out $response->content;
close($out);

# end the program
exit(0);
#END

If you have any questions or need more help adapting this to your situation, please ask for clarification and give as much further information as you can about your exact requirements.
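Since the question mentions WWW::Mechanize, the same four steps can also be sketched with that module. This is only a minimal sketch under assumptions: the helper name fetch_ascii_report, the link text pattern "ascii download" and the output file name report.txt are illustrative and not taken from the real target page, and the start URL is whatever address your CGI request uses. Mechanize follows redirects on its own, so no separate HEAD request is needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: fetch $start_url (redirects are followed automatically),
# follow the first link whose visible text matches $link_re,
# and save what that link returns to $out_file.
sub fetch_ascii_report {
    my ($start_url, $link_re, $out_file) = @_;

    # autocheck => 1 makes Mechanize die on any failed request
    my $mech = WWW::Mechanize->new(autocheck => 1);

    # after this call $mech->uri holds the final, redirected address
    $mech->get($start_url);

    # find the link by its text rather than by guessing its URL
    my $link = $mech->find_link(text_regex => $link_re)
        or die "No link matching $link_re on " . $mech->uri . "\n";

    # "click" the link and write the response body to disk
    $mech->get($link->url_abs);
    $mech->save_content($out_file);

    return $link->url_abs;
}

# run only when called as a script with a URL argument
if (!caller && @ARGV) {
    my $saved = fetch_ascii_report($ARGV[0], qr/ascii download/i, 'report.txt');
    print "Saved $saved to report.txt\n";
}
```

Matching on the link text avoids the fragile URL regular expression, and url_abs resolves a relative href against the redirected page. If the report is requested by submitting a form rather than by a plain URL, the first get would probably need to be replaced by a form submission, which depends on details of the page not given here.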
grabby-ga at Google Answers