How do I do a cron job for a list of URLs?

I want to suck my files!

  • Unix cURL syntax: multiple URLs to multiple files (download)

    I've read the man pages, and I've tried an amazing number of things with varying degrees of failure; minimal success, or I wouldn't be asking. I want:

        example.com/red
        example.com/blue
        example.com/orange
        example.com/purple
        [...]

    to be downloaded as red.html, blue.html, orange.html, purple.html. I can go through the list of things I've tried, but in the end I either get just the first file, or I get all the files without extensions. It seems idiot simple, so it seems I am less than an idiot.

    OS: Lion, but I would like to eventually script and cron this. This is for pulling hard copies out of my own CMS (ExpressionEngine), so nothing nefarious.

  • Answer:

    Use `wget`. This standard Unix utility isn't included in OS X, AFAIK because of Apple's dislike for GPL-licensed software, so you'll have to install it yourself (the instructions are readily googled). You can supply multiple URLs on the command line, or use the `-i` option to read a list of URLs from a file.
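    A minimal sketch of that second form, assuming wget is installed and the list lives in a hypothetical file named urls.txt with one URL per line:

        # Download every URL listed in urls.txt.
        wget -i urls.txt

        # Add -E (--adjust-extension) to have wget append .html to pages
        # served as text/html, which covers the red.html / blue.html naming
        # the question asks for.
        wget -E -i urls.txt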

cjorgensen at Ask.Metafilter.Com


Other answers

I would put my list of URLs to download in a file named URLs, and do something like:

    cat URLs | while read url; do
        curl -o "$(basename "$url")" "$url"
    done

That uses shell to loop over the URLs. Using basename for the file name isn't awesome. Often I prefer wget to curl, simply because it has a decent default filename.

Nelson

Can't you just rename the files after they're downloaded? Have a bash script that looks something like this:

    #!/bin/bash
    # curl command(s) here, assuming we write to a directory called 'example.com'
    for file in ./example.com/*; do
        if grep -qi html "$file"; then
            mv "$file" "$file.html"
        fi
    done

If you need recursion you can adapt find to the task at hand, but that's the basic idea.
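A minimal sketch of that find adaptation, assuming the same hypothetical example.com download directory and the same "contents mention html" heuristic:

    # Recursively find regular files that don't already end in .html,
    # and append .html to any whose contents look like HTML.
    find ./example.com -type f ! -name '*.html' -exec sh -c '
        for f in "$@"; do
            if grep -qi html "$f"; then
                mv "$f" "$f.html"
            fi
        done
    ' sh {} +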

axiom

I'm not opposed to something other than curl, but I am going to do this from a dynamic site, so I would be generating that file on the fly; it won't be static. Is there a way to cat the content of a URL? I could pull that info way easy. Basically, I want to download every entry on my blog as an individual static page (from a template specifically designed for this).

cjorgensen
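A sketch of the piece cjorgensen is asking about: `curl -s` writes a page's body to stdout, so a dynamically generated list can be piped straight into the download loop instead of being read from a static file. Here example.com/url-list stands in for a hypothetical template URL that emits one URL per line.

    # Fetch the URL list from the CMS template, then download each entry.
    curl -s https://example.com/url-list | while read -r url; do
        curl -so "$(basename "$url").html" "$url"
    done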

axiom, I can rename after downloading. If that's the easier option I'll take it. The filenames would just need .html appended.

cjorgensen

If you're trying to archive your blog, that's basically what wget's recursive retrieval options (http://www.gnu.org/software/wget/manual/html_node/Recursive-Retrieval-Options.html) are for. There are a variety of ways to get wget on MacOS; I installed mine via Homebrew.
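A minimal sketch of that recursive approach, assuming a hypothetical blog at example.com/blog and a reasonably recent wget:

    # Mirror everything under /blog/ into ./blog-archive, staying below
    # the starting URL and appending .html to pages served as text/html.
    wget --recursive --level=inf --no-parent --adjust-extension \
         --directory-prefix=blog-archive https://example.com/blog/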

Nelson

You can use bash's word expansion to make this simple:

    while read -r url; do curl -so "${url##*/}.html" "$url"; done < file_with_urls.txt

Change the final `;` to a `&` and you automatically turn it into a parallel job, where all the files are downloaded simultaneously. (Don't do this if you have a ton of URLs in the file.)
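A sketch of that parallel variant, with a wait added so a wrapper script or cron job doesn't exit before the background downloads finish (file_with_urls.txt is the same hypothetical list file):

    # One background curl per URL; wait blocks until they all finish.
    while read -r url; do
        curl -so "${url##*/}.html" "$url" &
    done < file_with_urls.txt
    wait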

Rhomboid
