How to find the ordinal variables in a .csv file in python?

Help a noob out.

  • Programming challenge: pull XML data with an HTTP GET, process and/or lightly transform it, and output a csv file. Please help me figure this out in Java or tell me what other language to use. I'm trying to pull XML data down from a REST database, process it, and then spit out csv's. Using Java I've gotten as far as pulling the XML down from the server into a Document object, which I figured out how to print out as text. Now I need to process some of that data; simple transformations like taking a field from "AA-BBB-CCCC" into three separate fields "AA", "BBB", and "CCCC". Another task is changing all the underscores in a certain column into hyphens. After that I want to write the output to a csv file. If it would be easier to convert the XML to CSV and then process the CSV in Java, that's fine with me as well. When I've been googling and stackoverflowing Java and XML, I find stuff like http://www.cafeconleche.org/books/xmljava/ which seems super heavyweight and complicated. Is there a simple way to do these kinds of transforms in Java, and write out to CSV? If MeFites could point me to any resources or tutorials on this, I would be very grateful. Also, if Java is not the best place to be trying to do this, I am open to other languages (Python? Groovy?) as long as learning them will be fairly quick. Other info that may be helpful: these are small files, less than 100 rows. I have a small amount of experience with SQL and Java but am not a real programmer. I had to redownload Eclipse and google "how to add jar file" in order to get this far.

  • Answer:

    Aizkolari, I just tested nicwolff's Perl on the latest Strawberry Perl (5.14.2.1) on WinXP from a fresh Virtual Machine install. A couple of steps and changes and it works fine.

    Download/Install Strawberry Perl (I used the .msi installer).

    Open a cmd shell and you need to install the Text::CSV module (can't believe this wasn't included, the other 'harder' modules like XML::Simple were). Type 'cpan Text::CSV' to install the missing module.

    Then since Strawberry Perl is the latest 5.14.2 (and I dislike using -l), change 'use strict;' to 'use 5.014;' and change the 'print' to 'say'. (The '-l' does automatic line-ending stuff, should strip the EOL on input and add one with print on output, but since the script fetches data using LWP::Simple and 'say' is print with an EOL... the '-l' is just baggage. The whole #! line may be baggage on Windows AFAIK.)

        #!/usr/bin/perl
        use 5.014;
        use LWP::Simple;
        use XML::Simple;
        use Text::CSV;

        my $csv = Text::CSV->new;

        # perform HTTP GET request and parse the XML content
        # into a Perl data structure.
        my $xml = XMLin( get "http://feeds.delicious.com/v2/rss" );

        for my $item ( @{ $xml->{channel}->{item} } ) {  # each item in array of items
            $item->{title} =~ s/_/-/g;                   # replace '_' with '-' globally in the title
            my @parts = split '/', $item->{link};        # split the url into pieces on '/'
            $csv->combine( $item->{title}, @parts );     # make a proper CSV line with quotes if needed.
            say $csv->string;
        }

    Create and run the script.

        C:\....\Downloads> perl 211500.pl
        "Subtle Patterns | Free textures for your next web project",http:,,subtlepatterns.com
        "Delicious.com - Discover Yourself!",http:,,delicious.com,help,quicktour,chrome
        "Dart : Structured web programming",http:,,www.dartlang.org
        "Kern Type, the kerning game",http:,,type.method.ac
        "Wordle - Beautiful Word Clouds",http:,,www.wordle.net
        "COLOURlovers :: Color Trends + Palettes",http:,,www.colourlovers.com
        "TED: Ideas worth spreading",http:,,www.ted.com
        "Khan Academy",http:,,www.khanacademy.org
        dafont.com,http:,,www.dafont.com
        "YouTube - Broadcast Yourself.",http:,,www.youtube.com

    You can use 'perldoc XML::Simple' to read the documentation for that and other modules. A handy module to install is 'Data::Dump' which gives you a nice 'dd' command to dump data in a readable format.

        ...
        use Data::Dump;
        ...
        dd $xml;
        ...
        dd $item->{title};
        ...
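For anyone who would rather try the Python route the question mentions, here is a rough port of the same steps using only the standard library. It is a sketch, not part of the original answer: the function name rss_to_csv and the two-item sample feed are made up for illustration, and the sample stands in for the downloaded XML so the script runs without a network connection.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for the XML pulled down by the HTTP GET.
SAMPLE_RSS = """<rss><channel>
  <item><title>Subtle_Patterns</title><link>http://subtlepatterns.com</link></item>
  <item><title>Kern Type, the kerning game</title><link>http://type.method.ac</link></item>
</channel></rss>"""


def rss_to_csv(xml_text):
    """Return CSV text with one row per <item>: the title, then the link pieces."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for item in root.iter("item"):
        # replace '_' with '-' everywhere in the title
        title = item.findtext("title", default="").replace("_", "-")
        # split the url into pieces on '/'
        parts = item.findtext("link", default="").split("/")
        # the csv module adds quotes only where a field needs them
        writer.writerow([title] + parts)
    return out.getvalue()


if __name__ == "__main__":
    print(rss_to_csv(SAMPLE_RSS), end="")
```

To write a file instead of a string, the same writer can be pointed at open("Output.csv", "w", newline="") in place of the StringIO buffer.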

Aizkolari at Ask.Metafilter.Com


Other answers

Quick Perl example on a random XML feed, showing all the features you mention needing:

    #!/usr/bin/perl -l
    use strict;
    use LWP::Simple;
    use XML::Simple;
    use Text::CSV;

    my $csv = Text::CSV->new;
    my $xml = XMLin( get "http://feeds.delicious.com/v2/rss" );

    for my $item ( @{ $xml->{channel}->{item} } ) {
        $item->{title} =~ s/_/-/g;
        my @parts = split '/', $item->{link};
        $csv->combine( $item->{title}, @parts );
        print $csv->string;
    }

nicwolff

This would be trivial in Perl with the REST::Client, XML::Simple, and Text::CSV_XS modules. Perl was made for text transformations on data.

nicwolff

You're in good shape so far. You've got two basic philosophical choices here.

1 - You can walk through your XML data, pick out the elements you want to do stuff with, and write the CSV rows to a StringBuilder. This might be easier to wrap your head around. For small files, a DOM parser might be your best bet.

2 - Or, you can use XSLT to change the XML document directly to the CSV that you want. It's a bit more abstract, and the XSLT language is a beast unto itself. But it does give you a terse and powerful way to do what you're looking for, with fewer lines of total code.

I kind of wish I knew what the full ask is, so I can say one way or the other which is the best way to go. Anyway, since you say you already have a document in memory, this is a good place to start for option 1: http://docs.oracle.com/javase/6/docs/api/org/w3c/dom/Document.html Something like Document.getElementsByTagName(), along with a loop, might just get you the rest of the way. Good luck!
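As a rough illustration of option 1 (not Citrus's code, and in Python rather than Java), the sketch below uses the standard library's xml.dom.minidom, whose getElementsByTagName() mirrors the Java DOM call of the same name. The element names record, code and name, the helper names, and the sample data are all assumptions made up for the example; only the "AA-BBB-CCCC" split and the underscore-to-hyphen change come from the question.

```python
import csv
import io
from xml.dom import minidom

# Hypothetical document; the real element names depend on the REST service.
SAMPLE_XML = """<records>
  <record><code>AA-BBB-CCCC</code><name>first_row</name></record>
  <record><code>DD-EEE-FFFF</code><name>second_row</name></record>
</records>"""


def text_of(parent, tag):
    """Return the text of the first <tag> element under parent, or ''."""
    nodes = parent.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return ""
    return nodes[0].firstChild.data


def records_to_rows(xml_text):
    """Walk the DOM and build one list of fields per <record>."""
    doc = minidom.parseString(xml_text)
    rows = []
    for record in doc.getElementsByTagName("record"):
        code_parts = text_of(record, "code").split("-")    # "AA-BBB-CCCC" -> 3 fields
        name = text_of(record, "name").replace("_", "-")   # underscores -> hyphens
        rows.append(code_parts + [name])
    return rows


def rows_to_csv(rows):
    """Serialize the rows with the csv module instead of joining by hand."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(records_to_rows(SAMPLE_XML)), end="")
```

The loop has the same shape it would have in Java: get a node list by tag name, pull the text out of each child, transform it, and collect a row.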

Citrus

Putting in a plug for the Groovy programming language (http://groovy.codehaus.org/), which has taken this non-programmer further than I thought I could ever go. The syntax is amazingly intuitive: even if you don't know exactly what you're doing, a lot of times you can enter some parens or some other character that seems to make sense and lo and behold, it does what you want it to. XML processing with Groovy, in particular, is pretty pain-free. There is a great mailing list (http://docs.codehaus.org/display/GROOVY/Mailing+Lists; user at groovy dot codehaus dot org) on which I frequently post "newb, please help" type questions and I always get at least one helpful answer. If you really want to learn programming, I've heard that it's best to start with Java or C/C#/C++ and go on to the "syntactic sugar" languages like Groovy, but if your job doesn't require you to be a full-fledged programmer, a language like Groovy can help you immensely. Python is also great, but be prepared to wrestle with getting packages if you are behind an NTLM (Windows) proxy at work and there aren't any Python programmers at your job to help you out. I haven't found that to be the case with Groovy.

Currer Belfry

Thanks for the updating and commenting zengargoyle! Aizkolari, is the Perl script clear to you now? Whatever solution you use, please don't try to make valid CSV "manually" by joining with commas as in substars' Ruby example – use some kind of CSV library. Otherwise a comma or newline in the data will break your script, and then you'll try surrounding all the values with double quotes, and then a double quote in the data will break your script...
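To make that failure mode concrete, here is a small Python sketch (the thread's own examples are Perl and Ruby; the helper names below are made up for illustration). It writes the same awkward row two ways and reads each back with the standard csv module.

```python
import csv
import io


def naive_line(fields):
    """Join with commas by hand: breaks once a field contains , " or a newline."""
    return ",".join(fields)


def csv_line(fields):
    """Let the csv module decide what needs quoting and escaping."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue()


def parse_first_record(text):
    """Read back the first CSV record found in text."""
    return next(csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    row = ['He said "hi"', "a, b", "line one\nline two"]
    print(parse_first_record(csv_line(row)) == row)    # library output round-trips
    print(parse_first_record(naive_line(row)) == row)  # hand-joined output does not
```

The library version doubles embedded quotes and wraps fields that contain commas or newlines, which is the part that is easy to get wrong by hand.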

nicwolff

Oh, you might want to play with the 'lwp-request' program that's installed with Strawberry / LWP::UserAgent. See:

    perldoc lwp-request
    ...
    > lwp-request -m GET -H "ACCEPT: application/xml" ... https://...

It's a simple CLI wrapper around LWP::UserAgent so you can play with flags and see results before/while hacking on your code. And your previous try with HTTP::Request would go something like:

    $res = $ua->request($r);
    $xml = XMLin( $res->content );

zengargoyle

Yeah, +1 to nicwolff and mea culpa; was just trying not to confuse things with a bunch of libraries. Ruby has FasterCSV, which is analogous to Text::CSV in Perl.

substars

See: LWP::UserAgent

The non-Simple version of the doohickey.

    use LWP::UserAgent;
    my $ua = LWP::UserAgent->new;
    my $res = $ua->get(
        "http://localhost",
        "ACCEPT" => "application/xml",
        "KEY-HEADER-NAME" => "foo",
    );
    if ( $res->is_success ) {
        say $res->decoded_content;
    }
    else {
        die $res->status_line;
    }
    exit;

perldoc will tell you what you need to know, LWP::UserAgent is the full web-client thing with user agent strings, cookie-jars, headers, SSL certs and the like. Here is the stuff the above sends to a server:

    # nc -l 80
    GET / HTTP/1.1
    TE: deflate,gzip;q=0.3
    Connection: TE, close
    Accept: application/xml
    Host: localhost
    User-Agent: libwww-perl/6.04
    KEY-HEADER-NAME: foo

Nitpick time...

    # 2-arg open is frowned upon (FH is a global here)
    open FH, '>>Output.csv' or die "$!";
    # -> 3 arg open with local vars
    open my $outfile, '>>', 'Output.csv' or die "$!";
    print $outfile $csv->string, "\n"; # you can use , vs . here, print takes argument list.
    close $outfile or die "$!"; # if you're not checking for errors like full filesystem...
    # Perl will close the file automatically at exit.
    # optional: replace checking with 'autodie' pragma.
    use autodie;
    open my $fh, '>', 'Outfile.csv';
    # dies with "$!" on error automagically...
    # you may need to turn off ssl host verification for local certs
    $ua->ssl_opts( verify_hostname => 0 ); # see perldoc

Looks like you found the lower level objects used: HTTP::Request, HTTP::Response...
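If the Python route is on the table, the same kind of request with custom headers can be sketched with the standard library's urllib.request. This is an illustration rather than a translation of the Perl above: the function names are made up, and "KEY-HEADER-NAME" is the same placeholder header used in the Perl example, not a real API key header.

```python
import urllib.request


def build_request(url, key_header_value):
    """Build a GET request carrying an Accept header plus a placeholder key header."""
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/xml",
            "KEY-HEADER-NAME": key_header_value,  # placeholder name from the example
        },
        method="GET",
    )


def fetch_xml(url, key_header_value, timeout=10):
    """Send the request and return the body as text.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, which plays
    roughly the role of the 'die $res->status_line' branch.
    """
    req = build_request(url, key_header_value)
    with urllib.request.urlopen(req, timeout=timeout) as res:
        charset = res.headers.get_content_charset() or "utf-8"
        return res.read().decode(charset)


if __name__ == "__main__":
    req = build_request("http://localhost", "foo")
    # urllib stores header names capitalized; HTTP header names are case-insensitive.
    print(req.get_method(), req.full_url, req.header_items())
```

Building the request separately from sending it makes it easy to inspect the headers before pointing the script at the real server.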

zengargoyle

I'll add that it may be helpful to know about the MetaCPAN site. Pretty much every Perl module somehow comes from the CPAN and you'll find decently formatted documentation: https://metacpan.org/module/XML::Simple

And you can look at some of the files that aren't installed with the module but are part of the distribution as a whole. Since Perl is generally awesome when it comes to testing, you can get a bunch of example usage by browsing some of the tests: https://metacpan.org/source/GRANTM/XML-Simple-2.18/t/1_XMLin.t

And while you can use 'perldoc' in a terminal, often the HTML version on MetaCPAN is a much easier read with all the linking and syntax-highlighting. The core Perl documentation is also available: http://perldoc.perl.org/perl.html

The first 3 tutorials may be of help in digging out your data from $xml.

    perlreftut    Perl references short introduction
    perldsc       Perl data structures intro
    perllol       Perl data structures: arrays of arrays

If you get lost along the way, memail me. From the original description, what you're trying to do isn't that hard, but the evil of XML and deep data structures may not be the easiest first Perl hackery.

zengargoyle
