
Perl regex hell

  • I have been hammering Google for about three hours now trying to find the answer to what I expected to be a simple question: WTF is up with regular expression backreferences in Perl? I cannot make this (simple) find-replace work.

    I'm trying to automate some file processing: stripping the junk out of the text output files from a data collection program and turning them into .csv files. I was doing this with an AppleScript that called TextWrangler to do the grunt work using grep regex find/replace, and it worked well, but it was really slow once I compiled the AppleScript, so I am trying to call a shell script instead to speed things up. The code in question:

        sub replaceString {
            my ($search, $replace) = @_;
            if ( s/$search/$replace/ig ) {
                print;
            }
            else {
                print;
            }
        }

    When I pass it plain text as the search and replace arguments (i.e., find "foo" and replace with "bar"), it works. When I pass it arguments using backreferences, it doesn't work, even though the same arguments work just fine in TextWrangler.

    Specific example: I need it to strip an initial 0 off of some timestamps in the file. If I use "\ +0(\d\d:)" and "\1" as my find and replace values, the zero is replaced by "\1", literally: "045:" is changed to "\145:". It doesn't work using dollar signs either.

    I've spent most of my afternoon trying to figure out how to do this, but every damn result I find in searching just tells me how to use regex, not how to pass regex group elements from one match to the next. Perl apparently just forgets all regex values the second it starts interpreting a new match? I can't figure out why a program designed to work with text is so obstinately stupid when it comes to something as simple as a find-replace using pattern matching. There must be a really simple way to do this.

    (Notes: the subroutine will be called for a long series of find/replace pairs, one after another, so $search and $replace have to stay arguments passed in by the main routine rather than being hard-coded. In the end I need to package this as a bundled app that anyone in my lab can run on the lab Macs [the reason I compiled the script], so I can't get too crazy with how this thing works.)

  • Answer:

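    Specifically, this works for me:

        sub replaceString {
            my ($s, $r) = @_;
            # /ee: evaluate $r to get the string '$1', then eval that string to get the capture
            if (s/$s/$r/eeig) { print; } else { print; }
        }
        $_ = "045:\n";
        replaceString qr/0(\d\d:)/, '$1';    # prints "45:"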

caution live frogs at Ask.Metafilter.Com


Other answers

At least in Perl, you need to use $1 instead of \1.

jozxyqk
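To illustrate the distinction the answers here turn on: $1 written directly in an s/// inside your code is filled in after each match, but a '$1' that arrives inside a variable is just two characters of text, and s/// does not interpolate it a second time. A minimal sketch; the helper names strip_literal and strip_via_variable are made up for this example.

```perl
use strict;
use warnings;

# $1 written directly in the replacement part of s/// is interpolated
# after each match, so the capture ends up in the output.
sub strip_literal {
    my ($text) = @_;
    $text =~ s/0(\d\d:)/$1/g;
    return $text;
}

# When the replacement arrives in a variable, s/// interpolates the
# variable once, giving the characters '$1'; they are not interpolated
# again, so they land in the output verbatim.
sub strip_via_variable {
    my ($text, $replace) = @_;
    $text =~ s/0(\d\d:)/$replace/g;
    return $text;
}

print strip_literal("045:"), "\n";              # 45:
print strip_via_variable("045:", '$1'), "\n";   # $1
```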

I'm taking a stab in the dark here, and guessing the \1 is only a valid back reference when naked, and becomes literal text when passed in as part of a text string. I can't elaborate on why that would be, but it fits your observations and seems somewhat reasonable. If that's the case, you can probably fix it by creating a literal text string of your s/// construct by using your passed string values, and then running it through eval().

devbrain

Use '$1' instead of '\1', and write s/$search/$replace/eegi instead. Also, I hope your replaceString is a little more complex than that; otherwise it would be just as easy to write the s/// inline instead of making a function call.

sbutler
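A sketch of the /ee approach applied to the original use case, a list of search/replace pairs run over each line in order. The names replace_string, clean_line and @rules, and the two sample rules, are hypothetical. One caveat: with /ee every replacement is run as Perl code, so plain text has to be written as a quoted Perl string inside the argument (for example '","' rather than ',').

```perl
use strict;
use warnings;

# Apply one search/replace pair to a string. The replacement is Perl
# source code: /ee first evaluates $replace (yielding, say, the string
# '" $1"'), then evaluates that string as code, which interpolates $1.
sub replace_string {
    my ($text, $search, $replace) = @_;
    $text =~ s/$search/$replace/eeig;
    return $text;
}

# Run every rule, in order, over one line.
sub clean_line {
    my ($line, @rules) = @_;
    for my $rule (@rules) {
        $line = replace_string($line, @$rule);
    }
    return $line;
}

# Hypothetical rules: plain text must be quoted *inside* the string.
my @rules = (
    [ qr/\s+0(\d\d:)/, '" $1"' ],   # one space, then the timestamp without its 0
    [ qr/\t/,          '","'   ],   # tab -> comma for .csv output
);

print clean_line("x 045:\ty 012:", @rules), "\n";   # x 45:,y 12:
```

Because the rules run in order, each rule sees the output of the ones before it, so the order of @rules matters.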

Arg... not \1. $1. $1. $1.

sbutler

I see some other folks have chimed in while I was writing a small snippet to demonstrate that eval works. Other changes: I've changed your "\ " to "\s", which I think is more readable. You might be better off with \b, which matches a word boundary: do these leading zeros never occur at the start of a line, but always after whitespace? Your regex as given will strip the leading whitespace AND the leading zero; if that's not what you want, use \b. The \1 is better written as $1 (as Perl's warnings will tell you). And there's no need for both branches of an if/else to contain the print.

    #!/usr/bin/perl -w
    sub replaceString {
        my ($search, $replace) = @_;
        # Build the whole s/// statement as a string, so the '$1' in
        # $replace becomes part of the code, then eval it against $_.
        my $str = qq! s/$search/$replace/ig; !;
        eval $str;
        print;
    }
    # Read each line from the files named on the command line (or STDIN).
    while (<>) {
        replaceString('\s+0(\d\d:)', '$1');
    }

devbrain
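devbrain's point about \s+ versus \b, shown on a hypothetical sample line: \s+ is consumed as part of the match, while \b is zero-width, so the whitespace survives. This sketch writes $1 directly in the code, so no eval is involved.

```perl
use strict;
use warnings;

my $line = "start 045: end";

# \s+ is part of the match, so the whitespace is removed along with the 0.
(my $with_space = $line) =~ s/\s+0(\d\d:)/$1/g;

# \b is a zero-width word boundary, so the whitespace is left alone.
(my $with_boundary = $line) =~ s/\b0(\d\d:)/$1/g;

print "$with_space\n";     # start45: end
print "$with_boundary\n";  # start 45: end
```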

Also, I'm not sure I understand how you expect things to work here:

    "I've spent most of my afternoon trying to figure out how to do this, but every damn result I find in searching just tells me how to use regex, not how to pass regex group elements from one match to the next. Perl apparently just forgets all regex values the second it starts interpreting a new match?"

If you mean inside of $replace, then you're misunderstanding the terminology. What's in there is not another match; it's a replacement, and $1/$2/etc. are certainly valid in this context. If you mean at the start of another s/// or m//, then yes, of course a new successful match resets them. That's the only reasonable thing to do. If you want to save $1/$2/etc. from a previous match, you need to do that yourself. It's pretty easy:

    push @matches, [$&, $1, $2, $3, $4, $5, $6, $7, $8, $9];

Then the first capture from the previous match is $matches[-1][1]. ($& in slot 0 is the whole match, so the indexes line up with $1, $2, and so on.)

sbutler
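A minimal sketch of saving captures before the next match overwrites them, following sbutler's suggestion with $& (the whole match) in slot 0 so that index 1 corresponds to $1. The sample strings are invented for illustration.

```perl
use strict;
use warnings;

my @matches;
for my $text ("id 17 at 045:", "id 42 at 112:") {
    if ($text =~ /id (\d+) at (\d+):/) {
        # Copy the captures now; the next successful match overwrites $1, $2.
        # $& is the whole match, so $matches[-1][1] lines up with $1.
        push @matches, [ $&, $1, $2 ];
    }
}

print "$matches[-1][1]\n";   # 42  (first capture of the latest match)
print "$matches[0][2]\n";    # 045 (second capture of the first match)
```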

@devbrain: while that might work, I have to disagree that it is a good solution. eval on strings should be used sparingly and as restrictively as possible. And in this case, the Perl developers added an explicit option to s/// to solve the problem. From perlop:

    e   Evaluate the right side as an expression.
    ee  Evaluate the right side as a string then eval the result.

Clearly they intended for us to use 'ee' in this case.

sbutler
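A small sketch of what those perlop entries mean in practice, holding the replacement in a variable the way replaceString does: with no flag and with a single /e you get the text of the variable, and only /ee turns the '$1' into the captured text.

```perl
use strict;
use warnings;

my $search  = qr/0(\d\d:)/;
my $replace = '$1';   # replacement held in a variable, as in replaceString

(my $plain = "045:") =~ s/$search/$replace/;    # no flag: the text of $replace
(my $one_e = "045:") =~ s/$search/$replace/e;   # /e: value of the expression $replace
(my $two_e = "045:") =~ s/$search/$replace/ee;  # /ee: that value is eval'd again

print "$plain\n";   # $1
print "$one_e\n";   # $1
print "$two_e\n";   # 45:
```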

If you're just stripping, you can use a positive lookahead:

    replaceString('\s+0(?=\d\d:)', '');

rhizome
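A sketch of rhizome's lookahead version, using the question's replaceString trimmed to drop the redundant if/else. Because the replacement is empty, nothing needs to be evaluated, so no /ee is required. The sample lines are hypothetical and are held in an array because s/// modifies $_ in place (a loop over literal strings would try to modify read-only values).

```perl
use strict;
use warnings;

# The question's replaceString, working on $_, minus the duplicate print.
sub replaceString {
    my ($search, $replace) = @_;
    s/$search/$replace/ig;
    print;
}

# (?=\d\d:) checks that two digits and a colon follow the 0 but does not
# consume them, so only the whitespace and the 0 are replaced (with '').
my @lines = ("time 045:\n", "no change 145:\n");
for (@lines) {
    replaceString('\s+0(?=\d\d:)', '');
}
```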

FWIW, within a pattern, use \1 as a backreference to an earlier part of the same pattern, to match a duplicate of something that an earlier group matched. Within a replacement or just plain ol' Perl code, use $1, which is a variable holding what a group matched. Subtle difference (and muddied by the fact that sed uses \1 for both).

As for why it's not doing what you want, the answer is the 'ee' suffix that sbutler and devbrain mention. It would be really, really annoying if Perl did what you wanted all the time; you normally don't want the contents of text variables to be re-parsed as Perl code repeatedly. If there happens to be a $ in your replacement text, you don't want that to get interpolated as a Perl variable. Usually.

    $a = 'foo $1 bar';
    $b = qr/"(.*)"/;
    $c = 'I say "ok"';
    print "Before: [$c]\n";
    $c =~ s/$b/qq(qq($a))/ee;
    print "After: [$c]\n";    # After: [I say foo ok bar]

The 'e' suffix evaluates as Perl code, but what you want to do is two rounds of string interpolation, so you wrap the RHS in two double-quote operators.

An IMHO less weird and more perlish way to do this would be to pass in a code reference to compute the substitution string:

    $a = sub { "foo $1 bar" };
    $b = qr/"(.*)"/;
    $c = 'I say "ok"';
    print "Before: [$c]\n";
    $c =~ s/$b/&$a/e;         # call the sub after each match; it sees the current $1
    print "After: [$c]\n";    # After: [I say foo ok bar]

The big benefit of doing this is that you don't make any assumptions about the content of the replacement string: in the first example, if $a had a right-paren in it, it could mess up the second eval. In the second example, there's no multiple-eval weirdness going on, just a subroutine call.

hattifattener
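A sketch of hattifattener's code-reference idea applied to the question's replaceString (this variant and its sample data are hypothetical): the caller passes a sub instead of a string, and /e calls it after each match, while $1 still holds the current capture. Plain-text replacements become subs too, e.g. sub { 'bar' }.

```perl
use strict;
use warnings;

# Variant of replaceString whose replacement is a code reference rather
# than a string. With /e the sub runs after each match, when $1 is set.
sub replaceString {
    my ($search, $replace) = @_;
    s/$search/$replace->()/ige;
    print;
}

my @lines = ("time 045:\n", "x 012: y 007:\n");
for (@lines) {
    # Keep one space, drop the leading 0 of each timestamp.
    replaceString(qr/\s+0(\d\d:)/, sub { " $1" });
}
```

Nothing is eval'd as a string here, so a replacement containing $, quotes or parentheses cannot break anything.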
