
How to create a regex conditional match?

  • I am trying to create a regex conditional match, where a portion of the match is reused to match further along the string. Take the following example:

    "sometext http://www.somedomain.com/test.html somemoretext http://www.someotherdomain/www/hello/more.html someothertext"

    I would like to capture the domain portion of the first URL (http://www.somedomain.com) and determine whether the URLs that follow originate from the same domain. If they do not, I would like a match to occur. So:

    "sometext http://www.somedomain.com/test.html somemoretext http://www.someotherdomain/www/hello/more.html someothertext" <- this is a match

    "sometext http://www.somedomain.com/test.html somemoretext http://www.somedomain.com/www/hello/more.html someothertext" <- this is not a match

    Are there any regex gurus out there who know the answer to this one?

  • Answer:

    I don't entirely understand your question, but I'll give it a shot and refine it in a later post if necessary. Let's start simple: I'm guessing you want to match the 'http://www.domain.com' part when text can appear before or after it. In Perl, this would be:

        # first, create the $domain variable to store the URL
        my $domain = "http://www.domain.com";
        # then, for fun, create the $url variable, a string that contains our domain
        my $url = "sometexthttp://www.domain.comsometext";
        # now test whether $url contains our domain;
        # \Q...\E makes the dots in $domain match literally
        if ($url =~ m/(.*)?\Q$domain\E(.*)?/) { print "found the domain\n"; }

    I haven't tested this, but as far as I know it should work :)

    I'll explain the if conditional above step by step, for this case where we're trying to match the domain against another string.

    1. if (condition is true) {do this}
    2. $url is the variable containing the text that may or may not contain the domain we're looking for.
    3. The =~ sign is the binding operator: it applies the match on its right to the string on its left.
    4. m/specificcontent/: the m stands for match, and we're trying to match specific content against the $url variable.
    5. The dot (.) represents any single character (except a newline, by default), which includes letters, numbers, symbols, and so forth.
    6. The asterisk (*) applies to the item before it and means "zero or more", so in this case any number of characters may appear before $domain. It is greedy: it first grabs as much as it can and then backtracks until the rest of the pattern fits, which can make matching slow on long strings.
    7. The question mark (?) indicates that the item before it is optional.
    8. Put these together and you get (.*)?, which means that what is in the brackets is optional: there may or may not be text before and/or after the domain we're looking for. Since .* can already match nothing and the pattern is not anchored, these groups do not change which strings match; m/\Q$domain\E/ alone accepts the same strings.

    It's quite difficult to explain further without actually talking it through over chat with any questions you might have.
Hope this helps :) Good Luck

christia... at Yahoo! Answers
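
The answer above tests whether a known domain appears in a string. For the question as asked, one possible approach is a backreference inside a negative lookahead: capture the scheme and host of an earlier URL, then require a later URL that does not start with the same text. The following is only a minimal sketch under stated assumptions, not a definitive solution: the helper name has_foreign_domain is hypothetical, URLs are assumed to start with http:// or https://, and "domain" is taken to mean scheme plus host, compared case-sensitively.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answers): returns 1 when a
# later URL in $text has a different scheme-plus-host than an earlier one.
sub has_foreign_domain {
    my ($text) = @_;
    my $found = $text =~ m{
        (https?://[^/\s]+)     # group 1: scheme and host of an earlier URL
        (?=[/\s]|\z)           # the host must end here, so group 1 cannot shrink
        .*?                    # skip ahead lazily to a later URL
        (?!\1(?:[/\s]|\z))     # reject a URL with the same scheme and host
        https?://[^/\s]+       # a later URL on a different domain
    }xs;
    return $found ? 1 : 0;
}

# The two sample strings from the question.
my $mixed = "sometext http://www.somedomain.com/test.html somemoretext "
          . "http://www.someotherdomain/www/hello/more.html someothertext";
my $same  = "sometext http://www.somedomain.com/test.html somemoretext "
          . "http://www.somedomain.com/www/hello/more.html someothertext";

print has_foreign_domain($mixed) ? "match\n" : "no match\n";   # prints "match"
print has_foreign_domain($same)  ? "match\n" : "no match\n";   # prints "no match"
```

The lookahead right after group 1 matters: without it, the engine could backtrack to a shortened host such as http://www.somedomain.co and then wrongly treat the second URL as a different domain.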


Other answers

First, find the entry point in the text, which starts with http://www. Then, from that starting point, split off the domain and assign it to a string variable. Use this string variable in a regex to check whether the next domain is the same or different. That's all. Helmut

hswes
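
The two-step approach described in this answer could be sketched in Perl roughly as follows. This is an illustration under assumptions rather than the answerer's own code: the helper name foreign_domains is hypothetical, URLs are assumed to start with http:// or https:// (not only http://www.), and the domain is taken to be everything up to the first slash or whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answers): returns the
# scheme-plus-host of every later URL that differs from the first URL's.
sub foreign_domains {
    my ($text) = @_;
    # Step 1: collect the scheme and host of each URL, in order of appearance.
    my @domains = $text =~ m{(https?://[^/\s]+)}g;
    return () unless @domains;
    # Step 2: store the first domain in a string variable.
    my $first = shift @domains;
    # Step 3: use that variable in a regex to keep only the domains that differ.
    return grep { !/\A\Q$first\E\z/ } @domains;
}

my $text = "sometext http://www.somedomain.com/test.html somemoretext "
         . "http://www.someotherdomain/www/hello/more.html someothertext";

my @others = foreign_domains($text);
print @others ? "match: @others\n" : "no match\n";   # prints "match: http://www.someotherdomain"
```

An empty result means every URL in the text shares the first URL's domain, which corresponds to the "not a match" case in the question.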
