Regex: Select all from row except

Vasile Caraus

hello, I have this kind of rows:

<li><a hrefs="love-and-attitude.html" title="Love and Attitude">Love and Attitude (24)</a></li>
<li><a hrefs="paint-and-gain.html" title="Paint And Gain">Paint And Gain (15)</a></li>
<li><a hrefs="mother-and-father.html" title="Mother And Father">Mother And Father (19)</a></li>

The desire output:

Love and Attitude (24)
Paint And Gain (15)
Mother And Father (19)

Scott Sumner

@Vasile-Caraus

You have done so many regex question postings over a period of years. This one appears very simple, so I would ask you to demonstrate your learning by first telling us what you’ve tried (and apparently failed with). From there we will steer you in the correct direction, rather than just doing the work for you.

Vasile Caraus

I am not a programer, just a designer, this is why is hard to make everything possible :(

Scott Sumner

@Vasile-Caraus

I am not a programer, just a designer, this is why is hard to make everything possible :(

??

Is this supposed to be some sort of excuse for not learning things? Please don’t waste the Community’s time to hand-hold you through every regex you ever need to write. Remember that the Community really isn’t a great place to discuss (basic) regex anyway…there are better sites for that.

Vasile Caraus

Scott, if you knew better, you could have helped me already :)

PeterJones

@Vasile-Caraus ,

To say to Scott “if you knew better…” is ridiculous, given the number of times he’s answered regular expression questions in this forum.

Given the data you showed, it’s nearly trivial, assuming there’s never a “<” in the data that you want as part of the output.

Scott asked you to show what you’ve tried. Barring that, after taking the advice which I’ve iterated many times in this forum:

FYI: if you have further regex needs, study this FAQ and the documentation it points to. Before asking a new regex question, understand that many of us will expect you to show what data you have (exactly), what data you want (exactly), what regex you already tried (to show that you’re showing effort), why you thought that regex would work (to prove it wasn’t just something randomly typed), and what data you’re getting with an explanation of why that result is wrong. When you show that effort, you’ll see us bend over backword to get things working for you. If you need help formatting the data so that the forum doesn’t mangle it (so that it shows “exactly”, as I said earlier), see this help-with-markdown post, where @Scott-Sumner gives a great summary of how to use Markdown for this forum’s needs.

… after taking that advice, it should be easy to come up with a guess as to what to use, and be able to explain why you made that guess. If it didn’t work, and you showed that effort, you might be surprised at how much more helpful we’ll be.

If you are incapable of doing that (“because you are not a programmer”), then you could at least describe the steps you would take if you were manually extracting the data: “first, I would look for an <a ...> tag, then start matching data, then end matching upon reaching the first </a> closing tag”

In another piece of helpful advice: parsing HTML or XML using regular expressions is fraught with dangerous, and nearly an exercise in futility. It’s probably much better to find a purpose-written utility for extracting data from HTML/XML.

That said, I’ll take pity on you: for your simple case, I will base64-encode a regular expression that will work. If you can show enough effort to go search out a utility that will decode base64-encoded data, you will have all you need to be able to produce your desired output from the input you gave. (Note, you gave no restrictions, and no indications of when data might not match, so I made it pretty wide open: it will match even more than my earlier textual description of how to solve the problem.)

RklORDogKD8tcyleLio+KFtePF0rPyk8Ly4qJApSRVBMQUNFOiBcMQoKcmVnZXhyLmNvbS80MXJp
bCB3aWxsIHNob3cgaXQgaW4gYWN0aW9uICh0aG91Z2ggaXQgbG9va3MgYSBsaXR0bGUgZGlmZmVy
ZW50IHRoZXJlLCBiZWNhdXNlIG9mIHRoZSBpbnRlcmZhY2UgcmVzdHJpY3Rpb25zIGF0IHRoYXQg
d2Vic2l0ZSkK

But honestly, your refusal to even give an attempt, and then use the lame excuse of “I am not a programer”, makes me wonder why you expect people to actually help you. We’re not here to do your job for you for free. We’re here to help people get the most out of their Notepad++ experience. Some of that help includes showing people where to go to learn how to do the tasks themselves, rather than expecting someone else to hand a solution on a silver platter.

Good luck on getting future help.

Scott Sumner

@PeterJones

base64 encode…LOL… +1

Scott Sumner

@PeterJones said:

If you can show enough effort to go search out a utility that will decode base64-encoded data

…or do what I did and rack my brain trying to remember how to get Notepad++ itself to do it (which I finally was able to do) !

PeterJones

Honestly, I was expecting the OP to just google and click on the first link found, which would have done the trick, too. But doing it inside Notepad++ is even better.

I had forgotten that option was there in that menu. There are a lot of good features either built in or added to Notepad++. I knew there was a reason I love this editor. :-)

guy038

Hi, @vasile-caraus, @scott-sumner, @peterjones and All,

Scott and Peter, I do understand and agree to the general philosophy of your replies. Besides, I’m a bit surprised of Vasile’s request as he have already been able to create regexes more complicated that the one needed for that topic !

To be honest, just admit that Vasile clearly identified the BEFORE text and the AFTER desired text :-)

So, I’ll try to solve his problem, using a bit of logic, just to demonstrate that, as always, when a problem is well posed, the solution often becomes obvious !

So, using your first sample line :

<li><a hrefs="love-and-attitude.html" title="Love and Attitude">Love and Attitude (24)</a></li>

Select any > symbol and use the menu option Search > Mark All > Using 5th Style
Select any < symbol and use the menu option Search > Mark All > Using 2nd Style

Note that I have chosen these colours, on purpose, with reference to traffic lights : Green => Go and Red => Stop !

By examining the line, you’ll clearly see that the only gap beginning with a green style and ending with a red style, which contains some characters inside, is the string that you expect to keep : Love and Attitude (24)

So a first approach would be to consider the regex (?-s)>.+<, which matches any standard char between the > and < boundaries !

But, actually, there are two ranges of characters which verify that rule :

>Love and Attitude (24)<
>Love and Attitude (24)</a><

Right away, we realize, or course, that we must use the shortest range, which leads to the use of lazy quantifiers in regexes, adding a ? symbol to the present + quantifier

So, we’re looking for the shortest non-null range of characters, between the > and < boundaries and, as we’ll have to rewrite this range in replacement, this leads to the regex (?-s)>(.+?)<, with the parentheses to store it for future replacement

IMPORTANT :

Beware about these two descriptions :

If I write the regex (?-is).+?a, it will match any non-null range of characters, different from an a letter, till the first found lower-case letter a
If I write the regex (?-is).+?abc, it will match any non-null range of characters, which does not contain any string abc, till the first found string abc

Just test the two syntaxes, with the greedy + and the lazy +? quantifiers, against the text below :

0123456789 a 0012345678900123 ab 45678900123456789 abc 001234567890012 a 3456789001234 ab 56789001234567 abc 89001234567890

Now, Vasile, as all the line contents must be replaced with the group 1, we have to grab all characters, which are, either, before and after the >Love and Attitude (24)< string. As any non-null gap of characters, in a single line, can be obtained with the regex (?-s).+, then, the final regex is :

(?-s)^.+>(.+?)<.+

If we use the free-spacing mode, where any non-escaped space character is not taken in account, we get this more clear syntax :

(?x-s)    ^    .+    >    (.+?)    <    .+    #    Line Beginning    Some chars    >    The REPLACED string    <    Some chars

Of course, the replacement zone is \1 and the line-break remains unchanged, as not processed

Cheers,

guy038

P.S. :

Naturally, I immediately remembered of the Base64 Decode built-in option, from Don’s plugin ;-))

Vasile Caraus

@guy038 said:

(?-s)>^.+>(.+?)<.+

hello Guy, I try your regex, but it does not work

See a print screen:

https://snag.gy/sVB89v.jpg

Vasile Caraus

Guy, I change a little bit your regex:

SEARCH: .+>(.+?)<.+ OR (?-s)^.+>(.+?)<.+
Replace by: \1

Seems to work this.

guy038

Hi, @vasile-caraus, and All,

Sorry, I didn’t re-read enough my post after replying :-(( Indeed, you’re right : the > character, located right after the (?-s), is useless, of course !

So, I updated my last post, too and deleted this wrong > char, in the regex ;-))

Cheers

guy038

Scott Sumner

@guy038,

Perhaps it would be best if you shared your email address with @Vasile-Caraus for his future regex solution needs (of which there will probably be many). That way the rest of the Community doesn’t have to be bothered with the noise of trivial regex questions that don’t relate to Notepad++ and are better asked elsewhere.

I would also encourage you to set up some sort of “donation” method so that your consulting time can be paid for. Clearly all contributors here enjoy helping people learn on a free basis, but for those recipients that refuse to learn, infinite solutions provided without payment seems ridiculous.

Vasile Caraus

it is not about me, Scott Sumner. It’s about hundreds of people searching for this information on the internet and they will find it here. Also, this will make notepad++ more popular in the world. You should learn to see things globally !

guy038 is a great man, helped me every time. Thank you very much. Next time I will send him email. Happy?

guy038

Hello, @Scott-sumner, @vasile-caraus, and All,

In your last post, Scott, you said :

That way the rest of the Community doesn’t have to be bothered with the noise of trivial regex questions that don’t relate to Notepad++ and are better asked elsewhere

I think you are particularly right on this point. But now I feel a little guilty !! So I’m wondering :

By answering, for the most part, to questions, related to regular expressions, ( for years ! ), am I distorting the true purpose of this forum, which should remain, I agree, focused on the features, improvements and bugs of Notepad++ ?

May be, I should slow down my appetite of regexes ;-)) What’s your feeling, fellows ?

Cheers,

guy038

dinkumoil

@guy038 said:

What’s your feeling, fellows ?

Radio Yerevan would say: It depends ;-).

Notorious leechers shouldn’t be feeded for ever. I don’t know if @Vasile-Caraus actually is such a guy but his “I’m a designer and not a programmer” statement is a lame excuse.

Scott Sumner

So I think that first time regex question posters come here because they don’t know what else to do. They don’t (necessarily) know even what regular expressions are–they just know that they have a problem with data manipulation and don’t know how to solve it…but they hope Notepad++ can help them…AND IT CAN! So, yes, true, this is not a forum for regex, but these naive first-time posters certainly don’t know that; they can be excused.

And I think it is fine to help them out, and give them a boost (pun fully intended) into learning about the Wonderful World of Regular Expressions. But I also think it is fair, if the same people keep coming back for more answers, over and over, that we ask them what they’ve tried first, to show their learning or at least how they are thinking about a problem…instead of just handing out answers.

So, no, @guy038, I don’t think you should slow down with the regexes. You have to be you. :-)