Feature Request - Textarea search/replace

Jorge Campos

Hi @Claudia-Frank ,
Thanks for your input.
Here is some images (plus some explanation) of what I’m talking about.

Imagine that you have a xml file with a listing of books with author and title tags. Imagine that this file is HUGE and you was given a task to change some tags. Lets say that the xml is in this format:
<book id=“bk101”>
<author>Gambardella, Matthew</author>
<title>XML Developer’s Guide</title>
</book>

And you need it to be like this:

<book_tag id=“bk101”>
<author_name>Gambardella, Matthew</author_name>
<book_title>XML Developer’s Guide</book_title>
</book_tag>

So, with my proposal and using the textarea replacement (with regex option checked) would be like:
Image two

As you can see in this first image I’ve just added the sample tags with proper regex “(.+)” in place and filled the “Replace With” textarea with the desired output.

After clicking once in the Replace button the file will be like this:
Image two

And after clicking on the Replace All button it would be:
Image One

In Npp to do this task I would have to create a really messy regex considering line breaks, also tabs/spaces before the tags, identify each tag as a group, also the text inside the tags as groups and the replacement field would be really huge with all those values.

Hope it is better now to understand and hope to see this feature soon.
Cheers

Claudia Frank

@Jorge-Campos

I understand and I do see the benefit of having such a functionality,
especially for those who start discovering regular expressions - makes it a lot easier and faster to create.

Recently Maurizio D. posted python script and TK based gui solution for some similar solution.
See here for more information.

Didn’t check his script in detailed but I assume it could be done with a couple of minor tweaks.

Cheers
Claudia

guy038

Hello, @Jorge-campos,

Although, I must admit that the Text Area, of EditPlus, is an interesting feature, easy to use, I think that with our present Notepad++ and regex engine, based on the Boost C++ Regex library, v1.55.0, your kind or S/R is not too complicated to build, with regexes !

Indeed, here is ONE possible solution :

Go back to the very beginning of your document ( CTRL + Origin )
Open the Replace dialog ( Ctrl + H )
Type, in the two zones, Find what: and Replace with:, the regexes :

SEARCH (?-i)</?\K(?:(book)|(author)|(title))

REPLACE (?1\1_tag)(?2\2_name)(?3book_\3)

Select the Regular expression search mode
Click on the Replace All button

Et voilà !

Notes :

First, the modifier (?-i) forces a non-insensitive way of searching
Then, with the part </?, we looks for the < symbol, possibly followed by the / symbol
Now, the \K syntax resets the regex engine and the search location
Then it meets a non-capturing group (?:...|....|.....), necessary because of the lowest priority of the alternative operator
Finally, the regex engine searches, either, the word book, author or title, stored, accordingly, in group 1, 2 and 3
In replacement, we’re using the conditional replacement syntax, of general form (?#Then part:Else part) :
- If the word book has been found, the regex (?1\1_tag) is used => Group 1 is rewritten, followed by the string _tag
- If the word author has been found, the regex (?2\2_name) is used => Group 2 is rewritten, followed by the string _name
- If the word title has been found, the regex (?3book_\3) is used => the string book_ is rewritten, followed by Group 3

To get acquainted to regular expressions concept and syntax, begin with that article, in N++ Wiki :

http://docs.notepad-plus-plus.org/index.php/Regular_Expressions

In addition, you’ll find good documentation, about the new Boost C++ Regex library, v1.55.0 ( similar to the PERL Regular Common Expressions, v1.48.0 ), used by Notepad++, since its 6.0 version, at the TWO addresses below :

http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

The FIRST link explains the syntax, of regular expressions, in the SEARCH part
The SECOND link explains the syntax, of regular expressions, in the REPLACEMENT part

You may, also, look for valuable informations, on the sites, below :

http://www.regular-expressions.info

http://www.rexegg.com

http://perldoc.perl.org/perlre.html

Be aware that, as any documentation, it may contain some errors ! Anyway, if you detected one, that’s good news : you’re improving ;-))

Best Regards,

guy038

Jorge Campos

Hi @Claudia-Frank

Thanks for your suggestion I did have seen the Maurizio’s solution when I was looking for a plugin that works like the textarea replace function, but I really think that this kind of feature should be part of the native replacement screen on Npp. I will give it a try though.

Hi @guy038
Thanks for your suggestion. I’m very well versed in regular expressions even had given couses on it (I have 25+ years in development experience). When I said that one would need to create a “really messy regex” it was intended to really messy situations in the real world, I may have over simplified my example on how it should work, I agree that for that sample I gave the solution is pretty simple and straight forward IF one understand regex (and the different engines that works with it) as we do, you see, capturing groups, back references and especially the \K with numbered groups which is a boost improvement that causes conflict with other engines is the most problematic and difficult to people understand (StackOverflow’s most question cases)

In my current job position I’m always required to help other developers to create some regex to do replacement tasks on such scenarios (of course they are far more problematic scenarios than the one I gave). I’m always trying to make then to learn about regex and the benefits of it, but they just really don’t care much as long as I’m around to help ¬¬. That’s all because the internet created this line (around 1997):

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

So nowadays developers run from regex learning as the devil runs from the cross OR there are not enough experienced people to teach it.

And most of their problems (my colleagues) would be solved easily with such functionality as you said yourself “I must admit that the Text Area, of EditPlus, is an interesting feature, easy to use” Yes, it really is and saves so much time. People here don’t wan’t to spend money to buy EditPlus licences ¬¬ (I have bought my one years ago, started using it before Npp even exists around 99).

So having this feature on this great tool that you guys created and shared for free would be really nice and would make also people that are not familiar with programming use it. I would be an advocate of it, damn I already am :)

Cheers

Alan Kilborn

@Jorge-Campos

I guess that (unlike @Claudia-Frank and perhaps @guy038) I am unclear on what this capability really provides–it appears to me that it simply allows one to ignore whitespace and linebreaks in the text to be found, and to map what IS found (ws and linebreak structure, and regex replacements obviously) into the replace text? See? The weirdness of my last sentence really expresses how confused I am by this.

However, I’m always seeking a better way to do things, and when people talk about replacements using regex, I’m definitely on-board! So I WANT to understand. Perhaps I will try out Editplus to try and get a real feel for it.

As far as getting substantative changes into native Notepad++, I wouldn’t count on it (it could always be a plugin/script solution…no shame in that…), but I wouldn’t want to discourage you too much for your feature request.

Claudia Frank

Hi Alan,

I didn’t try EditPlus but from the screenshot given I assume
it treats every char as given, even whitespace and eol and
escapes chars if needed but the most benefit I see is that it looks more natural.
If you would have to create a single line regex from the example given,
it would already start getting unreadable.

Already thinking how I can include this into my regex tester script. :-)

Cheers
Claudia

Jorge Campos

Hi @Alan-Kilborn,

@Claudia-Frank explained it all (thanks for that Claudia), the main feature on this functionality is “it treats every char as given, even whitespace and eol and escapes chars if needed”.

You guys should really try out this EditPlus feature, you can download the evaluation version and test it. Perhaps without a real world scenario for a guy fully familiarized with regex it is hard to imagine the benefits (as we can do it with regex). The main point is that even with knowledge in regex I still lost some time figuring it out, with this feature it would be much simpler, just copy a portion of the needed text and change very few things on it.

It is important to note that it also works without regex so if in a plain text you wan’t to search a word and replace it with a period then an enter, then a tab then some text and another enter, you just put the word on the search box and the paragraph as is (as you want it) in the replace box without need to know regex or even extended characters like \n, \t, \r, etc.

Cheers

Scott Sumner

@Jorge-Campos said:

It is important to note that it also works without regex so if in a plain text you wan’t to search a word and replace it with a period then an enter, then a tab then some text and another enter, you just put the word on the search box and the paragraph as is (as you want it) in the replace box without need to know regex or even extended characters like \n, \t, \r, etc.

That part of the discussion (especially) makes me want to recommend the techniques discussed in this thread (https://notepad-plus-plus.org/community/topic/12973/multiple-line-replacing-doesn-t-work) as a compromise – this is functionality you can have TODAY, albeit not as nice as the aforementioned user interface would be…

Jorge Campos

Hi @Scott-Sumner

As I said before, I do know how to do it with Npp. I just would like it to be easier as it is in EditPlus. :)

Alan Kilborn

@Jorge-Campos

Now that I have the true understanding, I think that @Claudia-Frank 's suggestion of modifying slightly the Python TK-user-interface script she pointed to is a really good one for this.

Or…you can feel really good about having made your feature request…and get annoyed in the coming months/years that your “really great idea” hasn’t “gone native” in Notepad++…especially when you need to do a replace operation that would really benefit from having it. :-D

Cheers!

Jorge Campos

@Alan-Kilborn

No problem, when I have the time I will try to create it myself. I suppose that I was expecting too much. Thanks for the heads up though.

Cheers

Ion Saliu

@Jorge Campos:
Axiomatic one, I made a similar request in the ‘Help’ forum:

https://notepad-plus-plus.org/community/topic/13808/search-replace-of-multiline-text-blocs
“Search/Replace of Multiline Text Blocs”

I made a reference to another great freebie:

‘Replace Text’ by Ecobyte. I use it to replace text in hundreds of Web pages at once. It has a special feature: ‘Advanced Edit’. I copy-and-paste blocs of text into ‘Advanced Edit’. It converts the bloc into a one-line string with special characters between the lines in the original bloc to be searched/replaced. No complicated regex coding needed.

If not present, Notepad++ could add such feature to its arsenal. I thought the ‘Extended (\n, r, …)’ search option would do it. It doesn’t work for me! It would be great to perform such a task in Notepad++, instead of exiting and running Ecobyte.

Ion Saliu
“A good man is an axiomatic man; an axiomatic man is a happy man. Be axiomatic!”

Scott Sumner

So I discovered, while experimenting with “multiline find” in relation to the original topic of this thread, that it is impossible to successfully “find-all-in-current-document” when your multiline find text includes a literal \ followed immediately by a literal n. Note that there are other similar scenarios with the same problem, but to keep this simple I’ll just use this case for discussion.

Here’s a test case:

print "hello" + "\n"
print "there" + "\n"
print "hello" + "\n"
print "there" + "\n"

Having the above text in your active editor tab, selecting the first two lines and invoking the find dialog will result in the find-what box appearing like this:

print "hello" + "\n"print "there" + "\n"

Note the unseen line-ending is between the " and the p.

At this point, setting normal search-mode and pressing the button for “Find All in Current Docuement” will result in zero hits even though it should result in two. Since no matches are found the Find dialog does NOT automatically close, and at this point it may be noticed that the search-mode has changed from Normal to Extended.

This behavior bothered me, so I dug into the source code to see what was going on. A key part seemed to be extended search mode getting set so I started there. I went to FindReplaceDlg.rc and found that the control ID for that button is IDEXTENDED. Searching the remainder of the source for that text, there were few hits (< 10) and the most interesting appeared to be this one: ::SendDlgItemMessage(_hSelf, IDEXTENDED, BM_SETCHECK, TRUE, 0);–the code itself, not the user, is setting the Extended search mode radiobutton.

The function (void FindReplaceDlg::combo2ExtendedMode(int comboID)) that calls that executes the above SendDlgItemMessage() function is interesting in that it examines the find-what box contents and, if that box contains one or more real line-endings (meaning invocation of the find with multiline selection active), it actually ALTERS the contents, substituting literal \, r, \ and n for any line-ending characters it encounters (note: Windows line endings discussed here; same effect for other types).

Ok, so the find-what contents have been altered to contain some literal text with backslashes and r and n in it, and extended search mode has been set, and THEN Notepad++ runs the find-all. Presumably this is done so that line-endings are easier for the code to search for (??), although I don’t see why leaving them as-is and searching literally for the data wouldn’t work fine (as it does for the find-next functionality, where this odd substitution does NOT happen). Most importantly, the code doing this screws up a user intention to search for certain literal backslash sequences, because now these sequences are interpreted specially, not literally!

A search on combo2ExtendedMode() shows that it is called when find-all-in-current-doc, find-all-in-opened-files, or find-in-files is executed, so any of these will cause the same effect.

So the bottom line is that if you want to select a multiline block of text which contains the literal character sequence equivalents of anything that the Extended search mode thinks of as “escape sequences”, you can’t realistically do those type of searches. Sure, you could painfully edit the find-what box after you select your multiline text and invoke the find dialog, but that rather kills the utility of selecting a big multiline block of data before bringing up the dialog.

I guess I wouldn’t call this behavior a bug per se, I just don’t particularly like the way it works. :(

guy038

Hello, @scott-sumner and All,

Scott, I investigated this “pseudo” bug and here are my ( non exhaustive ) conclusions !

For an easy reading, click on the ¶ button, of the Tools bar, to visualize all the hidden characters

First, these weird results exist from a very long time ( Probably from adding the Extented search mode ? ) I was able to reproduce it, with the old v5.9.8 version !
Case A : IF you select any text, WITHOUT including any following End of line character(s) ( No CR nor LF chars ), any immediate search, in NORMAL search mode, clicking on the Find All in Current document OR on the Find All in All Opened Documents, work normally ! No problem :-))
Case B : IF you select any text, INCLUDING its End of Line character(s), any immediate search, in NORMAL search mode, clicking on the Find All in Current document OR on the Find All in All Opened Documents, does NOT work properly, if the search includes any of the literal following characters, below, and produces NO result :-((
- \t
- \n
- \r
- \0
- \\
Case C : IF you select any text, INCLUDING its End of Line character(s), which contains escaped characters, except for the five above, any immediate search, in NORMAL search mode, clicking on the Find All in Current document OR on the Find All in All Opened Documents, DOES find result(s), if any !
However, note that a bad side-effect of Case B OR Case C searches, is that the search mode is, wrongly, changed from Normal to Extended

So, if you, really, need to search for text, containing any of the five literal forms, below, and ending by its End of Line characters :

\t
\n
\r
\0
\\

Just follow the few steps, below :

Open the Find dialog ( Ctrl + F )
Type the corresponding regexes, below, in the Find what: box
- \\t\R
- \\n\R
- \\r\R
- \\0\R
- \\\\\R OR \\{2}\R
Check the Regular expression search mode
Click on the Find All in Current document OR on the Find All in All Opened Documents button

You may, as well :

Select, first, the text, including its End of Line characters
Open the Find dialog ( Ctrl + F )
In the Find what: field, simply escape, with an other \, inserted before, each anti-slash character ( \ )
Check the Regular expression search mode
Click on the Find All in Current document OR on the Find All in All Opened Documents button

Best Regards,

guy038