Regex: Double your words

MAPJe71

You might wanna use …
Search: (?m-s)^([^\r\n]+(?:\r?\n|\r))
Replace: \1\1
… to double the lines.

(?m-s) means ^ and $ match at line breaks, and a dot does not match a line break.
^ starts at the beginning of a line.
( marks the start of the numbered capture group.
[^\r\n]+ matches characters that are not line breaks; using + instead of * makes sure empty lines are skipped.
(?:\r?\n|\r) matches a line break (Windows, Unix or Mac); there is no need to capture it (i.e. ?:) as it is already part of the numbered capture group.
) marks end of the numbered capture group.

\1\1 doubles the numbered capture group and includes line breaks.
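For readers who want to try this outside Notepad++, here is a minimal sketch in Python's re module (an assumption: Python is not part of the original thread, and double_lines is a hypothetical helper name). re.MULTILINE stands in for (?m), and leaving re.DOTALL off stands in for (?-s). One caveat: Python's ^ only treats \n as a line boundary, so this sketch sticks to Windows and Unix line endings.

```python
import re

# MAPJe71's pattern, almost verbatim: MULTILINE plays the role of (?m),
# and the absence of DOTALL plays the role of (?-s).
DOUBLE_LINE = re.compile(r"^([^\r\n]+(?:\r?\n|\r))", re.MULTILINE)


def double_lines(text: str) -> str:
    # \1\1 writes the captured line, including its own line break, twice
    return DOUBLE_LINE.sub(r"\1\1", text)


print(repr(double_lines("alpha\r\nbeta\n\ngamma")))
# 'alpha\r\nalpha\r\nbeta\nbeta\n\ngamma'
```

Note that the empty line stays single (thanks to the +), and that the final line "gamma" is not doubled because nothing follows it: the pattern requires a line break after each line.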

guy038

Hi Vasile Caraus and MAPJe71

MAPJe71, although your regex works perfectly well, I would like to propose a shorter regex :

SEARCH = (?-s).+\R

REPLACE = $0$0

Notes :

The in-line modifier (?-s), at the beginning of the regex, ensures that the dot meta-character always matches a standard character only, even if you previously checked the . matches newline option in the Find/Replace dialog
The last part .+\R represents all the characters of a NON-empty line AND its End Of Line character(s), whatever they are ( \r\n in Windows files, \n in Unix files or \r in Old Mac files )
In the replacement part, the syntax $0 stands for the entire match, that is to say all the contents of each line WITH its End Of Line character(s). This syntax is just written twice, as the goal is to repeat each line once ! If you want to “triple” each line, just use the syntax $0$0$0 in the replacement part :-)
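A rough Python translation of this S/R, as a sketch only: Python's re has neither \R nor $0, so \R is approximated by (?:\r\n|\n|\r) and $0 becomes \g<0>. Python's dot (without DOTALL) still matches \r, so [^\r\n] stands in for the (?-s) dot. Boost's \R also accepts a few rarer separators that this approximation ignores, and repeat_lines is a hypothetical helper name.

```python
import re

# Approximation of \R: \r\n is tried first so a Windows EOL is taken as one unit
EOL = r"(?:\r\n|\n|\r)"
# [^\r\n]+ plays the role of .+ under (?-s): one NON-empty line, then its EOL
NON_EMPTY_LINE = re.compile(r"[^\r\n]+" + EOL)


def repeat_lines(text: str, times: int = 2) -> str:
    # \g<0> is Python's spelling of $0: the whole match, line text plus EOL
    return NON_EMPTY_LINE.sub(r"\g<0>" * times, text)


print(repr(repeat_lines("x\ny\n", 3)))  # 'x\nx\nx\ny\ny\ny\n'
```

Because this pattern has no ^ anchor, a lone \r (old Mac line) is handled too: repeat_lines("a\rb\n") gives "a\ra\rb\nb\n".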

Cheers,

guy038

P.S. :

I must admit that my regex is not as accurate as MAPJe71’s !

Indeed, if the cursor is located in the middle of a line, only the part from the cursor location to the end of that line will be doubled … or tripled :-(
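The difference can be modelled in Python with Pattern.search(text, pos), which starts searching at a given index much like a search starting at the caret. This assumes Notepad++ behaves analogously when it starts at the caret; start = 6 is a hypothetical caret position in the middle of the first line. Python documents that ^ in MULTILINE mode matches only at the real start of the string or just after a newline, not at the search start index.

```python
import re

text = "first line\nsecond line\n"
start = 6  # hypothetical caret position, just before "line" on line 1

# guy038-style pattern: no anchor, so a search starting mid-line grabs the tail
unanchored = re.compile(r"[^\r\n]+(?:\r\n|\n|\r)")
# MAPJe71-style pattern: ^ cannot match mid-line, so the next full line is used
anchored = re.compile(r"^([^\r\n]+(?:\r?\n|\r))", re.MULTILINE)

print(repr(unanchored.search(text, start).group()))  # 'line\n'
print(repr(anchored.search(text, start).group()))    # 'second line\n'
```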

Scott Sumner

In MAPJe71’s posting, he uses (?m-s) in his regex. Isn’t that equivalent to the simpler (?-s) ?

With (?m) I notice that I get different results in these 2 cases, and it doesn’t seem like I should (in both cases I start with the editing caret in column 1 of line 1 of the file):

Case 1:
Find what: (?m)(.*\R){26}\K.*\R
. matches newline UNchecked
result: selects line 27 of file

Case 2:
Find what: same as Case 1
. matches newline CHECKED
result: selects text of entire file

MAPJe71

(?m-s) is equal to (?-s) when the source code has the m-flag enabled by default.

With . matches newline checked, the source code adds the s-flag, which results in the equivalent search expression (?ms)(.*\R){26}\K.*\R (note the missing hyphen between m and s). .*\R is greedy and matches the complete file; \K discards that match, but the second .*\R again matches the complete file.

For case 2 (. matches newline checked) change both .* in the expression to non-greedy i.e. (?m)(.*?\R){26}\K.*?\R to get the same result as case 1 (line 27 selected).
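The greedy/non-greedy difference can be reproduced in Python, scaled down as an assumption: {2} on a hypothetical five-line sample instead of {26} on a real file. Python has no \K, so the text after \K is captured in group 1 instead, and \R is written as \n because the sample uses Unix line endings. The exact text selected by Notepad++ is not modelled here; the point is only that with DOTALL the greedy .* runs across line breaks, so the repetition no longer counts lines.

```python
import re

text = "L1\nL2\nL3\nL4\nL5\n"  # hypothetical sample; we want "line 3"

greedy = r"(?:.*\n){2}(.*\n)"   # Scott's regex, minus \K, scaled down
lazy = r"(?:.*?\n){2}(.*?\n)"   # MAPJe71's non-greedy fix

case1 = re.match(greedy, text)              # . matches newline UNchecked
case2 = re.match(greedy, text, re.DOTALL)   # . matches newline CHECKED
fixed = re.match(lazy, text, re.DOTALL)     # CHECKED, but non-greedy

print(repr(case1.group(1)))                           # 'L3\n'
print(repr(case2.group(1)), case2.end() == len(text))  # 'L5\n' True
print(repr(fixed.group(1)))                           # 'L3\n'
```

In Python's engine the greedy DOTALL match swallows the whole text, and backtracking leaves only the last line for the final group, while the non-greedy version counts lines again.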

guy038

Hi, Scott, MAPJe71 and All,

I would like to point out that the m modifier ( either (?m) or (?-m) ) is quite useless if your regex does NOT contain any ^ or $ anchor ! Indeed :

The syntax (?m) ( = Multi-line ) means that :
- The ^ anchor represents the location just before the first standard character of any line, including the first line
- The $ anchor represents the location just after the last standard character of any line, including the last line
The syntax (?-m) ( = No Multi-line => Mono-line ) means that :
- The ^ anchor represents the location just before the very first standard character of the first line only
- The $ anchor represents the location just after the very last standard character of the last line only
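A quick Python check of these two meanings, as an illustration only. Note that Python's default is the opposite of the Notepad++ default described just below: in Python, re.MULTILINE must be requested explicitly.

```python
import re

text = "one\ntwo\nthree"  # hypothetical three-line file, no final EOL

# Like (?m): ^ and $ sit at every line boundary, so every line matches
print(re.findall(r"^.+$", text, re.MULTILINE))  # ['one', 'two', 'three']
# Like (?-m): ^ is only the start of the text and $ only its end, and a
# dot-based .+ cannot cross the line breaks in between, so nothing matches
print(re.findall(r"^.+$", text))                # []
```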

Note :

Notepad++'s regex engine considers that the (?m) modifier is set, by default. So, just use the (?-m) modifier, if necessary !

Remark :

Let’s suppose that the . matches newline option is UNchecked, and that you moved back to the very beginning of your file. Then :

The regex (?-m)^.+$ means that you’re trying to search for any non-empty range of standard characters between the very first character and the very last character of the current file. As the dot stands for standard characters only, no text spanning several lines can be anchored to both of these boundaries at the same time ! The sole case where the regex (?-m)^.*$ may match something is when your file contains just ONE line, with NO end of line character at its end !

On the contrary :

The regex (?s-m)^.+$ means that you’re trying to search for any non-empty range of any kind of characters between the very first character and the very last character of the current file. So, even if your file contains several lines and/or if the last line is followed by EOL character(s), this regex will always catch all the contents of the file, as the CTRL + A shortcut would do :-)
The regex (?-m)^.+ matches the contents of the first line of the current file, only
The regex (?-m).+$ matches the contents of the last line of the current file, only, if this line does NOT end with an EOL character
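The three cases above can be sketched in Python, with one stated substitution: the sketch uses \A and \Z instead of ^ and $, because Python's $ without MULTILINE also matches just before a final \n, which would blur the "no EOL after the last line" case. The two sample texts are hypothetical.

```python
import re

text = "first\nmiddle\nlast"  # hypothetical file, no EOL after the last line
text_eol = text + "\n"         # same file, last line followed by an EOL

# (?s-m)^.+$ : with DOTALL, .+ runs from the very start to the very end
whole = re.search(r"\A.+\Z", text_eol, re.DOTALL).group()
# (?-m)^.+ : only the first line can match
first = re.search(r"\A.+", text).group()
# (?-m).+$ : only the last line, and only when no EOL follows it
last = re.search(r".+\Z", text).group()
last_eol = re.search(r".+\Z", text_eol)

print(whole == text_eol, first, last, last_eol)  # True first last None
```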

Note that the similar regexes (?-m)^.* and (?-m).*$ produce strange additional results !?

So, Scott, when you speak about the regex (?m)(.*\R){26}\K.*\R, as this regex does NOT contain any ^ or $ anchor, it is just as if you were speaking about the regex

(.*\R){26}\K.*\R

Now, if the . matches newline option is UNchecked ( Case 1 ), the above regex correctly finds the contents of the 27th line, as each sub-regex .*\R represents the contents of ONE line only !

On the contrary, if the . matches newline option is CHECKED ( Case 2 ), the simple regex .*\R represents all the contents of the current file, up to the last EOL character. So, after a long enough backtracking phase to get 26 complete lines, in order to match the regex (.*\R){26}, the regex engine, as MAPJe71 said, forgets this match, due to the \K form. Therefore, the cursor is still located at the very first character of the file. Finally, the last part .*\R catches, again, all the contents of the file, up to its last EOL character

Again, you’ll notice that these two behaviours are not related at all to the (?m) or (?-m) modifiers !
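This is easy to check in Python, as a sketch on a hypothetical 30-line file: a pattern without ^ or $ gives exactly the same result with or without re.MULTILINE. \K is replaced by a capturing group and \R by \n (Unix line endings assumed).

```python
import re

text = "".join(f"line{i}\n" for i in range(1, 31))  # hypothetical 30-line file
pattern = r"(?:.*\n){26}(.*\n)"  # Scott's regex, minus \K, \R written as \n

without_m = re.match(pattern, text).group(1)
with_m = re.match(pattern, text, re.MULTILINE).group(1)

print(without_m == with_m == "line27\n")  # True
```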

Best Regards,

guy038

Vasile Caraus

If I want to replace one single line in many files, the only problem is that (?m)(.*\R){26}\K.*\R will also find the following multiples of 26, like 52, 78, etc., and will replace all of them, and not only line 26.

guy038

Hello Vasile Caraus,

Yes, you’re right about that issue. It’s just because the N++ Boost regex engine does NOT handle backward assertions properly ! It’s the case, for instance, for the syntaxes \A, \b and \B, as well as some lookbehind syntaxes :-((
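For comparison only: in an engine where \A reliably means "start of the text", such as Python's re, anchoring the pattern is enough to stop the repeated matches. This is a sketch on a hypothetical 60-line file, with \K replaced by a capturing group that is written back, and NEW as a placeholder replacement. It says nothing about how a particular Notepad++ version behaves.

```python
import re

text = "".join(f"line{i}\n" for i in range(1, 61))  # hypothetical 60-line file

# No anchor: after each replacement the search resumes, skips 26 more lines
# and replaces the next one as well (here line 27 and line 54)
loose = re.sub(r"((?:.*\n){26}).*\n", r"\g<1>NEW\n", text)
# \A pins the match to the start of the text, so only line 27 is replaced
anchored = re.sub(r"\A((?:.*\n){26}).*\n", r"\g<1>NEW\n", text)

print(loose.count("NEW"), anchored.count("NEW"))  # 2 1
```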

You can get a better implementation of the Boost regex library in Notepad++ by downloading the François-R Boyer version from the link below. Just have a look at the final part of my post, with the explanations about François-R Boyer’s SciLexer.dll

https://notepad-plus-plus.org/community/topic/9703/is-it-planned-to-switch-to-pcre2/10

I also explained all the technical advantages of the François-R Boyer version there.

Unfortunately, that excellent implementation :

Is still based on Scintilla v2.2.7
Does NOT work on N++ versions later than v6.9

I know : It’s a pity :-((

So, we need a work-around to select and replace ONLY the 27th line of any file concerned by the S/R. So we are going to cheat a bit ! To prevent the regex engine from searching something else ( lines 52, 78… ), we simply need a regex that :

Replaces all the contents of line 27 with the replacement text
Re-writes all the contents of the file, from line 28 to the very end, after the replacement text of line 27

This way, after the replacement, the cursor will automatically be located at the end of each file !

So, this leads to the following S/R :

SEARCH : (?-s)(?:.*\R){26}\K.*(?s)(.*)

REPLACE : String replacing line 27\1

Notes :

Preferably, refer to the link below to get the updated notes about that S/R ( Many thanks to Glenn for his valuable re-read !)

https://notepad-plus-plus.org/community/topic/12341/regex-double-your-words/13

The in-line modifier (?-s) forces the regex engine to consider that the dot meta-character will match standard characters, only
The non-capturing group, repeated 26 times, (?:.*\R){26} selects the first 26 complete lines of each file
The \K syntax suppresses that selection and reset the regex position of search between the last character of line 26 and the first character of line 27
Then the .* part looks for all the standard characters, even 0, of the 27th line, which have to be replaced
The in-line modifier (?s), now, forces the regex engine to consider that the dot meta-character will match any character, standard OR End Of Line character
So, the final part, (.*), inside round parentheses, define the group 1, containing all the text from the first character of line 28 to the very end of the file
In replacement, the expression String replacing line 27 represents, the new replacement text and the \1 is the text from line 28 to the end of the file, which must be re-written without any change !
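A Python sketch of the same trick, under stated assumptions: Python has no \K, so the 26 skipped lines are captured in group 1 and written back, the "rest of the file" becomes group 2, and (?s:...) is Python's scoped form of the (?s) modifier (Python 3.6+). The 60-line file is hypothetical, with Unix line endings so that .* and \n are enough.

```python
import re

text = "".join(f"line{i}\n" for i in range(1, 61))  # hypothetical 60-line file

# Group 1: lines 1-26 (re-written as-is); .* : the text of line 27;
# group 2: everything after it, up to the very end of the file
SEARCH = r"((?:.*\n){26}).*((?s:.*))"
REPLACE = r"\g<1>String replacing line 27\g<2>"

result, count = re.subn(SEARCH, REPLACE, text)
print(count)  # 1
```

Only one replacement happens: group 2 swallows the rest of the file, so the engine has nothing left to search, and line 54 is untouched.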

Et voilà !

Vasile Caraus, this S/R should work on multiple files in the Find in Files dialog :-))

Best Regards,

guy038

IMPORTANT :

If you perform this S/R on a few files, using the Replace dialog ( CTRL + H ), just remember these two rules :

Firstly, go to the very beginning of the current file ( CTRL + Home )
Secondly, use only the Replace All button ( Due to the \K syntax, step-by-step replacement with the Replace button does NOT work ! )

Vasile Caraus

SUPER ! WORKS !

thank you guy038 !

Vasile Caraus

And if I want to double all my text:

Search
^(?s)((^.*)()|(.*$))
Replace By
$1$2, or, if I need to insert 2 lines between the original text and its copy, $1\r\r$2
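The same effect, capturing the whole text once and writing it back twice, can be sketched in Python. This is a simplified equivalent, not a transcription of the pattern above: (?s) lets .* run across line breaks and \A anchors at the very start, so there is exactly one match. The sample text is hypothetical, and \n\n is used where the original replacement used \r\r.

```python
import re

text = "first\nsecond\n"  # hypothetical file contents

# One match covering the whole text, written back twice
doubled = re.sub(r"(?s)\A(.*)", r"\1\1", text)
# Variant with two extra line breaks between the original and the copy
gapped = re.sub(r"(?s)\A(.*)", r"\1\n\n\1", text)

print(repr(doubled))  # 'first\nsecond\nfirst\nsecond\n'
```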

glennfromiowa

Hello guy038,

First, let me say that I have learned more reading a couple of your posts here than I have learned in months of scouring N++ Help and other tutorials and experimenting with regex combinations as I’ve tried to move to the next level of understanding regex expressions beyond the basic ones. It’s always so frustrating when you try something that seems like it should work according to the documentation, but it doesn’t. And it’s hard to find tutorials that use real-world examples that I could modify for the problems I’m trying to solve.

But as I’m trying to understand your post above where you speak about the N++ Boost regex engine, in the Notes : section, you mention:

So, the final part, (.*), inside round parentheses, define the group 1, containing all the text from the first character of line 28 to the very end of the file

In replacement, the expression String replacing line 27 represents, the new replacement text and the \1 is the text from line 28 to the end of the file, which must be re-written without any change !

Technically, wouldn’t that be more accurately stated like this?

…containing all the text from the last character of line 27 preceding the line break to the very end of the file

and

… and the \1 is the text from the line break at the end of line 27 to the end of the file, which must be re-written without any change !

I’m not trying to point out errors; just trying to make sure I’m understanding it. Because I understand in the section earlier where you say this:

The \K syntax suppresses that selection and reset the regex position of search between the last character of line 26 and the first character of line 27

It makes sense then that the position of the regex cursor is actually after the line break, between the two lines. However, since the .* part does not include an EOL pattern \R, the regex cursor after that part should be at the end of line 27, and not pick up the EOL at the end of line 27 until after the (?s) modifier, correct?

Thank you for your helpful posts!

Glenn

guy038

Hello Glenn,

Oh yes, Glenn, you’re a thousand times right ! I should have re-read my reply before posting ! Sorry for being approximate !

So, given the S/R, below :

SEARCH : (?-s)(?:.*\R){26}\K.*(?s)(.*)

REPLACE : String replacing line 27\1

Updated Notes :

The in-line modifier (?-s) forces the regex engine to consider that the dot meta-character will match standard characters, only
The non-capturing group, repeated 26 times, (?:.*\R){26} selects the first 26 complete lines, with their EOL characters, of each file
The \K syntax suppresses that selection and resets the regex position of search between the last EOL character of line 26 and the first character of line 27
Then the .* part looks for all the standard characters, even 0, of the 27th line, which have to be replaced
Then, the in-line modifier (?s), now, forces the regex engine to consider that the dot meta-character will match any character ( standard OR End Of Line character )
So, the final part, (.*), inside round parentheses, defines the group 1, containing all the text, from the EOL characters of line 27 … to the very end of the file
In replacement, the expression String replacing line 27 represents, the new replacement text and the \1 syntax represents the text from the EOL characters of line 27 … to the very end of the file, which must be re-written without any change !

With that work-around, glennfromiowa, I don’t need to write \r\n in the replacement regex for the EOL characters of line 27 !!
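Glenn's point can be checked in Python on a hypothetical 30-line Windows file. Assumptions as before: no \K in Python, so the skipped lines are captured in group 1; (?:\r\n|\n|\r) stands in for \R; and because Python's dot also matches \r, [^\r\n] stands in for the (?-s) dot. The captured tail starts with the \r\n of line 27, which is why the replacement text needs no EOL of its own.

```python
import re

# Hypothetical 30-line file with Windows (\r\n) line endings
text = "".join(f"line{i}\r\n" for i in range(1, 31))

SEARCH = r"((?:[^\r\n]*(?:\r\n|\n|\r)){26})[^\r\n]*((?s:.*))"

m = re.match(SEARCH, text)
# The tail (group 2) begins with line 27's own EOL, not with line 28
print(m.group(2).startswith("\r\nline28\r\n"))  # True

result = re.sub(SEARCH, r"\g<1>String replacing line 27\g<2>", text)
print(result.split("\r\n")[26])  # String replacing line 27
```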

Many thanks again for pointing out these approximations ! Indeed, from now on, I’ll have to be careful, because some people, like you, read my posts word by word :-))

Best Regards,

guy038

P.S. :

As I’m a moderator of the NodeBB Notepad++ forum, I’m going to update my previous post about that topic by adding a link to this post ! So, anyone will easily see the differences :-)