Find start of line up to specific character, and copy this to end of line.

Dad3353

Fellow Notepad++ Users,

Could you please help me with the following search-and-replace problem I am having?

Creating dictionary text for Godot 4, I need to reformat multiple lines of text. All characters up to, but not including, ‘ = ’ should be copied to the end of the line, after ‘GlobalVariables.’, and (optional; I could do this part by hand…) the beginning characters should be enclosed in quotes (i.e., gv_fire_rang_mini becomes “gv_fire_rang_mini”). I have several hundred lines like these, and more to come as I continue developing my application.
Thanks in advance; meanwhile… Have a great day.

Here is the data I currently have (“before” data):

gv_fire_rang_mini  = GlobalVariables.
gv_fire_rang_maxi  = GlobalVariables.
gv_fire_accu_mini  = GlobalVariables.
gv_fire_accu_maxi  = GlobalVariables.
gv_fire_dura  = GlobalVariables.
gv_fire_relo  = GlobalVariables.
gv_fire_relo_mini  = GlobalVariables.

Here is how I would like that data to look (“after” data):

"gv_fire_rang_mini"  = GlobalVariables.gv_fire_rang_mini
"gv_fire_rang_maxi"  = GlobalVariables.gv_fire_rang_maxi
"gv_fire_accu_mini" = GlobalVariables.gv_fire_accu_mini
"gv_fire_accu_maxi"  = GlobalVariables.gv_fire_accu_maxi
"gv_fire_dura"  = GlobalVariables.gv_fire_dura
"gv_fire_relo"  = GlobalVariables.gv_fire_relo
"gv_fire_relo_mini"  = GlobalVariables.gv_fire_relo_mini

To accomplish this, I have tried using the following Find/Replace expressions and settings:

Find What = ^.*(?==\s)
Replace With =
Search Mode = REGULAR EXPRESSION
Dot Matches Newline = NOT CHECKED

This finds (and deletes…) the start characters, but does not, of course, copy these characters to the end of line. I am too inexperienced with Notepad++ to know if this is even possible. Can it be done…?

Unfortunately, this did not produce the output I desired, and I’m not sure why. Could you please help me understand what went wrong and help me find the solution?

gerdb42

@Dad3353
Your expression matches everything from the beginning of a line up to (but not including) a “=” followed by whitespace. Then the match is replaced with nothing (effectively removed).
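
For anyone who wants to reproduce this outside Notepad++, here is a minimal Python sketch of the original attempt. It assumes Python's re module rather than the Boost engine that Notepad++ uses; the two differ in places, but this particular pattern should behave the same way in both.

```python
import re

line = "gv_fire_rang_mini  = GlobalVariables."

# The lookahead (?==\s) checks for "= " without consuming it,
# so the match is only the text in front of the equals sign.
matched = re.search(r"^.*(?==\s)", line).group(0)

# Replacing the match with an empty string deletes it; nothing is copied.
result = re.sub(r"^.*(?==\s)", "", line)

print(repr(matched))
print(repr(result))
```

The match includes the two spaces before the equals sign, which is why only "= GlobalVariables." is left behind.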

You will need to tell the RegEx engine what to put in place of the match. In RegEx this is done with “capturing groups”. You may want to look here for more information: https://community.notepad-plus-plus.org/topic/15765/faq-where-to-find-regular-expressions-regex-documentation

For your need you may try this:

Find what: (?-s)^(\w+)(.*)$
Replace with: "$1"$2$1

(?-s) makes sure that “.” will not match newline, no matter what the checkbox says.
^ anchors the expression to the beginning of a line
(\w+) creates a first capturing group containing all following word characters
(.*) creates a second capturing group containing the remainder of the line
$ anchors the expression to the end of the line

Now for the replace part:
"$1" inserts the content of the first capturing group surrounded by quotation marks
$2 inserts the remainder of the line
$1 inserts the content of the first capturing group again creating your desired copy
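
The same search and replace can be sketched in Python for experimenting away from the editor. This is only an approximation under stated assumptions: Python's re module is not the Boost engine used by Notepad++, it writes group references as \1 and \2 instead of $1 and $2, and the two-line sample text is just an excerpt of the data above.

```python
import re

before = (
    "gv_fire_rang_mini  = GlobalVariables.\n"
    "gv_fire_dura  = GlobalVariables.\n"
)

# re.MULTILINE lets ^ and $ anchor at every line, as they do in Notepad++.
# Python's "." skips newlines by default, so no (?-s) prefix is needed here.
pattern = re.compile(r"^(\w+)(.*)$", re.MULTILINE)

# Python spells group references \1 and \2 rather than $1 and $2.
after = pattern.sub(r'"\1"\2\1', before)

print(after)
```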

hth

Dad3353

@gerdb42 Thank you for this very speedy reply and solution; this worked perfectly, with no need for ‘tweaking’ at all. Your explanation is most helpful (I had looked at ‘groups’, but brain fog set in; I’ve just turned 75…).
One thing I haven’t understood is how the identifying of the first group (up to ‘=’…) works. It can’t count characters, and I don’t see ‘=’ in there as a delimiting character. Is it the ‘space’ that delimits the first group…?
Either way, this ‘tuition’ has opened up yet another avenue of utility for Notepad++, as if it wasn’t useful enough already, for which I’m very grateful…!

guy038

Hello, @dad3353, @gerdb42 and All,

I will take the liberty of answering your question. Since the first group is \w+ ( i.e. \w\w* ), group 1 necessarily stops at the last letter i of the gv_fire_rang_mini expression, because the space that follows is not a word character ! Roughly, we can say that \w = [\u\l\d_] : any upper-case letter, any lower-case letter, any digit or the underscore character. This class may match some accented characters as well !
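
A quick way to check which characters belong to \w is to test them one by one. The sketch below uses Python's re module as a stand-in, on the assumption that its \w is close enough to the Notepad++ one for these characters; the sample characters are chosen only for illustration.

```python
import re

# Characters that \w accepts: letters, digits, underscore, and (for str
# patterns in Python) accented letters such as "é".
for ch in ["g", "G", "7", "_", "é"]:
    assert re.fullmatch(r"\w", ch)

# Characters that end a \w+ run: space, equals sign, full stop.
for ch in [" ", "=", "."]:
    assert re.fullmatch(r"\w", ch) is None

# So \w+ at the start of the line stops at the first space.
first_group = re.match(r"\w+", "gv_fire_rang_mini  = GlobalVariables.").group(0)
print(first_group)
```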

So, the layout of this regex can be expressed as :

    gv_fire_rang_mini  = GlobalVariables.
   ^<----- Gr 1-----><------ Gr 2 ------>$
   ^       (\w+)             (.*)        $

Thus, the replace operation becomes obvious :

The first group ( text before the space and the equal sign ), surrounded with double quotes
The second group ( the remainder of each line )
The first group, again, which is to be repeated at the end of each line
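
The layout of the two groups can also be read off programmatically. This is a small Python sketch, assuming the re module as a substitute for the Notepad++ engine; span() reports where each group starts and ends in the line.

```python
import re

line = "gv_fire_rang_mini  = GlobalVariables."
m = re.match(r"^(\w+)(.*)$", line)

# span(n) gives the start and end offsets of group n within the line.
for n in (1, 2):
    start, end = m.span(n)
    print(f"Gr {n}: columns {start}-{end}: {m.group(n)!r}")

group1_span = m.span(1)
group2_span = m.span(2)
```

Group 2 begins exactly where group 1 ends, so together the two groups cover the whole line.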

Notes :

Regarding the search expression, you may use the free-spacing mode ( (?x) ) to properly identify each component of the regex :

SEARCH (?x-s) ^ ( \w+ ) ( .* ) $

In this mode, you must enclose any literal space or # character in square brackets [ ] or use its escaped syntax
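
Python has a comparable mode, re.VERBOSE, which can illustrate the rule about spaces. The sketch below is a variation on the regex above, not the same expression: it matches the spaces and the equals sign explicitly (an assumption made purely to show the bracket syntax), so group 2 no longer includes them.

```python
import re

# In verbose mode, unescaped whitespace in the pattern is ignored and "#"
# starts a comment, so each component can sit on its own line.
pattern = re.compile(
    r"""
    ^
    ( \w+ )      # group 1: the leading word characters
    [ ]+ = [ ]   # literal spaces must be bracketed (or escaped) here
    ( .* )       # group 2: the remainder of the line
    $
    """,
    re.VERBOSE | re.MULTILINE,
)

m = pattern.search("gv_fire_rang_mini  = GlobalVariables.")
print(m.group(1), "|", m.group(2))
```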

Regarding the replace expression, you could also have used two other syntaxes :
- "\1"\2\1
- "${1}"${2}${1}

The second syntax is necessary when the group reference is immediately followed by digits, as in the replacement ${1}0123, where a bare $1 would run into the digits that follow. With your INPUT text, this would mean that the OUTPUT text will be :

gv_fire_rang_mini0123
gv_fire_rang_maxi0123
gv_fire_accu_mini0123
gv_fire_accu_maxi0123
gv_fire_dura0123
gv_fire_relo0123
gv_fire_relo_mini0123
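
The same issue exists in other regex flavours. As a rough Python analogue (an assumption for illustration, since Python does not accept the ${1} form), the delimited reference is written \g<1>:

```python
import re

lines = "gv_fire_dura  = GlobalVariables.\ngv_fire_relo  = GlobalVariables."

# \g<1> delimits the group number, so the digits 0123 stay literal text.
result = re.sub(r"^(\w+)(.*)$", r"\g<1>0123", lines, flags=re.MULTILINE)
print(result)
```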

Best Regards

guy038

Dad3353

@guy038… Thanks for the clarification; I believe that it’s starting to sink in. I’m assuming that, in your explanation, you mean that group 1 will stop at, not the ‘i’ (that’s just the coincidence of the first line…), but any character followed by a space. This makes sense of it all now; I was looking too hard at the ‘ = ’ as the splitting point. Onward and Upward…! :-)

guy038

Hi, @dad3353, @gerdb42 and All,

BTW, @dad3353, when the scope of each group of a regex is not clear enough, run a test on a small sample of your INPUT text, using this kind of replacement :

REPLACE >$1<\r\n>$2<\r\n........>$n<\r\n-------------------------\r\n

So, let’s imagine this part of your INPUT text :

gv_fire_accu_maxi  = GlobalVariables.
gv_fire_dura  = GlobalVariables.
gv_fire_relo  = GlobalVariables.
gv_fire_relo_mini  = GlobalVariables.

Then, with the specific search/replace, below :

FIND (?-s)^(\w+)(.*)$

REPLACE >$1<\r\n>$2<\r\n-------------------------\r\n

You would get this OUTPUT result :

>gv_fire_accu_maxi<
>  = GlobalVariables.<
-------------------------

>gv_fire_dura<
>  = GlobalVariables.<
-------------------------

>gv_fire_relo<
>  = GlobalVariables.<
-------------------------

>gv_fire_relo_mini<
>  = GlobalVariables.<
-------------------------

Then, it becomes easy to see the scope of each group, from line to line : it is the text between the > and < delimiters !

Of course, you may choose any similar replacement like, for example, |$1|\r\n|$2|\r\n-------------------------\r\n
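
This delimiter trick carries over to scripting languages as well. Here is a minimal Python sketch of it, assuming the re module, \n line endings instead of \r\n, and a two-line sample taken from the data above:

```python
import re

sample = "gv_fire_dura  = GlobalVariables.\ngv_fire_relo_mini  = GlobalVariables."

# One delimited row per group, then a separator row, mirroring the
# Notepad++ replacement above (with \n instead of \r\n).
template = r">\1<\n>\2<\n" + "-" * 25 + r"\n"

report = re.sub(r"^(\w+)(.*)$", template, sample, flags=re.MULTILINE)
print(report)
```

As in the Notepad++ output, the original line break survives after each separator row, which produces the blank line between blocks.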

Best regards,

guy038

gerdb42

@Dad3353

In addition to guy038’s excellent explanation you may want to look at https://regex101.com/. It will show a graphical representation of what the RegEx does as well as some explanation. If you select “Substitution” in the left-hand menu under “FUNCTION” you can also see what the replacement does.

Dad3353

@gerdb42 Thanks. A little tricky, but will be useful once I’ve worked on a couple of simple ‘test’ examples. It’s not exactly like the Npp version, apparently, but will serve as ‘fairy bike wheels’ until I master this stuff a little more. :-)