Hi, @grimaldas-grydas and All,
To begin with, let me explain the general method used. We're going to use a short line from your INPUT text, which must be processed :
pppp={ 50350929 168.36935 33589252 }
The goal is to write the three numbers 50350929, 168.36935 and 33589252 , each one on a different line, and prefixed with the string pppp, located before the = sign, in order to get :
pppp=50350929
pppp=168.36935
pppp=33589252
The problem is that when the regex engine catches each number in turn, it no longer knows the pppp string, located at the beginning of the current line !
So my idea was to swap the list of numbers and the string pppp before the equal sign and separate these two ranges with a temporary char, not present in your data !
So, after a first regex S/R, we get the temporary text below ( note that the space before the first number is kept, so the line begins with a blank char ) :
 50350929 168.36935 33589252¤pppp
With this new layout, when the regex engine matches a number ( integer / decimal ) it is fairly easy, with a look-ahead structure, to store, each time, the string located after the temporary ¤ char, which ends the current line !
Then, with a second regex S/R, we finally get our expected text :
pppp=50350929
pppp=168.36935
pppp=33589252
Before we get into the details, it is IMPORTANT to point out that I found a case where my previous regex S/R did not work ! So, you’ll have to use the second version, below !
The complete regex S/R, where I added the \h* part that you mentioned and where I fixed the bug, is :
SEARCH (?-s)^\h*(\w+)={(.+)\h+}$|(^)?\h+(\d+(?:\.\d+)?)(?=.*¤(\w+))|¤.+
REPLACE (?2\2¤\1)?4(?3:\r\n)\5=\4
This complete regex S/R can be split into 2 consecutive regex S/R, which are completely independent :
The Search/Replacement A, which creates the intermediate text :
SEARCH (?-s)^\h*(\w+)={(.+)\h+}$
REPLACE ?2\2¤\1
The Search/Replacement B, which gets the expected and final text
SEARCH (?-s)(^)?\h+(\d+(?:\.\d+)?)(?=.*¤(\w+))|¤.+
REPLACE ?4(?3:\r\n)\5=\4
The groups, defined by the A and B search regexes are :
(?x-s)  ^  \h*  (\w+)  =  {  (.+)  \h+  }  $
                ¯¯¯¯¯        ¯¯¯¯
                Gr 1         Gr 2

(?x-s)  (^)?  \h+  ( \d+(?: \. \d+ )? )  (?= .* ¤  (\w+)  )  |  ¤ .+
        ¯¯¯        ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯            ¯¯¯¯¯
        Gr 3       Gr 4                            Gr 5
Note that I use the free-spacing mode (?x) for better readability. Each regex also contains the (?-s) in-line modifier, which means that any regex . char matches a single standard character ( not an EOL one )
In search regex A :
The part ^\h*(\w+)= matches the word string, stored as group 1, after possible leading blank chars, till an = character
The part {(.+)\h+}$ matches a literal { char, then any non-null range of chars, each number preceded with space(s), which is stored as group 2, till space char(s) and a closing } char, ending the current line
In replacement regex A :
?2\2¤\1, which should be exactly expressed as (?2\2¤\1), is a conditional replacement syntax, which means that IF group 2 exists, it must rewrite the group 2 first, \2 ( i.e. the numbers only ), then the literal char ¤ and finally group 1 ( the string pppp )
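To see what S/R A captures and produces, here is a minimal sketch in Python. It is only an emulation, not what Notepad++ runs : Python's re module has no \h and no conditional replacement syntax, so I assume [ \t] as a stand-in for \h and a small function for the replacement. The names SEARCH_A and run_a are made up for the illustration.

```python
import re

# Search regex A. Python's re has no \h, so [ \t] stands in for it
# (an approximation: \h also covers other horizontal blank chars).
# re.MULTILINE lets ^ and $ work line by line; "." already skips newlines,
# which plays the role of (?-s).
SEARCH_A = re.compile(r"^[ \t]*(\w+)=\{(.+)[ \t]+\}$", re.MULTILINE)

def run_a(text):
    # Same effect as the replacement (?2\2¤\1): group 2 is always defined
    # when regex A matches, so we write group 2, the ¤ char, then group 1.
    return SEARCH_A.sub(lambda m: m.group(2) + "¤" + m.group(1), text)

line = "pppp={ 50350929 168.36935 33589252 }"
m = SEARCH_A.search(line)
print(repr(m.group(1)))   # the word before the = sign
print(repr(m.group(2)))   # the numbers, each one preceded by a space
intermediate = run_a(line)
print(repr(intermediate))
```

Printing with repr() makes the leading space of group 2 visible. That space matters, because search regex B expects blank char(s) before every number, including the first one.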
Now, the search regex B contains two alternatives :
The first alternative (?-s)(^)?\h+(\d+(?:\.\d+)?)(?=.*¤(\w+))
The middle part (\d+(?:\.\d+)?) matches any integer or decimal number, which is stored as group 4. Note the optional non-capturing group (?:\.\d+)? in the case of a decimal number
The first part (^)?\h+ matches the blank char(s) preceding a number. Remark that, if the leading blank char(s) begin the current line, the optional group 3, (^)?, is then defined
The final part (?=.*¤(\w+)), is a look-ahead structure, not included in the final match, but which must be true in order to get an effective match. So current matched number must be followed by a range, possibly null, of characters till the temporary char ¤ and the ending string pppp
The second alternative ¤.+, which is used when current parsing position of the regex engine is at the ¤ location, after the processed numbers. This second alternative, without any group, simply matches the temporary ¤ char and all subsequent chars of current line, and should be deleted in replacement !
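The two alternatives of search regex B can be watched at work with a small Python sketch. Again, this is an emulation under assumptions : [ \t] replaces \h, and the named groups g3, g4 and g5 are hypothetical names that I chose so that the numbering stays the same as in the explanations above ( in a standalone Python regex, these groups would otherwise be numbered 1, 2 and 3 ).

```python
import re

# Search regex B, with named groups mirroring groups 3, 4 and 5.
SEARCH_B = re.compile(
    r"(?P<g3>^)?[ \t]+(?P<g4>\d+(?:\.\d+)?)(?=.*¤(?P<g5>\w+))|¤.+",
    re.MULTILINE,
)

intermediate = " 50350929 168.36935 33589252¤pppp"

def describe_matches(text):
    # One tuple per match:
    # (matched text, is group 3 defined?, group 4, group 5)
    rows = []
    for m in SEARCH_B.finditer(text):
        rows.append(
            (m.group(0), m.group("g3") is not None, m.group("g4"), m.group("g5"))
        )
    return rows

rows = describe_matches(intermediate)
for row in rows:
    print(row)
```

Only the first number reports group 3 as defined, since it is the only one whose leading blank begins the line. The last match comes from the second alternative ¤.+ and leaves groups 4 and 5 undefined.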
In replacement regex B :
?4(?3:\r\n)\5=\4, which should be exactly expressed as (?4(?3:\r\n)\5=\4), means that, IF group 4 exists ( the numbers ), it must :
Execute, first, the (?3:\r\n) conditional replacement. This replacement does not include a THEN part and contains only the string \r\n as an ELSE part, after the : char. So, this means that if group 3 does not exist ( number not at beginning of current line ), it must insert a leading line-break !
Write the group 5, \5, followed with a literal = sign
Finally, write the group 4 ( current number matched by the first alternative of search regex B )
Note that, when matching the second alternative ¤.+ of the search regex B, at end of current line, group 4 is not defined. So, no action occurs in replacement. Thus, concretely, this means that the string ¤pppp is deleted !
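The logic of replacement B can be written out as a plain function. This Python sketch is an emulation only, as Python's re has no (?4...) conditional syntax : the function replace_b and the group names g3, g4, g5 are assumptions made for the illustration, and [ \t] stands in for \h.

```python
import re

SEARCH_B = re.compile(
    r"(?P<g3>^)?[ \t]+(?P<g4>\d+(?:\.\d+)?)(?=.*¤(?P<g5>\w+))|¤.+",
    re.MULTILINE,
)

def replace_b(m):
    # IF group 4 does not exist, we are on the second alternative ¤.+ :
    # nothing is written, so the match is simply deleted.
    if m.group("g4") is None:
        return ""
    # (?3:\r\n) : no THEN part, and a line-break as the ELSE part,
    # so the line-break is inserted only when group 3 is NOT defined.
    lead = "" if m.group("g3") is not None else "\r\n"
    # Then group 5, a literal = sign and group 4.
    return lead + m.group("g5") + "=" + m.group("g4")

intermediate = " 50350929 168.36935 33589252¤pppp"
final = SEARCH_B.sub(replace_b, intermediate)
print(final)
```

The first number gets no leading line-break because group 3 is defined for it, which avoids an empty line at the top of the result.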
Remarks :
The S/R A and B are independent. As a demonstration :
When executing, first, the search regex A, as no ¤ character exists yet, no alternative of the search regex B can match
When executing, next, the search regex B, as the intermediate text ( after running A ) does not contain any { or } character, the search regex A, obviously, cannot match either !
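This independence can be checked quickly. The sketch below is a Python emulation under the same assumptions as before ( [ \t] in place of \h, hypothetical names SEARCH_A and SEARCH_B ), and it only tests whether each regex finds anything in the text meant for the other one.

```python
import re

SEARCH_A = re.compile(r"^[ \t]*(\w+)=\{(.+)[ \t]+\}$", re.MULTILINE)
SEARCH_B = re.compile(
    r"(^)?[ \t]+(\d+(?:\.\d+)?)(?=.*¤(\w+))|¤.+", re.MULTILINE
)

original = "pppp={ 50350929 168.36935 33589252 }"
intermediate = " 50350929 168.36935 33589252¤pppp"

# No ¤ char in the original text, so the look-ahead of the first
# alternative and the second alternative ¤.+ both fail.
b_on_original = SEARCH_B.search(original)

# No = sign followed by { in the intermediate text, so regex A fails.
a_on_intermediate = SEARCH_A.search(intermediate)

print(b_on_original, a_on_intermediate)
```

Both searches find nothing, which is why the two S/R never interfere with each other.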
Thus, we can merge these two successive S/R in one regex S/R only ! You’ll note that :
The redundant part (?-s), at beginning of regex S/R B, is omitted
The replacement of S/R A, ?2\2¤\1, must be enclosed between parentheses, (?2\2¤\1), in order to not include the replacement section of S/R B
As a conclusion, the complete regex S/R, with the free-spacing mode in the search part, is :
SEARCH (?x-s) ^ \h* ( \w+ ) = { ( .+ ) \h+ } $ | (^)? \h+ ( \d+ (?:\.\d+)? ) (?= .* ¤ ( \w+ ) ) | ¤ .+
REPLACE (?2\2¤\1)?4(?3:\r\n)\5=\4
And outputs the expected text, after two consecutive clicks on the Replace All button !
As mentioned in my last post, if we try to click a third time on the Replace All button, luckily, nothing else occurs ! Why ? Easy : as neither a brace { or } character nor the ¤ character exists in our final text, no alternative of the overall regex can match. Logical ;-))
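For anyone who wants to replay the three clicks outside Notepad++, here is a minimal Python sketch of the merged S/R. It is an emulation under stated assumptions, not the Boost engine used by Notepad++ : [ \t] replaces \h, the braces are escaped, the group names g1 to g5 and the function names are hypothetical, and the conditional replacement is rewritten as a function. It also assumes an input without \r\n line endings, because Python's $ does not match before a \r char.

```python
import re

# Merged search regex, in verbose ( free-spacing ) mode.
COMBINED = re.compile(
    r"""
      ^ [ \t]* (?P<g1>\w+) = \{ (?P<g2>.+) [ \t]+ \} $
    | (?P<g3>^)? [ \t]+ (?P<g4>\d+(?:\.\d+)?) (?= .* ¤ (?P<g5>\w+) )
    | ¤ .+
    """,
    re.MULTILINE | re.VERBOSE,
)

def replace_combined(m):
    # (?2\2¤\1) : part coming from S/R A
    if m.group("g2") is not None:
        return m.group("g2") + "¤" + m.group("g1")
    # ?4(?3:\r\n)\5=\4 : part coming from S/R B
    if m.group("g4") is not None:
        lead = "" if m.group("g3") is not None else "\r\n"
        return lead + m.group("g5") + "=" + m.group("g4")
    # Second alternative of B ( ¤.+ ) : the match is deleted
    return ""

def replace_all(text):
    # One click on the Replace All button
    return COMBINED.sub(replace_combined, text)

text = "pppp={ 50350929 168.36935 33589252 }"
first = replace_all(text)     # intermediate text
second = replace_all(first)   # expected text
third = replace_all(second)   # nothing changes any more
print(repr(first))
print(repr(second))
print(third == second)
```

Two passes are needed because text written by a replacement is not scanned again during the same pass, so the numbers can only be matched on the next click.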
I just hope, @grimaldas-grydas, that these explanations help you a bit !
guy038