How can I change all the words in a given structure?

Alan Kilborn

@darkenb said in How can I change all the words in a given structure?:

I don’t understand why something trivial?

Because you can just reapply the technique used earlier, with different values.
In fact, I think it is just as simple as swapping your FR and RR values.

guy038

Hello, @darkenb, @alan-kilborn, @peterjones, @ekopalypse and All,

I supposed, that, with the Alan’s and Peter’s explanations, you succeeded to achieve what you want !

However, in all that story, there is still something unclear !

@darkenb, I do understand that you don’t want the first line, beginning with a # character, to be translated by Google Translate and that the way you’ve found, to avoid translating, is to add a underscore characters ( _ ) between words ! But, please, could you be a bit more accurate ?

Before the replacement process, is your text as like the B1, B2, B3 or B4 type ?

B1 #. "This is an [ABC DEF GHI JKL] example of text: 0"
B2 #. "This is an [ABC DEF GHI JKL ] example of text: 0"
B3 #. "This is an [ ABC DEF GHI JKL] example of text: 0"
B4 #. "This is an [ ABC DEF GHI JKL ] example of text: 0"

After the replacement process, do you expect the text A1, A2, A3, A4, or A5 ?

A1 #. "This is an [ABC_DEF_GHI_JKL] example of text: 0"
A2 #. "This is an [_ABC_DEF_GHI_JKL_] example of text: 0"
A3 #. "This is an [_ABC_DEF_GHI_JKL _] example of text: 0"
A4 #. "This is an [_ ABC_DEF_GHI_JKL_] example of text: 0"
A5 #. "This is an [_ ABC_DEF_GHI_JKL _] example of text: 0"

To my mind, the more logical version is :

You, presently, have the B1 configuration
You would like to get the A1 or, may be, the A2 configuration, after the replacement process

Just tell me about it !

As always, once the problem is well defined, the solution is more easy to guess and halfway there ;-))

Best regards,

guy038

darkenb

@guy038
Yes, you got it right. A2 is exactly what I want.

However, when the process is completed, that is, when I complete the translation, I have to do the opposite of this process.

That’s why I need to remove _ underscores. However, there are _ lines in other words on the page. Therefore, only the underscores in the […] brackets should be removed in the same way, without changing the sentence. So in summary, I need to apply the following structure.

Before: B1 #. “This is an [ABC DEF GHI JKL] example of text: 0”

After: A2 #. “This is an [ABC_DEF_GHI_JKL] example of text: 0”

More then: B1 #. “This is an [ABC DEF GHI JKL] example of text: 0”

I’ll be glad if you help. I do not understand much, I would appreciate it if you show me practical.

darkenb

A2, the wrong leading signs are erasing for me. It will be B1 first, then A2, then B1 again.

Alan Kilborn

@darkenb said in How can I change all the words in a given structure?:

It will be B1 first, then A2, then B1 again.

Which really isn’t anything different than described before.
And which already has a successful solution.

I don’t really know what @guy038 would additionally supply.
Probably some over-complicated single-step way(s) to do it, which likely would totally eliminate any possible learning opportunity, for an obvious newbie?

I mean, we understand the newbie thing, but is it that hard to follow a recipe?
Maybe someone needs to create a script that would walk one through the process of building up a regex for the “replace only inside delimiters” scenario? OK, I’ll give that a go, and post back here.

guy038

Hello, @darkenb,

OK ! So, you start with text of style B1

B1 #. "This is an [ABC DEF GHI JKL] example of text: 0"

But, the case A# is still not defined, yet. Indeed, you said :

After: A2 #. “This is an [ABC_DEF_GHI_JKL] example of text: 0”

but, according to my classification, this should be A1 ?

So, sorry to repeat, but are you expecting the style A1 or A2, below ?

A1 #. "This is an [ABC_DEF_GHI_JKL] example of text: 0"
A2 #. "This is an [_ABC_DEF_GHI_JKL_] example of text: 0"

BR

guy038

darkenb

@guy038
My sentence design is B1, but I want to switch to A2 and then B1 again. Although I have written A2 layout, the system puts A1 in order. So it lifts the lines, I don’t understand it.

darkenb

@Alan-Kilborn I understand you, but I am seriously a newbie. (? -i: bs (_ | (?! \ A) \ G) (? s: (?! _ bs)).) *? \ K (? - i :) this may be simple for you, but I’m seriously confused. So I couldn’t delete the underscores and replace them with a space character without breaking the sentence. Simply write the codes and let me apply them.

Alan Kilborn

@darkenb said in How can I change all the words in a given structure?:

this may be simple for you

Actually, it is NOT simple for me, meaning that I couldn’t retype it from nothing and have a chance at getting it correct. But that’s the benefit of a plug and play recipe.

guy038

Hi, @darkenb,

OK, I’ve found out all the regexes needed to cover, both, the changes from B1 to A2 styles and then, from A2 to B1 styles again !

Just allow me an hour about to write an informative reply ( for you ! )

BR

guy038

darkenb

@guy038

Thank you. I’m waiting …

Alan Kilborn

@guy038 said in How can I change all the words in a given structure?:

Just allow me an hour about to write an informative reply ( for you ! )

@darkenb

Thank you. I’m waiting …

LOL, all the information was there, a day and a half ago.
What problem is left to be solved?

guy038

Hello, @darkenb, @alan-kilborn, @peterjones, @ekopalypse and All,

@darkenb, before speaking about your problem, I would like to give you some basic information about generic regexes !

Let’s imagine that you want to change a certain class of characters, surrounded double quotes into the same class but surrounded by other symbols

I could have written this generic regex :

SEARCH "FR"

REPLACE SS$0ES

So,

If you have a lot of digits between double quotes, the Find Regex FR is \d+ and that you want to surround them with square brackets, the Start Separator SS is [ symbol and the End Separator ES is the ] symbol So, you would use the regex S/R below :
- SEARCH "\d+"
- REPLACE [$0]
Now, if you have a lot of uppercase letters between double quotes, the Find Regex FR is [A-Z]+ and you want to surround them with braces, themselves surrounded with the -- string, the Start Separator SS is --{ symbol and the End Separator ES is the }-- symbol and you would use the regex S/R below :
- SEARCH "[A-Z]+"
- REPLACE --{$0}--
Finally, if you have a lot of word letters between double quotes, the Find Regex FR is \w+ and you want to surround them with one space char, themselves surrounded with the simple quotes, the Start Separator SS is '\x20 symbol and the End Separator ES is the \x20' symbol and you would use the regex S/R below :
- SEARCH "\w+"
- REPLACE '\x20$0\x20'

Note that the $0 syntax, in replacement, represents the complete search match and \x20 represents a single space char

So, you can see, that whatever the real example chosen, this generic regex remains exact and means :

After the replacement, any range of characters between double quotes will be changed as the same range, preceded with the SS separator and followed with the ES separator

Of course, this example is very basic and should be wrong in some particular cases but just gives you a general idea ! The goal is to replace the generic names, as FR, SS and ES with their true regex values, regarding your own needs and what you want to achieve ;-))

Let’s go back to your problem ! For all the regexes, provided below, the process is :

Open or switch to your file, in N++
Open the Replace dialog ( Ctrl + H )
- Fill up the Find what: and Replace with: zones with the appropriate regexes
- Un-tick all box options, first
- Tick the Wrap around option ( IMPORTANT : this ensures that current file is scanned from its very beginning to its very end, whatever the current position of the caret )
- Select the Regular expression search mode
- Click on the Replace All button ( Do not use the Replace button, due to the possible \K syntax in regexes )
In addition note that the square brackets are special regex characters with a special meaning and need to be escaped when you want to search them as literals. However, unfortunately, this escape syntax is not properly displayed, on our NodeBB forum. So I’m going to use the usual \x## syntax, where ## represents the hexadecimal code of a character. So, in regexes, I will refer of the [ as the \x5b char and of the ] as the \x5d char !

First, I will provide the method and the different regexes needed. Secondly, I’ll give you some explanations on them. However, I strongly advice you to learn basic regex documentation from here ;-))

A) This first regex S/R will add an underscore right after any [ character and right before any ] character
- SEARCH (\x5b)|\x5d
- REPLACE ?1\x5b_:_\x5d
B) Then, this second regex S/R will change any space char, within square brackets only, with an underscore char :
- SEARCH (?-s)(?:\x5b_|(?!\A)\G)(?:(?!_\x5d).)*?\K\x20
- REPLACE _
Now, just translate all your text with Google Translate

... ... ....
... ... ....
... ... ....

C) Once this translation task over, this third regex S/R will remove the underscore char located after the [ character and before the ] character
- SEARCH (\x5b)_|_\x5d
- REPLACE ?1\x5b:\x5d
D Finally, this fourth regex S/R, below, will change back any underscore character , within square brackets only, with an space char :
- SEARCH (?-s)(?:\x5b|(?!\A)\G)(?:(?!\x5d).)*?\K_
- REPLACE \x20

Et voilà !

Notes :

Regarding the S/R A and C :
- The search part is rather obvious and searches two different expressions, separated with the alternation symbol |
- Note that the \x5b character is surrounded with parentheses and so, defines a group 1 which is re-used in replacement
- The replacement has the syntax ?1(True:False) which means :
  - If group 1 exists ( so when the first alternative \x5b occurs ) rewrite the True part
  - If group 1 does not exist ( so, when the second alternative \x5d occurs ) rewrite the False part
Regarding the S/R B and D :
- They are, both, built up from the generic S/R regex :
  - SEARCH (?s-i:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?s-i:FR)
  - REPLACE RR
- Note that the different syntaxes (?s-i:•••••) are non-capturing groups ( i.e. groups which do not store the contents between parentheses ), which contain the leading modifiers s and i
  - The s modifier means that the dot regex char ( . ) represents any char, even EOL characters
  - The -i modifier means that the search is sensible to case of letters characters
- However, as we do not search any letter and as I suppose that your different zones [•••••••] stand all in a single line, this generic regex can be simplified as below, with a leading -s modifier, meaning that a . will match a single standard character, only
  - SEARCH (?-s)(?:BSR|(?!\A)\G)(?:(?!ESR).)*?\KFR
  - REPLACE RR
- Globally, the generic S/R, above will change any search expression, found with the FR regex, with the replacement expression, expressed with the RR syntax, between the BSR and ESR excluded locations, only !
- So, for the regex S/R B :
  - BSR, Beginning Search-region Regex, is the regex \x5b_
  - ESR, Ending Search-region Regex, is the regex _\x5d
  - FR, Find Regex, is the regex \x20
  - RR, Replacement Regex is the regex _
- And, if we change the names of generic regex with the real regex values, we exactly get the search regex (?-s)(?:\x5b_|(?!\A)\G)(?:(?!_\x5d).)*?\K\x20 and the replacement regex _
- Now, regarding the regex S/R D, note that we already remove the underscores close to the square brackets, with the regex S/R C. So, this time :
  - BSR, Beginning Search-region Regex, is the regex \x5b
  - ESR, Ending Search-region Regex, is the regex \x5d
  - FR, Find Regex, is the regex _
  - RR, Replacement Regex is the regex \x20
- And, again, if we change the names of generic regex with the real regex values, we exactly get the search regex (?-s)(?:\x5b|(?!\A)\G)(?:(?!\x5d).)*?\K_ and the replacement regex \x20 !

Best Regards,

guy038

darkenb

@guy038
I swear you are the king. : D Thanks to you, my job has been solved, and I can do it myself to a certain extent when something happens.

Thank you very, very much…

guy038

Hello, @darkenb, @alan-kilborn, @peterjones, @ekopalypse and All,

@darkenb :

Regarding the B and D regex S/R, note that I didn’t explain fully how they work. I do think that you need to learn basic regex concepts first, before trying to understand these complicated syntaxes which would just confuse you ;-)

To All,

Regarding the A and C regex S/R, they can be simplified and we do not need to use conditional regexes ! Indeed :
- Regex S/R A :
  - SEARCH (\x5b)|(\x5d)
  - REPLACE \1_\2
- Regex S/R C :
  - SEARCH (\x5b)_|_(\x5d)
  - REPLACE \1\2
As you can see, the opening square bracket \x5b is stored as group 1 and the ending square bracket \x5d is stored as group 2. And, as the two alternatives are mutually exclusive, we can write, both, \1 and \2 in the replacement zone ( or $1 and $2 ). We know that when one is defined, the other one is undefined and equivalent to an empty string ;-))

Best Regards,

guy038