FEATURE REQUEST: Replace and Preserve case

Scott Sumner

@MacGyver27

How about an example of the kind of replace you are talking about? Some data, before and after…

MacGyver27

If I replace the string “device” with “location” with case preservation turned on, it should behave as below:

Device
device
DEVICE
deVICE
dEVICE

is replaced with

Location
location
LOCATION
location
lOCATION

Thanks, Michal

Scott Sumner

@MacGyver27

I guess I see the pattern in all of the examples except how “deVICE” translates into “location”.

MacGyver27

@Scott-Sumner that is intended… because they can be of arbitrary length…

it recognizes:

first letter uppercase
first letter lowercase
all lowercase
all uppercase

Thanks, Michal

MAPJe71

@MacGyver27
So “deVICE” should be replaced with “loCATION” instead of “location” as per your example.

Scott Sumner

@MAPJe71

It’s tough when clarification requests don’t seem to clarify. :-)

MacGyver27

Ahh, let me break it down then…

find: device
replace with: location
preserve case: true

first letter uppercase: Device -> Location
first letter lowercase: dEVICE -> lOCATION
all lowercase: device -> location
all uppercase: DEVICE -> LOCATION
everything else: replaced with the given string, with no case adjustment, so: deVICE -> location
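A minimal Python sketch of these rules, assuming Python's re module rather than any particular editor's engine, and reading “first letter uppercase” / “first letter lowercase” as in the examples (Device, dEVICE), i.e. the rest of the word has the opposite case. preserve_case and replace_preserving_case are hypothetical helper names:

```python
import re

def preserve_case(matched, replacement):
    """Hypothetical helper: adapt `replacement` to the case pattern of `matched`."""
    rest = matched[1:]
    if matched.isupper():                          # DEVICE -> LOCATION
        return replacement.upper()
    if matched.islower():                          # device -> location
        return replacement.lower()
    if matched[0].isupper() and rest.islower():    # Device -> Location
        return replacement[:1].upper() + replacement[1:].lower()
    if matched[0].islower() and rest.isupper():    # dEVICE -> lOCATION
        return replacement[:1].lower() + replacement[1:].upper()
    return replacement                             # deVICE -> location (no adjustment)

def replace_preserving_case(text, searched, replacement):
    """Hypothetical helper: case-insensitive search, case-adapted replacement."""
    pattern = re.compile(re.escape(searched), re.IGNORECASE)
    return pattern.sub(lambda m: preserve_case(m.group(0), replacement), text)

before = "Device\ndevice\nDEVICE\ndeVICE\ndEVICE"
print(replace_preserving_case(before, "device", "location"))
```

The all lowercase and all uppercase checks come first, so a word like device is not mistaken for the “first letter lowercase” form.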

Thanks, Michal

guy038

Hello, MacGyver27,

From your first post, I had already imagined some regexes that would perform the S/R according to your case rules.

However, after reading the recent posts on this topic, I need to ask you again for additional information!

If I fully understand you, in the 16-row table below, I recapitulated all the possible cases, assuming:

A) The replacement of each case form of the string “word” by the string “lEtTeR”, with that exact spelling
B) The replacement of each case form of the string “word” by the string “sEt”, with that exact spelling
In the first case, the replacement string is longer than the searched string
In the second case, the replacement string is shorter than the searched string

It seems that, for the 4 rows A, D, I and P, I just repeated what you asked for! Am I right about that?

But, if you don’t mind, I would like you to clarify all the other cases!

As for me, I assumed that, for all rows other than A, D, I and P, the EXACT replacement string is just written, without any change. Again, am I right about this assumption?

Once we agree on the case rules to assume, it shouldn’t be too difficult to get the right regexes!

*-------*-------------------*----------------------*----------------------*
| Type  |  SEARCHED string  |  REPLACEMENT string  |  REPLACEMENT string  |
| Case  |     =  "word"     |     =  "lEtTeR"      |      =  "sEt"        |
*-------*-------------------*----------------------*----------------------*
|   A   |       WORD        |    =>    LETTER      |      =>    SET       | 
*-------*-------------------*----------------------*----------------------*
|   B   |       WORd        |    =>    lEtTeR      |      =>    sEt       |
|   C   |       WOrD        |    =>    lEtTeR      |      =>    sEt       |
*-------*-------------------*----------------------*----------------------*
|   D   |       WOrd        |    =>    Letter      |      =>    Set       |
*-------*-------------------*----------------------*----------------------*
|   E   |       WoRD        |    =>    lEtTeR      |      =>    sEt       |
|   F   |       WoRd        |    =>    lEtTeR      |      =>    sEt       |
|   G   |       WorD        |    =>    lEtTeR      |      =>    sEt       |
|   H   |       Word        |    =>    lEtTeR      |      =>    sEt       |
*-------*-------------------*----------------------*----------------------*
|   I   |       wORD        |    =>    lETTER      |      =>    sET       |
*-------*-------------------*----------------------*----------------------*
|   J   |       wORd        |    =>    lEtTeR      |      =>    sEt       |
|   K   |       wOrD        |    =>    lEtTeR      |      =>    sEt       |
|   L   |       wOrd        |    =>    lEtTeR      |      =>    sEt       |
|   M   |       woRD        |    =>    lEtTeR      |      =>    sEt       |
|   N   |       woRd        |    =>    lEtTeR      |      =>    sEt       |
|   O   |       worD        |    =>    lEtTeR      |      =>    sEt       |
*-------*-------------------*----------------------*----------------------*
|   P   |       word        |    =>    letter      |      =>    set       |
*-------*-------------------*----------------------*----------------------*
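The 16 rows are simply the 2^4 upper/lower choices for the four letters of “word”. A small Python sketch (case_forms is a hypothetical helper name) that regenerates them in the same A to P order:

```python
from itertools import product
from string import ascii_uppercase

def case_forms(word):
    """Hypothetical helper: every upper/lower case spelling of `word`.
    product() varies the last letter fastest, which gives the A..P row order."""
    choices = [(c.upper(), c.lower()) for c in word]
    return ["".join(letters) for letters in product(*choices)]

for label, form in zip(ascii_uppercase, case_forms("word")):
    print(label, form)
```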

See you later !

Best regards

guy038

MacGyver27

@guy038 said:

“However, after reading the recent posts on this topic, I need to ask you again for additional information!”

Wow, I didn’t expect such a grasp of this… Sure, I’ll be glad to help.

You are right, as long as, for case D, the searched string is Word and not WOrd as you have it in the table…

“But, if you don’t mind, I would like you to clarify all the other cases!”

“As for me, I assumed that, for all rows other than A, D, I and P, the EXACT replacement string is just written, without any change. Again, am I right about this assumption?”

I verified this, and it behaves as follows: it applies the case of the first letter of the original string, and for the remaining characters it uses the case of the given string (see case D: it took the uppercase first letter and applied it to lEtTeR -> LEtTeR).

Please refer to the following table; I have tested all the use cases in IntelliJ IDEA.

*-------*-------------------*----------------------*----------------------*
| Type  |  SEARCHED string  |  REPLACEMENT string  |  REPLACEMENT string  |
| Case  |     =  "word"     |     =  "lEtTeR"      |      =  "sEt"        |
*-------*-------------------*----------------------*----------------------*
|   A   |       WORD        |   =>    LEtTeR       |   =>    SEt          |
*-------*-------------------*----------------------*----------------------*
|   B   |       WORd        |   =>    LEtTeR       |   =>    SEt          |
|   C   |       WOrD        |   =>    LEtTeR       |   =>    SEt          |
*-------*-------------------*----------------------*----------------------*
|   D   |       Word        |   =>    LEtTeR       |   =>    SEt          |
*-------*-------------------*----------------------*----------------------*
|   E   |       WoRD        |   =>    LEtTeR       |   =>    SEt          |
|   F   |       WoRd        |   =>    LEtTeR       |   =>    SEt          |
|   G   |       WorD        |   =>    LEtTeR       |   =>    SEt          |
|   H   |       Word        |   =>    LEtTeR       |   =>    SEt          |
*-------*-------------------*----------------------*----------------------*
|   I   |       wORD        |   =>    lEtTeR       |   =>    sEt          |
*-------*-------------------*----------------------*----------------------*
|   J   |       wORd        |   =>    lEtTeR       |   =>    sEt          |
|   K   |       wOrD        |   =>    lEtTeR       |   =>    sEt          |
|   L   |       wOrd        |   =>    lEtTeR       |   =>    sEt          |
|   M   |       woRD        |   =>    lEtTeR       |   =>    sEt          |
|   N   |       woRd        |   =>    lEtTeR       |   =>    sEt          |
|   O   |       worD        |   =>    lEtTeR       |   =>    sEt          |
*-------*-------------------*----------------------*----------------------*
|   P   |       word        |   =>    lEtTeR       |   =>    sEt          |
*-------*-------------------*----------------------*----------------------*

Thanks for the effort. If you come across any other undefined cases, let me know. Michal

guy038

Hi, MacGyver27,

Oh, now I see! I had imagined a more complicated rule! So, in the end, you would like:

The first character of the replacement string to have the SAME capitalization as the first letter of the searched string
All the other letters of the replacement string to stay UNCHANGED

If so, use the general regex search/replacement below:

SEARCH (?-i)\b(?:(<UPPER case First letter>)|<LOWER case First letter>)(?i)<ALL letters, from the 2ND>\b , for the searched string

REPLACE (?1<UPPER case First letter>:<LOWER case First letter>)<ALL letters, from the 2ND> , for the replacement string

So, for instance :

If searched string = word, whatever its case, and replacement string = lEtTeR, with that exact case, you would obtain the S/R :

SEARCH (?-i)\b(?:(W)|w)(?i)ord\b

REPLACE (?1L:l)EtTeR

If searched string = word, whatever its case, and replacement string = sEt, with that exact case, you would obtain:

SEARCH (?-i)\b(?:(W)|w)(?i)ord\b

REPLACE (?1S:s)Et

A last example, for fun :-)

If searched string = MacGyver27, whatever its case, and replacement string = Notepad++'s FAN !, with that exact case, you would obtain :

SEARCH (?-i)\b(?:(M)|m)(?i)acGyver27\b

REPLACE (?1N:n)otepad++'s FAN !

Below are the results of the replacement for some forms of your NodeBB name!

MACGYVER27    =>    Notepad++'s FAN !

MacGyver27    =>    Notepad++'s FAN !

macgyver27    =>    notepad++'s FAN !

maCGyvER27    =>    notepad++'s FAN !

mAcGYVEr27    =>    notepad++'s FAN !

MACGyver27    =>    Notepad++'s FAN !

Notes :

Due to the in-line modifier (?-i), the search always starts in a case-sensitive way, whether the Match case option of the Replace dialog is checked or not
Then, the part (?:(<UPPER case First letter>)|<LOWER case First letter>), an alternation embedded in a non-capturing group, looks for either an UPPER case or a LOWER case FIRST letter. Note that if the UPPER case form is found, it is stored as group 1
The part (?i)<ALL letters, from the 2ND> matches all the remaining letters, whatever their case, due to the in-line modifier (?i)
Finally, the two \b assertions ensure that your searched string is a whole word, not embedded in a larger expression!
In the replacement, the form (?1<UPPER case First letter>:<LOWER case First letter>) is a conditional replacement:
- If group 1 exists, it writes the first letter of the replacement string in UPPER case
- If group 1 does NOT exist, it writes the first letter of the replacement string in LOWER case
Then, all the remaining letters of the replacement string are written with their exact case
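To check the logic outside Notepad++, here is a rough Python equivalent, assuming Python's re module: it has no (?1…:…) conditional replacement, so a callback plays that role, and the scoped flag (?i:…) replaces the mid-pattern (?i). first_letter_case_replace is a hypothetical name, and the searched string is assumed to be non-empty:

```python
import re

def first_letter_case_replace(text, searched, replacement):
    """Hypothetical helper: replace the whole word `searched` (any case) by
    `replacement`, copying only the case of the matched FIRST letter."""
    first, rest = searched[0], searched[1:]
    # Case-sensitive alternation for the first letter: group 1 is the UPPER case
    # form, like (?:(M)|m). The scoped flag (?i:...) plays the role of (?i).
    pattern = re.compile(
        r"\b(?:(" + re.escape(first.upper()) + r")|" + re.escape(first.lower()) + r")"
        + r"(?i:" + re.escape(rest) + r")\b"
    )

    def repl(match):
        # Same idea as the conditional replacement (?1N:n)otepad++'s FAN !
        head = replacement[0].upper() if match.group(1) is not None else replacement[0].lower()
        return head + replacement[1:]

    return pattern.sub(repl, text)

for name in ["MACGYVER27", "MacGyver27", "macgyver27", "maCGyvER27", "mAcGYVEr27", "MACGyver27"]:
    print(name, "=>", first_letter_case_replace(name, "MacGyver27", "Notepad++'s FAN !"))
```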

Cheers,

guy038

guy038

Hello, MacGyver27 and All

Thinking about your problem, I’ve just come up with a particular search/replacement that changes the case of each letter of the replacement word ACCORDING TO the case of the corresponding letter in the searched word :-)))

Hypotheses :

The searched and replacement words are assumed to be made of word characters only. That is to say, they may contain digits and/or underscores, as, for instance, the words TEST_02 or MY_FUNCTION

Three cases are possible :

A) The searched and replacement words have the same size :

For instance, if the searched word is device, whatever its capitalization, and the exact replacement word is SySTeM, here are the results of this S/R for some forms of the searched word:

DeVIce   =>   SySTem
dEVicE   =>   sYSteM
devICe   =>   sysTEm
dEvIcE   =>   sYsTeM

B) The searched word has fewer letters than the replacement word:

For instance, if the searched word is device, whatever its capitalization, and the exact replacement word is lOCatIOn, here are the results of this S/R for some forms of the searched word:

DeVIce   =>   LoCAtiOn
dEVicE   =>   lOCatIOn
devICe   =>   locATiOn
dEvIcE   =>   lOcAtIOn

Note, in that example, that the 7th and 8th letters of the replacement word, which have no corresponding letter in the searched word, keep their initial capitalization!

C) The searched word has more letters than the replacement word:

For instance, if the searched word is device, whatever its capitalization, and the exact replacement word is TeSt, here are the results of this S/R for some forms of the searched word:

DeVIce   =>   TeST
dEVicE   =>   tESt
devICe   =>   tesT
dEvIcE   =>   tEsT

Remarks :

If a character of the searched word is NOT a letter, the associated character, in the replacement word, will NOT be changed
If a character of the replacement word is NOT a letter, it will NOT be changed, of course !
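Stated without regexes, the rule is a position-by-position copy of case. A minimal Python sketch, assuming whole-word, case-insensitive matching as in the hypotheses above (transfer_case and replace_word are hypothetical names):

```python
import re

def transfer_case(matched, replacement):
    """Hypothetical helper: give each character of `replacement` the case of the
    character at the same position in `matched`."""
    out = []
    for i, ch in enumerate(replacement):
        src = matched[i] if i < len(matched) else ""
        if src.isupper():
            ch = ch.upper()
        elif src.islower():
            ch = ch.lower()
        # A non-letter in `matched`, or a position past its end, leaves `ch` as typed;
        # upper()/lower() leave non-letters in `replacement` unchanged.
        out.append(ch)
    return "".join(out)

def replace_word(text, searched, replacement):
    """Hypothetical helper: whole-word, case-insensitive search."""
    pattern = re.compile(r"\b" + re.escape(searched) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: transfer_case(m.group(0), replacement), text)

for form in ["DeVIce", "dEVicE", "devICe", "dEvIcE"]:
    print(form, "=>", replace_word(form, "device", "SySTeM"),
          replace_word(form, "device", "lOCatIOn"), replace_word(form, "device", "TeSt"))
```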

To perform these capitalization changes, TWO consecutive S/R are needed.

In addition, I need a dummy character NOT yet used in your file. I chose the # character, but feel free to choose any other one!

So :

Let M be the number of characters of the SEARCHED word
Let N be the number of characters of the REPLACEMENT word

Then :

For case A) ( M = N ) OR case B) ( M < N )
- 1. FIRST S/R:
  - SEARCH (?i)\b<SEARCHED word, in ANY case>\b
  - REPLACE #<REPLACEMENT word, in ANY case>$0#
- 2. SECOND S/R:
  - SEARCH #|(?-is)(?=\w{<N>}(?:(\u)|(\l)))(\w)(?=\w*#)|(?i)<SEARCHED word, in ANY case>#
  - REPLACE \u(?1\3)\l(?2\3)
For case C) ( M > N )
- 1. FIRST S/R:
  - SEARCH (?i)\b(<The N FIRST letters>)(<The M-N LAST letters>)\b , built from the SEARCHED word, in ANY case
  - REPLACE #<REPLACEMENT word, in ANY case>\1#\2
- 2. SECOND S/R:
  - SEARCH #|(?-is)(?=\w{<N>}(?:(\u)|(\l)))(\w)(?=\w*#)|(?i)<The N FIRST letters>#<The M-N LAST letters> , built from the SEARCHED word, in ANY case
  - REPLACE \u(?1\3)\l(?2\3)
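To sanity-check the two S/R recipes, they can be simulated in Python. This is a sketch under assumptions: Python's re has neither the \u / \l pattern classes nor conditional replacements, so ASCII [A-Z] / [a-z] and a callback stand in for them, and two_pass_case_copy is a hypothetical name. Taking the first N characters of the searched word as head and the rest as tail covers both the M <= N and the M > N recipes:

```python
import re

def two_pass_case_copy(text, searched, replacement, mark="#"):
    """Hypothetical helper simulating the two S/R with Python's re module.
    `mark` is the dummy character; it must not already occur in `text`."""
    n = len(replacement)
    head, tail = searched[:n], searched[n:]  # tail is empty when M <= N

    # 1st S/R: the word becomes #<REPLACEMENT><N first chars>#<M-N last chars>
    first = re.compile(r"\b(" + re.escape(head) + r")(" + re.escape(tail) + r")\b", re.IGNORECASE)
    text = first.sub(lambda m: mark + replacement + m.group(1) + mark + m.group(2), text)

    # 2nd S/R: ASCII [A-Z] / [a-z] stand in for Boost's \u / \l classes.
    second = re.compile(
        re.escape(mark)
        + r"|(?=\w{" + str(n) + r"}(?:([A-Z])|([a-z])))(\w)(?=\w*" + re.escape(mark) + r")"
        + r"|(?i:" + re.escape(head + mark + tail) + r")"
    )

    def repl(m):
        # Same logic as \u(?1\3)\l(?2\3); the marks and the old word are deleted.
        if m.group(1) is not None:
            return m.group(3).upper()
        if m.group(2) is not None:
            return m.group(3).lower()
        return ""

    return second.sub(repl, text)

print(two_pass_case_copy("DeVIce\ndEVicE\ndevICe\ndEvIcE", "device", "test"))
```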

Then, applied to the examples above:

For case A), from the original text :

DeVIce
dEVicE
devICe
dEvIcE

SEARCH (?i)\bdevice\b

REPLACE #system$0#

We obtain, after the 1ST S/R :

#systemDeVIce#
#systemdEVicE#
#systemdevICe#
#systemdEvIcE#

SEARCH #|(?-is)(?=\w{6}(?:(\u)|(\l)))(\w)(?=\w*#)|(?i)device#

REPLACE \u(?1\3)\l(?2\3)

And, finally, after the 2ND S/R :

SySTem
sYSteM
sysTEm
sYsTeM

For case B), from the original text :

DeVIce
dEVicE
devICe
dEvIcE

SEARCH (?i)\bdevice\b

REPLACE #location$0#

We obtain, after the 1ST S/R :

#locationDeVIce#
#locationdEVicE#
#locationdevICe#
#locationdEvIcE#

SEARCH #|(?-is)(?=\w{8}(?:(\u)|(\l)))(\w)(?=\w*#)|(?i)device#

REPLACE \u(?1\3)\l(?2\3)

And, finally, after the 2ND S/R :

LoCAtion
lOCatIon
locATion
lOcAtIon

For case C), from the original text :

DeVIce
dEVicE
devICe
dEvIcE

SEARCH (?i)\b(devi)(ce)\b

REPLACE #test\1#\2

We obtain, after the 1ST S/R :

#testDeVI#ce
#testdEVi#cE
#testdevI#Ce
#testdEvI#cE

SEARCH #|(?-is)(?=\w{4}(?:(\u)|(\l)))(\w)(?=\w*#)|(?i)devi#ce

REPLACE \u(?1\3)\l(?2\3)

And, finally, after the 2ND S/R :

TeST
tESt
tesT
tEsT

Best Regards,

guy038

P.S.: A last example, with my_function_n008 as the searched word and ABCD_123IjklmnOP as the replacement word, which both contain 16 characters!

So, let’s suppose the example text below:

My_Function_N008
mY_fUNCTION_n008
my_FUNCTION_n008
MY_fUnCtIoN_N008

SEARCH (?i)\bmy_function_n008\b

REPLACE #ABCD_123IjklmnOP$0#

We obtain, after this 1ST S/R :

#ABCD_123IjklmnOPMy_Function_N008#
#ABCD_123IjklmnOPmY_fUNCTION_n008#
#ABCD_123IjklmnOPmy_FUNCTION_n008#
#ABCD_123IjklmnOPMY_fUnCtIoN_N008#

SEARCH #|(?-is)(?=\w{16}(?:(\u)|(\l)))(\w)(?=\w*#)|(?i)my_function_n008#

REPLACE \u(?1\3)\l(?2\3)

And, after the final 2ND S/R :

AbCD_123ijklMnOP
aBCd_123IJKlmnOP
abCD_123IJKlmnOP
ABCd_123IjKlMnOP