Hi all,
In my previous post, I gave a general method to replace a specific character with another one everywhere in the lines of a delimited text, except within a range between columns c1 and c2. Here is an extension of that method to SEVERAL fixed zones to exclude!
That is:
^---------- Zone 1 to exclude ------------ Zone 2 to exclude -------------------- Zone 3 to exclude ------------$
So, let’s take the original text below:
abcd,04,,11111111, 22,ANYWORD ,ANYWORD,ANY,QZERTY,001,,5555,,AN,Y ANY,pqrst,00x,ANYWORD ,ANYWORD,9A9 ,Last Field,
fghi,02,,22222222, 22,ANY ,ANY, WORDS, ANY,AZERTY,999,,6666,,ANY AN,Y,uvwxy,01y,ANY ,ANY, WORDS,,7Z3 ,Last Field,
klmn,09,,33333333, 22,WORDS,ANY, WORDS,ANY,TEST-1,123,,7777,,ANY,,ANY,zabcd,02z,,ORDS,ANY, WORDS,3H5 ,Last Field,
I defined 3 zones to exclude, where the comma character will NOT be changed during the S/R process:
Zone 1, which starts at column 26 and ends at column 52 => S1 = 26 and E1 = 52
Zone 2, which starts at column 65 and ends at column 72 => S2 = 65 and E2 = 72
Zone 3, which starts at column 84 and ends at column 99 => S3 = 84 and E3 = 99
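As a quick check of the column convention, here is a minimal Python sketch (my own illustration, not something Notepad++ needs) that cuts each zone out of a line, with 1-based, inclusive columns, so zone (S, E) is `line[S-1:E]`. The names `ZONES`, `LINE` and `zone_texts` are hypothetical. One assumption: the posted sample seems to have lost some spaces, since the text before zone 1 shows only 22 characters instead of S1 - 1 = 25, so the sketch assumes the field ` 22` was originally `    22` (4 spaces).

```python
# Zones as 1-based, inclusive (start, end) column pairs: (S1, E1), (S2, E2), (S3, E3)
ZONES = [(26, 52), (65, 72), (84, 99)]

# First sample line. ASSUMPTION: "    22" (4 spaces) restores the 25-character
# prefix that the posted text seems to have collapsed to " 22"
LINE = ("abcd,04,,11111111,    22,ANYWORD ,ANYWORD,ANY,QZERTY,001,,5555,,"
        "AN,Y ANY,pqrst,00x,ANYWORD ,ANYWORD,9A9 ,Last Field,")


def zone_texts(line, zones):
    """Hypothetical helper: return the text of each zone (column c is line[c - 1])."""
    return [line[start - 1:end] for start, end in zones]


for text in zone_texts(LINE, ZONES):
    print(repr(text))
```

With that padding, the three zones come out as `ANYWORD ,ANYWORD,ANY,QZERTY`, `AN,Y ANY` and `ANYWORD ,ANYWORD`, which matches the boundaries shown further down.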
As explained previously, we temporarily add # and @ boundaries (a # just before each zone and a @ just after it) to delimit these 3 zones, using the general S/R below:
SEARCH : ^(.{S1-1})(.{E1-S1+1})(.{S2-E1-1})(.{E2-S2+1})(.{S3-E2-1})(.{E3-S3+1})...(.{Sn-E(n-1)-1})(.{En-Sn+1})
REPLACE : \1#\2@\3#\4@\5#\6@...
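Since every quantifier is plain arithmetic on the S and E values, both strings can be generated mechanically. Here is a minimal Python sketch; `build_sr` is a hypothetical helper name, and it assumes the zones are given sorted and non-overlapping.

```python
def build_sr(zones):
    """Hypothetical helper: return (SEARCH, REPLACE) for sorted, non-overlapping
    1-based, inclusive (start, end) zones."""
    lengths = []
    previous_end = 0  # acts as E0 = 0, so the first gap is S1 - 1
    for start, end in zones:
        if start <= previous_end or end < start:
            raise ValueError("zones must be sorted, non-overlapping, with start <= end")
        lengths.append(start - previous_end - 1)  # gap before the zone: Sk - E(k-1) - 1
        lengths.append(end - start + 1)           # the zone itself:     Ek - Sk + 1
        previous_end = end
    search = "^" + "".join("(.{%d})" % n for n in lengths)
    # Odd groups are gaps, followed by '#' (zone start);
    # even groups are zones, followed by '@' (zone end)
    replace = "".join("\\%d%s" % (i, "#" if i % 2 else "@")
                      for i in range(1, len(lengths) + 1))
    return search, replace


search, replace = build_sr([(26, 52), (65, 72), (84, 99)])
print("SEARCH  :", search)
print("REPLACE :", replace)
```

With more than 4 zones there would be more than 9 groups, and `\10` might then be read as `\1` followed by `0`; in that case the `${10}` form is probably safer, but check it against the Notepad++ documentation first.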
With the values of S1 through E3 given above (S1-1 = 25, E1-S1+1 = 27, S2-E1-1 = 12, E2-S2+1 = 8, S3-E2-1 = 11 and E3-S3+1 = 16), we get the following S/R:
SEARCH : ^(.{25})(.{27})(.{12})(.{8})(.{11})(.{16})
REPLACE : \1#\2@\3#\4@\5#\6@
which gives the text below, with the boundaries added:
abcd,04,,11111111, 22,#ANYWORD ,ANYWORD,ANY,QZERTY@,001,,5555,,#AN,Y ANY@,pqrst,00x,#ANYWORD ,ANYWORD@,9A9 ,Last Field,
fghi,02,,22222222, 22,#ANY ,ANY, WORDS, ANY,AZERTY@,999,,6666,,#ANY AN,Y@,uvwxy,01y,#ANY ,ANY, WORDS,@,7Z3 ,Last Field,
klmn,09,,33333333, 22,#WORDS,ANY, WORDS,ANY,TEST-1@,123,,7777,,#ANY,,ANY@,zabcd,02z,#,ORDS,ANY, WORDS@,3H5 ,Last Field,
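To try the first S/R outside Notepad++, here is a Python sketch using the standard `re` module, which accepts the same SEARCH pattern and the same `\1`-style REPLACE template. As the column numbers imply, it assumes that the field shown as ` 22` was really `    22` (4 spaces) in the original text; otherwise zone 1 could not start at column 26.

```python
import re

# ASSUMPTION: "    22" restores the 25-character prefix collapsed to " 22" in the post
TEXT = "\n".join([
    "abcd,04,,11111111,    22,ANYWORD ,ANYWORD,ANY,QZERTY,001,,5555,,AN,Y ANY,pqrst,00x,ANYWORD ,ANYWORD,9A9 ,Last Field,",
    "fghi,02,,22222222,    22,ANY ,ANY, WORDS, ANY,AZERTY,999,,6666,,ANY AN,Y,uvwxy,01y,ANY ,ANY, WORDS,,7Z3 ,Last Field,",
    "klmn,09,,33333333,    22,WORDS,ANY, WORDS,ANY,TEST-1,123,,7777,,ANY,,ANY,zabcd,02z,,ORDS,ANY, WORDS,3H5 ,Last Field,",
])

SEARCH = r"^(.{25})(.{27})(.{12})(.{8})(.{11})(.{16})"
REPLACE = r"\1#\2@\3#\4@\5#\6@"

# re.M makes ^ match at every line start; '.' does not match '\n' (as in
# Notepad++ with ". matches newline" unchecked), so each line is handled alone
marked = re.sub(SEARCH, REPLACE, TEXT, flags=re.M)
print(marked)
```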
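Then, we run the second regex S/R below. Group 1 catches the # and @ boundaries, and the conditional replacement (?1:_) deletes them when group 1 matched, and otherwise changes the matched comma into an underscore: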
SEARCH : ,(?=[^@]*#)|,(?![^#]*@)|(#|@)
REPLACE : (?1:_)
we obtain the final text:
abcd_04__11111111_ 22_ANYWORD ,ANYWORD,ANY,QZERTY_001__5555__AN,Y ANY_pqrst_00x_ANYWORD ,ANYWORD_9A9 _Last Field_
fghi_02__22222222_ 22_ANY ,ANY, WORDS, ANY,AZERTY_999__6666__ANY AN,Y_uvwxy_01y_ANY ,ANY, WORDS,_7Z3 _Last Field_
klmn_09__33333333_ 22_WORDS,ANY, WORDS,ANY,TEST-1_123__7777__ANY,,ANY_zabcd_02z_,ORDS,ANY, WORDS_3H5 _Last Field_
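The second S/R can be sketched in Python too. Python's `re` has no Boost-style conditional replacement like `(?1:_)`, so this sketch swaps in a replacement function (the hypothetical `conditional_replace`) that returns an empty string when group 1, a boundary, matched, and `_` otherwise. As in Notepad++, the negated classes `[^@]` and `[^#]` also match line breaks. The input again assumes that ` 22` was really `    22`.

```python
import re

# Output of the first S/R (ASSUMPTION: "    22" padding restored in the prefix)
MARKED = "\n".join([
    "abcd,04,,11111111,    22,#ANYWORD ,ANYWORD,ANY,QZERTY@,001,,5555,,#AN,Y ANY@,pqrst,00x,#ANYWORD ,ANYWORD@,9A9 ,Last Field,",
    "fghi,02,,22222222,    22,#ANY ,ANY, WORDS, ANY,AZERTY@,999,,6666,,#ANY AN,Y@,uvwxy,01y,#ANY ,ANY, WORDS,@,7Z3 ,Last Field,",
    "klmn,09,,33333333,    22,#WORDS,ANY, WORDS,ANY,TEST-1@,123,,7777,,#ANY,,ANY@,zabcd,02z,#,ORDS,ANY, WORDS@,3H5 ,Last Field,",
])

# Same SEARCH as in Notepad++: a comma outside the zones, or a boundary in group 1
SEARCH = re.compile(r",(?=[^@]*#)|,(?![^#]*@)|(#|@)")


def conditional_replace(match):
    """Hypothetical helper emulating (?1:_): drop '#'/'@', turn a comma into '_'."""
    return "" if match.group(1) else "_"


final = SEARCH.sub(conditional_replace, MARKED)
print(final)
```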
As expected, all the commas located from column 26 to column 52, from column 65 to column 72 and from column 84 to column 99 have NOT been changed into an underscore!
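For comparison, the same result can be obtained without any boundary character, by testing the column of each comma directly. This is a different technique (plain Python, not a regex S/R), shown only as a cross-check of the column ranges; `replace_outside_zones` is a hypothetical helper, and the input again assumes that ` 22` was really `    22`.

```python
# ASSUMPTION: "    22" padding restored, so zone 1 really starts at column 26
TEXT = "\n".join([
    "abcd,04,,11111111,    22,ANYWORD ,ANYWORD,ANY,QZERTY,001,,5555,,AN,Y ANY,pqrst,00x,ANYWORD ,ANYWORD,9A9 ,Last Field,",
    "fghi,02,,22222222,    22,ANY ,ANY, WORDS, ANY,AZERTY,999,,6666,,ANY AN,Y,uvwxy,01y,ANY ,ANY, WORDS,,7Z3 ,Last Field,",
    "klmn,09,,33333333,    22,WORDS,ANY, WORDS,ANY,TEST-1,123,,7777,,ANY,,ANY,zabcd,02z,,ORDS,ANY, WORDS,3H5 ,Last Field,",
])
ZONES = [(26, 52), (65, 72), (84, 99)]  # 1-based, inclusive (S, E) pairs


def replace_outside_zones(line, zones, old=",", new="_"):
    """Hypothetical helper: replace `old` by `new`, except inside any (start, end) zone."""
    chars = list(line)
    for index, char in enumerate(chars):
        column = index + 1  # columns are 1-based
        if char == old and not any(start <= column <= end for start, end in zones):
            chars[index] = new
    return "".join(chars)


result = "\n".join(replace_outside_zones(line, ZONES) for line in TEXT.splitlines())
print(result)
```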
Notes:
Compared with the regexes of my previous post, only the look-aheads of the second S/R are slightly different:
The positive look-ahead (?=[^@]*#) checks that, from the cursor location, a # character can be found further on, with no @ character between the cursor location and that #
The negative look-ahead (?![^#]*@) checks that, from the cursor location, no @ character can be found further on without a # character between the cursor location and that @
As the negated classes [^@] and [^#] also match line breaks, both look-aheads may continue their search onto the following lines.
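The two look-aheads can be watched at work with a small Python sketch (Python's `re` supports the same look-ahead syntax). For each comma of one marked line, it reports whether each look-ahead succeeds right after it. Columns are counted in the marked line, i.e. including the inserted # and @, and the line again assumes the `    22` padding. On this data, every comma accepted by the positive look-ahead is also accepted by the negative one, which suggests that the negative look-ahead alone, i.e. `,(?![^#]*@)|(#|@)`, would be enough here.

```python
import re

# First line after the first S/R (ASSUMPTION: "    22" padding restored)
MARKED = ("abcd,04,,11111111,    22,#ANYWORD ,ANYWORD,ANY,QZERTY@,001,,5555,,"
          "#AN,Y ANY@,pqrst,00x,#ANYWORD ,ANYWORD@,9A9 ,Last Field,")

POSITIVE = re.compile(r"(?=[^@]*#)")  # a '#' lies ahead, with no '@' before it
NEGATIVE = re.compile(r"(?![^#]*@)")  # no '@' lies ahead without a '#' before it


def classify_commas(text):
    """Hypothetical helper: (column, positive_ok, negative_ok) for every comma."""
    results = []
    for index, char in enumerate(text):
        if char == ",":
            # pattern.match(text, pos) evaluates the look-ahead at pos, just after the comma
            results.append((index + 1,
                            POSITIVE.match(text, index + 1) is not None,
                            NEGATIVE.match(text, index + 1) is not None))
    return results


for column, positive_ok, negative_ok in classify_commas(MARKED):
    # The S/R changes a comma when either alternative succeeds
    state = "outside" if positive_ok or negative_ok else "inside"
    print(f"column {column:3}: positive={positive_ok!s:5} negative={negative_ok!s:5} -> {state}")
```

Because this sketch works on a single line, the last three commas (after the final @) are caught by the negative look-ahead only, as there is no # left ahead of them.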
Cheers,
guy038