RegEx Help with Backreference I think
-
@guy038 said in RegEx Help with Backreference I think:
I am struggling with trying to accomplish what I think is required.
FIND BookName and isolate into a Group
Indicate the text to be acted on
Input required change, the goal
Put the change into the REPLACE field
I think that is it, but fitting it into a lookahead RegEx is still eluding me.
I will continue to work on it, but I have been for hours, and any guidance would be great.
-
I think that the simplest solution is to make sure that, throughout your file, any “space” character after the verse number is ALWAYS a tabulation char !
So, given your simple INPUT text :
Kulava 1
Kulava 1:1 Halikumbi lyakavanga mukakweji wamuchivali, mumwaka wamuchivali kufuma haze valovokele mulifuchi lyaEjipitu, Yehova alwezele Mose mumakango aShinai muMbalaka yakuliwanyina ngwenyi,
Kulava 1:2 Lavenu lizavu lyavana vaIsalele lyosena mwaya jitanga javo muvisaka vyavakakuluka javo mukuvula chamajina avo, malunga vosena umwe naumwe,
Kulava 1:3 kufuma kuli ava vamyaka makumi avali (20) nakusambula, vosena vaze vakuhasa kulwa jita muli vaIsalele. Ove naAlone muvalave halizavu halizavu.
Kulava 1:4 Kaha kutanga hitanga kufume lunga apwenga nayenu, kaha mutu himutu apwenga mwata wakutanga yavakakuluka jenyi.
Kulava 1:5 Awa akiko majina amalunga navamikafwa — mutanga yaLuvene mufume Elizule mwanaSheteule.
Kulava 1:6 Mutanga yaShimeyone mufume Shelumiyele mwanaZulishatai.
Kulava 1:7 Mutanga yaYuta mufume Nashone mwanaAminatave.
Kulava 1:8 Mutanga yaIsakale mufume Netanele mwanaZuwale.
Kulava 1:9 Mutanga yaZevulune mufume Eliyave mwanaHelone.
Kulava 1:10 Vamuli vana vaYosefwe navapwa ava — mutanga yaEfwalime mufume Elishama mwanaAmihute, mutanga yaManase mufume Ngamalyele mwanaPetazule.
The following regex S/R :
- FIND ^(.+ \d+:\d+)[\x20\t]+
- REPLACE ${1}\t
would return this OUTPUT text :
Kulava 1
Kulava 1:1 Halikumbi lyakavanga mukakweji wamuchivali, mumwaka wamuchivali kufuma haze valovokele mulifuchi lyaEjipitu, Yehova alwezele Mose mumakango aShinai muMbalaka yakuliwanyina ngwenyi,
Kulava 1:2 Lavenu lizavu lyavana vaIsalele lyosena mwaya jitanga javo muvisaka vyavakakuluka javo mukuvula chamajina avo, malunga vosena umwe naumwe,
Kulava 1:3 kufuma kuli ava vamyaka makumi avali (20) nakusambula, vosena vaze vakuhasa kulwa jita muli vaIsalele. Ove naAlone muvalave halizavu halizavu.
Kulava 1:4 Kaha kutanga hitanga kufume lunga apwenga nayenu, kaha mutu himutu apwenga mwata wakutanga yavakakuluka jenyi.
Kulava 1:5 Awa akiko majina amalunga navamikafwa — mutanga yaLuvene mufume Elizule mwanaSheteule.
Kulava 1:6 Mutanga yaShimeyone mufume Shelumiyele mwanaZulishatai.
Kulava 1:7 Mutanga yaYuta mufume Nashone mwanaAminatave.
Kulava 1:8 Mutanga yaIsakale mufume Netanele mwanaZuwale.
Kulava 1:9 Mutanga yaZevulune mufume Eliyave mwanaHelone.
Kulava 1:10 Vamuli vana vaYosefwe navapwa ava — mutanga yaEfwalime mufume Elishama mwanaAmihute, mutanga yaManase mufume Ngamalyele mwanaPetazule.
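For readers who want to try the same capture-group replacement outside Notepad++, here is a minimal Python sketch. It assumes Python's standard `re` module, where the back-reference in the replacement is written `\g<1>` rather than `${1}`. The function name and the sample lines (including the spacing after each verse number) are invented for illustration.

```python
import re

# Same search pattern as above; re.MULTILINE makes ^ match at every line start.
VERSE_GAP = re.compile(r'^(.+ \d+:\d+)[\x20\t]+', re.MULTILINE)

def tab_after_verse(text: str) -> str:
    """Replace the run of spaces/tabs after 'Book chapter:verse' with one tab."""
    # \g<1> is Python's spelling of the ${1} back-reference.
    return VERSE_GAP.sub(r'\g<1>\t', text)

# Hypothetical sample: the gaps after the verse numbers are made up.
sample = (
    "Kulava 1\n"
    "Kulava 1:1   Halikumbi lyakavanga\n"
    "Kulava 1:2 \t Lavenu lizavu\n"
    "Kulava 1:3\tkufuma kuli ava\n"
)
result = tab_after_verse(sample)
print(result)
```

The heading line `Kulava 1` has no `chapter:verse` part, so the pattern does not match it and it is left alone.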
You could suppose that using a look-behind, for the first part of the whole search regex, would work :
- FIND (?<=^.+ \d+:\d+)[\x20\t]+
- REPLACE \t
But this construction is illegal, as our Boost regex engine does not support look-behinds of VARIABLE length !
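Other engines have the same restriction. As a rough illustration only (this uses Python's `re` module, not Boost), the sketch below checks whether a pattern compiles. The helper name `compiles` is hypothetical.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Variable-length look-behind: .+ and \d+ can match any number of characters.
variable_ok = compiles(r'(?<=^.+ \d+:\d+)[\x20\t]+')
# Fixed-length look-behind: exactly one digit, a colon, one digit.
fixed_ok = compiles(r'(?<=\d:\d)[\x20\t]+')
print(variable_ok, fixed_ok)
```

Python rejects the first pattern because the look-behind has no fixed width, and accepts the second.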
To get around this drawback, we may use the \K feature :
- First, the search regex matches the leading part ..... \d+:\d+
- As soon as it meets the \K feature, the regex engine :
  - Discards everything matched so far from the reported match
  - Resets the start of the match to the position of \K, so right after the verse number and before any mixed run of tabulation or space characters ([\x20\t]+)
  - Resumes the search with the final part [\x20\t]+
This leads to the following regex S/R :
- FIND ^.+ \d+:\d+\K[\x20\t]+
- REPLACE \t
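Python's standard `re` module has no `\K`, so the sketch below only imitates its effect. It matches the leading part and the gap together, then replaces just the span that starts where `\K` would sit. The function name and the sample text are assumptions made for illustration.

```python
import re

# Group 1 = what precedes \K; group 2 = what follows it.
PREFIX_AND_GAP = re.compile(r'^(.+ \d+:\d+)([\x20\t]+)', re.MULTILINE)

def replace_after_reset(text: str) -> str:
    """Replace only the gap (group 2), leaving the prefix untouched."""
    pieces = []
    pos = 0
    for m in PREFIX_AND_GAP.finditer(text):
        # Copy everything up to the start of the gap, i.e. the \K position.
        pieces.append(text[pos:m.start(2)])
        pieces.append('\t')
        pos = m.end(2)
    pieces.append(text[pos:])
    return ''.join(pieces)

# Hypothetical sample with made-up spacing after the verse numbers.
sample = "Kulava 1:6  \t Mutanga yaShimeyone\nKulava 1:7 Mutanga yaYuta\n"
fixed = replace_after_reset(sample)
print(repr(fixed))
```

The prefix is needed to locate the gap but is never rewritten, which is the behaviour \K gives in the S/R above.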
IMPORTANT : If you’re using this second regex S/R with the \K feature, the replacement MUST be a global one, using the Replace All button. You CANNOT use the step-by-step replacement with the Replace button !
Best regards,
guy038
-
-
@guy038 Thanx for your kind dedication and patience in helping me. I can now do in a couple of minutes what was taking days.
Starting again armed with the new MACROs is indeed a new beginning.
I wish you and Terry every blessing.
Hello, @robert-or-janet-diebel,
Regarding my last regex to normalize “space” characters to ONE tabulation char only, I thought of a better regex S/R, which should speed up the whole process, as it skips all the correct lines that already contain just one tabulation character !
My very-Best-Novel 20:10 Mutanga yaYuta mufume Nashone mwanaAminatave.
My very-Best-Novel 20:10 Mutanga yaYuta mufume Nashone mwanaAminatave.
My very-Best-Novel 20:10 Mutanga yaYuta mufume Nashone mwanaAminatave.
My very-Best-Novel 20:10 Mutanga yaYuta mufume Nashone mwanaAminatave.
My very-Best-Novel 20:10 Mutanga yaYuta mufume Nashone mwanaAminatave.
- FIND ^.+ \d+:\d+\K(?:[\x20\t]{2,}|\x20)
- REPLACE \t
- Click on the Replace All button
The third line is skipped as it contains just 1 tab char, and we get the OUTPUT text :
My very-Best-Novel 20:10 Mutanga yaYuta mufume Nashone mwanaAminatave.
My very-Best-Novel 20:10 Mutanga yaYuta mufume Nashone mwanaAminatave.
My very-Best-Novel 20:10 Mutanga yaYuta mufume Nashone mwanaAminatave.
My very-Best-Novel 20:10 Mutanga yaYuta mufume Nashone mwanaAminatave.
My very-Best-Novel 20:10 Mutanga yaYuta mufume Nashone mwanaAminatave.
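To see which lines this alternation actually touches, here is a small Python sketch. It assumes the standard `re` module, so a capture group stands in for `\K`. The five gaps are invented for illustration, since the exact whitespace of the sample above is not recoverable here.

```python
import re

# Alternation from the S/R above, with a capture group standing in for \K.
BAD_GAP = re.compile(r'^(.+ \d+:\d+)(?:[\x20\t]{2,}|\x20)', re.MULTILINE)

# Hypothetical gaps: space, two spaces, one tab, tab+space, space+two tabs.
gaps = [" ", "  ", "\t", "\t ", " \t\t"]
lines = [f"My very-Best-Novel 20:10{g}Mutanga yaYuta mufume" for g in gaps]
text = "\n".join(lines)

# subn also returns how many replacements were made.
normalized, count = BAD_GAP.subn(r'\g<1>\t', text)
print(count)
```

Only the line whose gap is already a single tab is left unmatched, so `count` is one less than the number of lines.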
Note the use of a non-capturing group (?:......) as we do not need the contents of this group, either in the search or in the replace regex !
Best Regards,
guy038
-
-
@guy038 Thank You very much for the extra effort.