Change a word in regex



  • I have made a regex search string and a regex replace string to convert an index of genealogical events from an archive into a GED file. The index consists of keywords, such as the name of the child, the name of the father, birth date, birthplace and so on, and key values, e.g. John, 01 APR 1851, Amsterdam. In Dutch, birth is ‘geboorte’, so I can search for the first four characters, i.e. GEBO, as a keyword, but this is not a valid GED keyword; it should be BIRT. Is it possible to convert GEBO into BIRT, V (vrouw) into F (female) and some other specific Dutch words? It should be done in one search-replace operation. Of course I can do the conversion in a second search-replace, but I would like to minimize manual steps and avoid writing scripts (which is not my strength). Any help will be welcomed!



  • Hello, @jos-maas,

    No problem with regular expressions, indeed !

    • Open the Replace dialog (Ctrl + H )

    • Check the Regular expression search mode

    SEARCH (?-i)\b(?:(GEBOORTE)|(V))\b

    REPLACE (?1BIRT)(?2F)

    • Click on the Replace All button

    Et voilà !!

    For instance the text :

    GEBOORTE : 01 APR 1851    Sex : V
    

    would be changed into :

    BIRT : 01 APR 1851    Sex : F
    

    Notes :

    • The (?-i) syntax is a modifier that means that the search is NON-insensitive ( so, sensitive ! ) to case

    • The \b assertions ensure that the strings are whole words, not glued inside larger words. Strictly, \b is a zero-length location, between a NON-word character and a Word character OR between a Word character and a NON-word character !

    • The outer form (?:..........) defines a non-capturing group, so that both \b boundaries apply to every alternative word, not just to the first and the last one

    • Each word, enclosed in parentheses, is stored, for later use, as group 1, group 2,…

    • Then, each conditional replacement, (?#any text), where # is a group number, replaces the word stored in group # with the corresponding text, and writes nothing if group # did not match !
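    For anyone who wants to check the logic of these notes outside Notepad++, here is a minimal Python sketch. It is only an emulation under stated assumptions: Python's re module does not support the (?1BIRT) conditional replacement syntax, so a callback picks the replacement from the group that matched. The (?-i) modifier is left out because Python is case-sensitive by default. The names PATTERN, REPLACEMENTS and convert are made up for this illustration.

```python
import re

# Same alternation as the SEARCH string: one capturing group per word,
# wrapped in a non-capturing group so both \b boundaries apply to each word.
PATTERN = re.compile(r"\b(?:(GEBOORTE)|(V))\b")

# Hypothetical lookup table: group number -> replacement text.
# It plays the role of the (?1BIRT)(?2F) REPLACE string.
REPLACEMENTS = {1: "BIRT", 2: "F"}


def convert(text):
    """Replace each matched word according to the group that captured it."""
    def pick(match):
        # lastindex is the number of the capturing group that matched
        return REPLACEMENTS[match.lastindex]
    return PATTERN.sub(pick, text)


print(convert("GEBOORTE : 01 APR 1851    Sex : V"))
# prints: BIRT : 01 APR 1851    Sex : F
```

    Words such as VADER or GEBOORTEN are left alone by the \b boundaries, and a lowercase v is left alone because the match is case-sensitive.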


    So, the general syntax, of that S/R, is :

    SEARCH <Modifiers>\b(?:(Word1)|(Word2)|(Word3)|......|(Word#))\b

    REPLACE (?1Repl1)(?2Repl2)(?3Repl3)......(?#Repl#)

    Remarks :

    • You can have up to 99 (?#......) forms, ( from (?1.....) to (?99.....) ), written in ANY order

    • For a group number > 99, use the alternate syntax (?{#}......)

    • In the same way, if the group number # is <= 9 and the replacement text begins with a digit, you must use the (?{#}Repl#) syntax, so that the group # is identified correctly

    • Beware that the Find what: and Replace with: fields of the Find/Replace dialog can contain no more than 2046 characters !
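    If the word list grows long, typing the two strings by hand becomes error-prone. The following Python sketch builds them from a table, following the general syntax and the remarks above. It is an illustration only: build_search_replace is a made-up helper, the OVERLIJDEN / DEAT pair is an example entry that was not in the question, and the code assumes that the words and replacements are plain text without regex special characters or parentheses. It always falls back to the (?{#}......) form when the group number is above 99 or the replacement starts with a digit, which is a little broader than strictly required.

```python
MAX_FIELD_LENGTH = 2046  # dialog field limit quoted in the remarks above


def build_search_replace(mapping, modifiers="(?-i)"):
    """Return (search, replace) strings for the Notepad++ Replace dialog.

    mapping: dict of word -> replacement, in group order.
    Assumes plain words with no regex special characters.
    """
    search_parts = []
    replace_parts = []
    for number, (word, repl) in enumerate(mapping.items(), start=1):
        search_parts.append("(" + word + ")")
        if number > 99 or repl[:1].isdigit():
            # Braces keep the group number separate from a leading digit
            replace_parts.append("(?{%d}%s)" % (number, repl))
        else:
            replace_parts.append("(?%d%s)" % (number, repl))
    search = modifiers + r"\b(?:" + "|".join(search_parts) + r")\b"
    replace = "".join(replace_parts)
    for field in (search, replace):
        if len(field) > MAX_FIELD_LENGTH:
            raise ValueError("string too long for the Find/Replace dialog")
    return search, replace


# Example entries; OVERLIJDEN -> DEAT is an illustrative addition.
search, replace = build_search_replace(
    {"GEBOORTE": "BIRT", "V": "F", "OVERLIJDEN": "DEAT"}
)
print("SEARCH ", search)
print("REPLACE", replace)
```

    The two printed strings can then be pasted into the Find what: and Replace with: fields, with the Regular expression search mode checked.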

    Best Regards,

    guy038



  • Merci beaucoup, guy038, I think this is a valuable supplement to the tutorial! I am going to work it out. Best regards, Jos Maas

