Replace string, but maintain substring (convert Markdown to HTML image)



  • I have 500 Markdown files where I need to replace a complete line with another line. This can surely be done with a regex, but I have no clue how.

    I want to replace this line (the last words of the line differ in each file):

    ![](..\pregit\uploads\1kln-2hdw-4h50/media/image1.jpeg){width="5.1722222222222225in" height="2.5868055555555554in"} + Variable text
    

    with:

    <img src="/uploads/1kln-2hdw-4h50/image1.jpeg" height="300">
    

    Note that the ‘1kln-2hdw-4h50’ part is reused in the new line.



  • Hello, @jeroen-borgman, and All,

    Again, a regex S/R is the solution!

    I assumed that:

    • The image names may be different

    • The part 1kln-2hdw-4h50 may be different


    So, given the sample input text below:

    ![](..\pregit\uploads\1kln-2hdw-4h50/media/image1.jpeg){width="5.1722222222222225in" height="2.5868055555555554in"}    bla bla blah
    
    ![](..\pregit\uploads\9ftu-7abc-10h27/media/My Image.jpeg){width="9.2333in" height="3.555556in"}    Variable text
    
    ![](..\pregit\uploads\5gya-0hgz-2h36/media/image45678.png){width="10.0in" height="5.0in"}   This a test : ( .../media/... ) bla blah !
    

    If you use the following regex S/R:

    SEARCH (?x-is) \Q![](..\pregit\uploads\\E (.+?) /media (/.+?) \) .+

    REPLACE <img src="/uploads/\1\2" height="300">

    You’ll get the modified text:

    <img src="/uploads/1kln-2hdw-4h50/image1.jpeg" height="300">
    
    <img src="/uploads/9ftu-7abc-10h27/My Image.jpeg" height="300">
    
    <img src="/uploads/5gya-0hgz-2h36/image45678.png" height="300">
    

    I hope this is the output you expected ;-))
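If you would rather run this S/R outside Notepad++, for instance from a script over all 500 files, here is a minimal Python sketch of the same substitution. It is an illustration, not part of the original answer: Python's re module has no \Q...\E syntax, so re.escape() builds the literal prefix instead, and the three sample lines are copied from above.

```python
import re

# Python's re has no \Q...\E quoting, so re.escape() builds the literal prefix.
PREFIX = re.escape("![](..\\pregit\\uploads\\")

# Same structure as the Notepad++ SEARCH regex. Python's re is case-sensitive
# by default and '.' does not match '\n', which corresponds to (?-is).
PATTERN = re.compile(PREFIX + r"(.+?)/media(/.+?)\).+")
REPLACEMENT = r'<img src="/uploads/\1\2" height="300">'

sample = "\n".join([
    r'![](..\pregit\uploads\1kln-2hdw-4h50/media/image1.jpeg)'
    r'{width="5.1722222222222225in" height="2.5868055555555554in"}    bla bla blah',
    r'![](..\pregit\uploads\9ftu-7abc-10h27/media/My Image.jpeg)'
    r'{width="9.2333in" height="3.555556in"}    Variable text',
    r'![](..\pregit\uploads\5gya-0hgz-2h36/media/image45678.png)'
    r'{width="10.0in" height="5.0in"}   This a test : ( .../media/... ) bla blah !',
])

result = PATTERN.sub(REPLACEMENT, sample)
print(result)
# <img src="/uploads/1kln-2hdw-4h50/image1.jpeg" height="300">
# <img src="/uploads/9ftu-7abc-10h27/My Image.jpeg" height="300">
# <img src="/uploads/5gya-0hgz-2h36/image45678.png" height="300">
```

Only the search pattern and the replacement template come from the answer; reading and rewriting each file (for example with pathlib.Path.read_text() and write_text()) is not shown, since the files' encoding is unknown.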

    Notes:

    • First, the inline modifiers (?x-is) mean:

      • Any space char in the regex is NOT taken into account by the regex engine; it just helps the user better identify the different sections of the search regex ( (?x) ). In this mode, if you need to search for a space character, use one of these syntaxes:

        • \x20 ( the escaped form )

        • [ ] ( a space char inside a character class )

        • a \ symbol, followed by a space char

      • Any dot regex char ( . ) matches a single standard character only, never an EOL one ( (?-s) )

      • The search is case-SENSITIVE ( (?-i) )

    • Then, the part \Q![](..\pregit\uploads\\E simply delimits a literal string, between \Q and \E, to be matched with that exact case (the doubled backslash is the literal \ after uploads, followed by the \E terminator)

    • Then, the part (.+?) matches the shortest string of any characters before the /media string, with that exact case, stored as group 1 because of the enclosing parentheses

    • Now, the part /media matches the literal string /media, with that exact case

    • And the following part (/.+?) looks for a slash symbol / followed by the shortest string of any characters before a closing parenthesis \), stored as group 2 because of the enclosing parentheses

    • Then, the part \) matches a literal closing parenthesis

    • And finally, the part .+ matches all the remaining standard characters of the current line

    • In the replacement, the whole contents of the current line are replaced with:

      • First, the part <img src="/uploads/, which is rewritten as is

      • Then, the part \1\2, which rewrites the contents of group 1, then group 2

      • Last, the part " height="300">, which is rewritten as is
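To see what the two groups actually capture, here is a small Python sketch run against the second sample line. It assumes Python's re module as a stand-in for Notepad++'s engine, which should behave the same for this simple pattern; Match.expand() applies the same replacement template to a single match.

```python
import re

# Stand-in for the Notepad++ regex: re.escape() plays the role of \Q...\E.
PATTERN = re.compile(re.escape("![](..\\pregit\\uploads\\") + r"(.+?)/media(/.+?)\).+")

line = (r'![](..\pregit\uploads\9ftu-7abc-10h27/media/My Image.jpeg)'
        r'{width="9.2333in" height="3.555556in"}    Variable text')

m = PATTERN.match(line)
print(m.group(1))  # 9ftu-7abc-10h27   (shortest text before /media)
print(m.group(2))  # /My Image.jpeg    (slash + shortest text before ")")

# The replacement just glues literal text around groups 1 and 2.
print(m.expand(r'<img src="/uploads/\1\2" height="300">'))
# <img src="/uploads/9ftu-7abc-10h27/My Image.jpeg" height="300">

# Case-sensitive, like (?-i): "/Media" is not "/media", so nothing matches.
print(PATTERN.match(line.replace("/media", "/Media")))  # None
```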

    Best Regards,

    guy038

    P.S.:

    Be aware of a fundamental difference in regex syntaxes containing variable quantifiers, like *, +, ?, {n,} and {n,m}:

    • You may use the quantifier by itself (a greedy quantifier, which matches as much as possible)

    • You may add the ? symbol right after the quantifier (a lazy quantifier, which matches as little as possible)

    For instance, the regex abc.+xyz may not match the same expressions as the regex abc.+?xyz!

    Against the text "- abcdefghijklmnopqrstuvwxyz - abcdefghijklmnopqrstuvwxyz -":

    • The regex abc.+xyz would match the string abcdefghijklmnopqrstuvwxyz - abcdefghijklmnopqrstuvwxyz, i.e. the longest string between the abc and the xyz strings

    • Whereas the regex abc.+?xyz would match the string abcdefghijklmnopqrstuvwxyz, i.e. the shortest string between the abc and the xyz strings
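The greedy/lazy difference is easy to check with Python's re module, which uses the same quantifier semantics. A minimal sketch on the text above:

```python
import re

text = "- abcdefghijklmnopqrstuvwxyz - abcdefghijklmnopqrstuvwxyz -"

greedy = re.search(r"abc.+xyz", text).group()   # .+  grabs as much as possible
lazy = re.search(r"abc.+?xyz", text).group()    # .+? stops at the first xyz

print(greedy)  # abcdefghijklmnopqrstuvwxyz - abcdefghijklmnopqrstuvwxyz
print(lazy)    # abcdefghijklmnopqrstuvwxyz
```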


    Jeroen, just remove one or both ? symbols from the search regex above, and you'll see that the third line of the sample text is then wrongly replaced :-((
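Here is a small Python sketch of that experiment, with Python's re again standing in for Notepad++. It runs the original regex and the three variants with one or both ? removed against the third sample line:

```python
import re

PREFIX = re.escape("![](..\\pregit\\uploads\\")
REPLACEMENT = r'<img src="/uploads/\1\2" height="300">'
line3 = (r'![](..\pregit\uploads\5gya-0hgz-2h36/media/image45678.png)'
         r'{width="10.0in" height="5.0in"}   This a test : ( .../media/... ) bla blah !')

# The original regex, then the same regex with one or both "?" removed.
variants = {
    "both lazy":      r"(.+?)/media(/.+?)\).+",
    "group 1 greedy": r"(.+)/media(/.+?)\).+",
    "group 2 greedy": r"(.+?)/media(/.+)\).+",
    "both greedy":    r"(.+)/media(/.+)\).+",
}

results = {name: re.sub(PREFIX + tail, REPLACEMENT, line3)
           for name, tail in variants.items()}
for name, out in results.items():
    print(f"{name:15} -> {out}")
```

Only the fully lazy version gives the expected line: with a greedy group 1, (.+) runs on to the second /media inside the trailing text, and with a greedy group 2, (/.+) runs on to the last ).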



  • @guy038 said in Replace string, but maintain substring (convert Markdown to HTML image):

    <img src="/uploads/\1\2" height="300">

    Wow! Pure magic!
    Thanks for this, Guy. I like not only the solution, but also the explanation.



  • Hi, @jeroen-borgman, and All,

    Thanks for your comment!

    The free-spacing regex mode also allows you to place the different parts of your regex on consecutive lines, with optional comments after a # character, as below:

    (?x)                          # FREE SPACING regex mode
    (?-is)                        # DOT regex char = 1 STANDARD char and search SENSITIVE to case
    
    ^                             # START of CURRENT line boundary ( Added to be more RIGOROUS! )
    \Q![](..\pregit\uploads\\E    # LITERAL string ![](..\pregit\uploads\
    (.+?)                         # Part BETWEEN uploads\ and /media ( Group 1 )
    /media                        # LITERAL string /media
    (/.+?)                        # Image NAME ( Group 2 )
    \)                            # LITERAL char ) : the ESCAPED form is necessary, as PARENTHESES are REGEX chars!
    .+                            # REMAINING chars of CURRENT line scanned
    

    Just select all these lines and paste them into the Find what: field of the Find dialog ;-))

    Note that if your regex must match a literal # char, just use the escaped syntax \# or the character class [#]
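For comparison, Python's re.VERBOSE flag is the counterpart of (?x): the same commented, multi-line layout works, and a literal space or # has to be written the same ways (\x20, [ ], \ followed by a space, \#, [#]). A minimal sketch, assuming Python 3; Notepad++ itself is not involved here, and re.MULTILINE is what lets ^ match at the start of every line:

```python
import re

# re.VERBOSE ~ (?x): whitespace is ignored and '#' starts a comment.
# Case-sensitivity and '.' not matching newlines are Python's defaults.
PATTERN = re.compile(r"""
    ^                                # start of a line (needs re.MULTILINE)
    !\[\]\(\.\.\\pregit\\uploads\\   # literal prefix, escaped by hand
    (.+?)                            # group 1: part between uploads and /media
    /media                           # literal /media
    (/.+?)                           # group 2: image name
    \)                               # literal ")", escaped as ( ) are regex chars
    .+                               # remaining chars of the line
""", re.VERBOSE | re.MULTILINE)

line = (r'![](..\pregit\uploads\1kln-2hdw-4h50/media/image1.jpeg)'
        r'{width="5.1722222222222225in" height="2.5868055555555554in"} + Variable text')
converted = PATTERN.sub(r'<img src="/uploads/\1\2" height="300">', line)
print(converted)  # <img src="/uploads/1kln-2hdw-4h50/image1.jpeg" height="300">

# A bare space is ignored in verbose mode, so "My Image" matches "MyImage".
print(bool(re.fullmatch(r"My Image", "MyImage", re.VERBOSE)))  # True
# The three ways to match a literal space all work in verbose mode.
for form in (r"My\x20Image", r"My[ ]Image", r"My\ Image"):
    print(form, bool(re.fullmatch(form, "My Image", re.VERBOSE)))  # True each time
# Likewise, a literal '#' must be written as \# or [#].
for form in (r"C\#", r"C[#]"):
    print(form, bool(re.fullmatch(form, "C#", re.VERBOSE)))  # True each time
```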

    Cheers,

    guy038

