    Regex: Put a comma on REPLACE html tags

    15 Posts 4 Posters 1.1k Views
    • Alan Kilborn @Vasile Caraus

      @Vasile-Caraus said in Regex: Put a comma on REPLACE html tags:

      I must hit multiple times “replace”.

      It may be worth mentioning that if there are a lot of replacements, holding the Alt key and then holding the r key until they are all replaced is easier than clicking on the Replace button once for each individual replacement.

      • Vasile Caraus

        yes, nice solution.

        Maybe @guy038 has a much easier solution, a method that does the job in a single pass…

        • Alan Kilborn @Vasile Caraus

          @Vasile-Caraus

          You don’t get to choose who answers your questions here.

          • guy038

            Hi, @vasile-caraus, @peterjones, @alan-kilborn, and All,

            Well, as Peter said, I think the best approach is to treat this as two successive tasks:

            • First, copy the contents of the <title>....</title> tag into the content attribute of the nearest following <meta...../> tag

            • Second, modify the value of the content attribute of the meta tags, putting a comma and a space between the words

            So, assuming the input text below :

            <title>My name is Prince | Always</title>
            bla
            bla
            blah
            <meta name="keywords" content="laptop, home, yellow, diamond"/>
            
            <title>American singer-songwriter. musician ; record producer  dancer | actor filmmaker</title>
            
            Dummy
            text
            
            <meta name="keywords" content="desk, work, red, heart"/>
            
            <title>multi-instrumentalist & guitar virtuoso</title>
            This is
            the end
            of the test
            <meta name="keywords" content="knife, house, green, spade"/>
            

            The following S/R :

            SEARCH (?s)<title>(.*?)<\/title>.*?<meta\x20name="keywords"\x20content="\K.*?(?="\/>)

            REPLACE \1

            would produce :

            <title>My name is Prince | Always</title>
            bla
            bla
            blah
            <meta name="keywords" content="My name is Prince | Always"/>
            
            <title>American singer-songwriter. musician ; record producer  dancer | actor filmmaker</title>
            
            Dummy
            text
            
            <meta name="keywords" content="American singer-songwriter. musician ; record producer  dancer | actor filmmaker"/>
            
            <title>multi-instrumentalist & guitar virtuoso</title>
            This is
            the end
            of the test
            <meta name="keywords" content="multi-instrumentalist & guitar virtuoso"/>
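For readers who want to try the first step outside Notepad++, here is a minimal Python sketch of the same idea. It is an emulation, not the regex posted above: Python's `re` module has no `\K`, so the prefix that `\K` would discard is captured in group 1 and written back unchanged. The names `COPY_TITLE` and `copy_title_to_keywords` are made up for this illustration.

```python
import re

# Emulation of the first S/R. Group 1 holds everything that \K would have
# dropped from the match; group 2 is the text of the <title> tag.
COPY_TITLE = re.compile(
    r'(<title>(.*?)</title>.*?<meta name="keywords" content=").*?(?="/>)',
    re.DOTALL,  # same role as the leading (?s) in the Notepad++ regex
)

def copy_title_to_keywords(html: str) -> str:
    # A function replacement avoids escape surprises if the title
    # happens to contain backslashes.
    return COPY_TITLE.sub(lambda m: m.group(1) + m.group(2), html)

sample = (
    "<title>My name is Prince | Always</title>\n"
    "bla\n"
    '<meta name="keywords" content="laptop, home, yellow, diamond"/>\n'
)
print(copy_title_to_keywords(sample))
```

As in the Notepad++ version, only the first keywords meta tag after each title is rewritten.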
            

            Then, the final regex S/R :

            SEARCH (?s)<title>.*?<\/title>.*?<meta\x20name="keywords"\x20content="\K(\w+)|\G[^\w\r\n]+(\w+)

            REPLACE ?1\l\1:,\x20\l\2

            would change, in one go, the text between content=" and "/> of the <meta> tag

            <title>My name is Prince | Always</title>
            bla
            bla
            blah
            <meta name="keywords" content="my, name, is, prince, always"/>
            
            <title>American singer-songwriter. musician ; record producer  dancer | actor filmmaker</title>
            
            Dummy
            text
            
            <meta name="keywords" content="american, singer, songwriter, musician, record, producer, dancer, actor, filmmaker"/>
            
            <title>multi-instrumentalist & guitar virtuoso</title>
            This is
            the end
            of the test
            <meta name="keywords" content="multi, instrumentalist, guitar, virtuoso"/>
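The second step can be sketched in Python as well. This is only an approximation under stated assumptions: Python's `re` supports neither `\K` nor `\G`, so instead of chaining matches the sketch captures the whole content value and rebuilds it from its `\w+` words. It assumes the content value sits on a single line. The helper names (`lower_first`, `to_keyword_list`, `rewrite_keywords`) are hypothetical.

```python
import re

KEYWORDS = re.compile(
    r'(<title>.*?</title>.*?<meta name="keywords" content=")(.*?)(?="/>)',
    re.DOTALL,
)

def lower_first(word: str) -> str:
    # Mirrors \l in the replacement: only the first character is lower-cased.
    return word[:1].lower() + word[1:]

def to_keyword_list(content: str) -> str:
    # The first word is emitted as is, every later word is preceded by ", ",
    # which is what the conditional replacement ?1\l\1:,\x20\l\2 achieves.
    return ", ".join(lower_first(w) for w in re.findall(r"\w+", content))

def rewrite_keywords(html: str) -> str:
    return KEYWORDS.sub(lambda m: m.group(1) + to_keyword_list(m.group(2)), html)

sample = (
    "<title>My name is Prince | Always</title>\n"
    "bla\n"
    '<meta name="keywords" content="My name is Prince | Always"/>\n'
)
print(rewrite_keywords(sample))
```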
            

            Best Regards,

            guy038

            • Vasile Caraus

              wonderful solution, thanks a lot @guy038 !!

              • guy038

                Hi, @vasile-caraus, @peterjones, @alan-kilborn, and All,

                Sorry ! I forgot to mention two things :

                • Vasile, in your example, the text My name is Prince | Always ends up as my, name, is, prince, always. So I supposed that you wanted all the words in lower-case! That’s why I added the \l modifier, which lower-cases the first letter of groups 1 and 2, containing the words. If that is not the case, omit these modifiers!

                • Now, in the second S/R, I could have shortened the search regex as below:

                SEARCH (?s)<meta\x20name="keywords"\x20content="\K(\w+)|\G[^\w\r\n]+(\w+)

                But I preferred the longer regex because you may have other <meta ..../> tags that are not located near a <title>....</title> tag and so should not be affected by the replacement!
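The difference between the long and the short form can be illustrated with a small Python sketch. It is an emulation under assumptions, not the Notepad++ regexes themselves (Python's `re` has no `\K` or `\G`), and the names `SCOPED`, `UNSCOPED`, `commas` and `rewrite` are invented for the example. The scoped pattern only touches a keywords meta tag that follows a title, while the unscoped one touches every keywords meta tag.

```python
import re

# Long form: a <title> must precede the meta tag.
SCOPED = re.compile(
    r'(<title>.*?</title>.*?<meta name="keywords" content=")(.*?)(?="/>)',
    re.DOTALL,
)
# Short form: any keywords meta tag qualifies.
UNSCOPED = re.compile(r'(<meta name="keywords" content=")(.*?)(?="/>)', re.DOTALL)

def commas(content: str) -> str:
    # Lower-case the first letter of each word and join with ", ".
    return ", ".join(w[:1].lower() + w[1:] for w in re.findall(r"\w+", content))

def rewrite(pattern: re.Pattern, html: str) -> str:
    return pattern.sub(lambda m: m.group(1) + commas(m.group(2)), html)

doc = (
    '<meta name="keywords" content="Stand Alone"/>\n'  # no title before it
    "<title>Some Title</title>\n"
    '<meta name="keywords" content="Some Title"/>\n'
)
print(rewrite(SCOPED, doc))    # first meta tag left alone
print(rewrite(UNSCOPED, doc))  # both meta tags rewritten
```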

                Cheers,

                guy038

                • Vasile Caraus

                  @guy038 said in Regex: Put a comma on REPLACE html tags:

                  (?s)<meta\x20name="keywords"\x20content="\K(\w+)|\G[^\w\r\n]+(\w+)

                  This one is not working. It just deletes my whole <title> line. Maybe you should write the entire regex.

                  • PeterJones @Vasile Caraus

                    @Vasile-Caraus said in Regex: Put a comma on REPLACE html tags:

                    This one is not working. It just deletes my whole <title> line.

                    If the edited one is not working but the previous solution was a “wonderful solution” (and thus presumably working), I recommend you happily use the one that you already called “wonderful”.

                    ----

                    Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as plain text using the </> toolbar button or manual Markdown syntax. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure your examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get… Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.

                    • Vasile Caraus

                      @guy038 said in Regex: Put a comma on REPLACE html tags:

                      (?s)<title>.*?<\/title>.*?

                      In guy038’s latest regex update, I now see that he wrote only the second part of the regex.

                      It should also have included (?s)<title>.*?<\/title>.*? at the beginning.

                      So it works now that I have tested it.

                      SEARCH: (?s)<title>.*?<\/title>.*?(?s)<meta\x20name="keywords"\x20content="\K(\w+)|\G[^\w\r\n]+(\w+)

                      REPLACE BY: ?1\l\1:,\x20\l\2

                      • guy038

                        Hi, @vasile-caraus and All,

                        No, my simplified search regex for the second S/R works fine, too! But I also forgot to add that, due to the \G feature, the caret, before searching (and replacing), must NOT be located at the beginning of a <meta...../> tag. Otherwise, the regex engine wrongly selects the second alternative!

                        EDIT: I’m wrong again -(( Just forget everything in this post! I’ll try, later, to post a correct answer!

                        Cheers,

                        guy038

                        • guy038

                          Hello, @vasile-caraus and All,

                          In fact, I’m presently on holidays ! Thus, my concentration is not at top level ;-))

                          So, here is the true story ! When using, for the second regex S/R, the simplified form :

                          SEARCH (?s)<meta\x20name="keywords"\x20content="\K(\w+)|\G[^\w\r\n]+(\w+)

                          The first alternative <meta\x20name="keywords"\x20content="\K(\w+) should be used first, in order to detect the first word after the string content="

                          However, if, before running the S/R, the caret is located just before some non-word characters that are themselves followed by word characters, then the regex engine wrongly selects the second alternative, due to the \G syntax

                          So, this simplified syntax can be used if the initial location of the caret is on a truly empty line. Indeed, as the initial \G location must not be followed by \r or \n ( \G[^\w\r\n]+...... ), the second alternative cannot match there, so the next match necessarily comes from the first alternative of the regex, which is the correct behaviour!

                          But, of course, this simplified regex is then no longer tied to a preceding <title>......</title> tag!
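The caret effect described above can be reproduced with a rough Python model. It is a sketch under assumptions: Python's `re` has no `\G`, so the model treats `\G` as "the position where the search starts" and tries the second alternative only there. The function `first_match` and the constants `FIRST` and `SECOND` are hypothetical names, and the real Boost engine used by Notepad++ may differ in details.

```python
import re

FIRST = re.compile(r'<meta name="keywords" content="(\w+)')
SECOND = re.compile(r'[^\w\r\n]+(\w+)')  # the part that follows \G

def first_match(text: str, caret: int):
    """Return (alternative, match) for a search started at `caret`, or None."""
    m = FIRST.match(text, caret)   # first alternative, tried at the caret
    if m:
        return "first", m
    m = SECOND.match(text, caret)  # \G can only hold at the start position
    if m:
        return "second", m
    m = FIRST.search(text, caret)  # further on, only the first alternative can match
    return ("first", m) if m else None

text = (
    "<title>My name</title>\n"
    '<meta name="keywords" content="My name"/>\n'
)

# Caret right before "<title": non-word characters followed by word characters,
# so the second alternative wins and the title line would get mangled.
print(first_match(text, 0))

# Caret on an empty line: the second alternative cannot start with a line
# break, so the first alternative is the one that matches.
print(first_match("\n" + text, 0))
```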

                          BR

                          guy038

                          The Community of users of the Notepad++ text editor.