Regex: Put a comma on REPLACE html tags

15 Posts 4 Posters 1.1k Views
  • V
    Vasile Caraus
    last edited by Vasile Caraus Aug 5, 2020, 1:06 PM Aug 5, 2020, 1:05 PM

    Hello. Maybe you can help me. I have these two HTML tags:

    <title>My name is Prince | Always</title>

    <meta name="keywords" content="laptop, home, yellow, diamond"/>

    With the regex below, I can replace the content of the <meta name=..> tag with the content of the <title></title> tag.

    With ". matches newline" checked:

    Search: (<title>(.*?)<\/title>.*?)(<meta name="keywords" content=").*?("\/>)

    REPLACE BY: \1\3\2\4

    But after the replacement, I need a comma between the words in the <meta> tag.

    So, the output should be:

    <meta name="keywords" content="my, name, is, prince, always"/>

    Can anyone help me?

    • P
      PeterJones @Vasile Caraus
      last edited by Aug 5, 2020, 1:30 PM

      @Vasile-Caraus ,

      I am not sure I understand you (a couple things didn’t translate well, sorry). But I think I have an idea.

      I would do it in multiple parts: first, start with the regex you currently use to populate the <meta...> tag.
      Then use

      • FIND WHAT = (content=")(.*?(?<!,))\x20\|?\x20*(.*)
      • REPLACE WITH = $1\l$2,\x20\l$3

      Hit Replace multiple times until it finishes.

      Basically, I break the line up into three parts: 1: content=", 2: the text before I want the comma, and 3: the text after the comma. I match a space, then an optional | and 0 or more spaces as the separator between tokens (thus removing the | like you show in your example). I use the \l in the replacement to make the first character of groups 2 and 3 lowercase.
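The repeated Replace described above can be sketched outside Notepad++. This is a minimal Python illustration, not the editor's own engine: Python's `re` has no `\l` replacement escape, so a hypothetical helper `lower_first` stands in for it, and the hypothetical `add_commas` loop plays the role of pressing Replace until nothing matches. The `max_passes` cap is an added safety assumption.

```python
import re

# Same search expression as in the post; \x20 is a literal space.
STEP = re.compile(r'(content=")(.*?(?<!,))\x20\|?\x20*(.*)')


def lower_first(text):
    """Stand-in for \\l: lower-case only the first character."""
    return text[:1].lower() + text[1:]


def replace_once(line):
    """Perform a single replacement, like one click on Replace."""
    def build(m):
        return m.group(1) + lower_first(m.group(2)) + ", " + lower_first(m.group(3))
    return STEP.subn(build, line, count=1)


def add_commas(line, max_passes=1000):
    """Repeat the single replacement until the expression no longer matches."""
    for _ in range(max_passes):
        line, count = replace_once(line)
        if count == 0:
            break
    return line


print(add_commas('<meta name="keywords" content="My name is Prince | Always"/>'))
```

Each pass turns one separator that is not yet preceded by a comma into `, `, which is why the loop stops once every word is comma-separated.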

      If you knew the title would always have the same number of words, you could probably make it one operation to populate and split and add commas… but this might be better than doing it manually.

        • V
          Vasile Caraus @PeterJones
          last edited by Aug 5, 2020, 1:53 PM

          Thanks, nice. Yes, I have to hit “Replace” multiple times.

          • A
            Alan Kilborn @Vasile Caraus
            last edited by Aug 5, 2020, 1:58 PM

            @Vasile-Caraus said in Regex: Put a comma on REPLACE html tags:

            I have to hit “Replace” multiple times.

            It may be worth mentioning that if there are a lot of replacements, holding the Alt key and then holding the r key until they are all replaced is easier than clicking on the Replace button once for each individual replacement.

            • V
              Vasile Caraus
              last edited by Aug 6, 2020, 7:32 AM

              yes, nice solution.

              Maybe @guy038 has a much easier solution, a method that does the job in a single pass…

              • A
                Alan Kilborn @Vasile Caraus
                last edited by Aug 6, 2020, 11:55 AM

                @Vasile-Caraus

                You don’t get to choose who answers your questions here.

                • G
                  guy038
                  last edited by guy038 Aug 7, 2020, 2:41 PM Aug 7, 2020, 2:39 PM

                  Hi, @vasile-caraus, @peterjones, @alan-kilborn, and All,

                  Well, as Peter said, I think it is best to treat this as two successive tasks:

                  • Firstly, to copy the contents of the <title>....</title> tag in the content attribute of the nearest <meta...../> tag

                  • Secondly, modify the syntax of the value of the content attribute of the meta tags, adding a comma and space between each word

                  So, assuming the input text below :

                  <title>My name is Prince | Always</title>
                  bla
                  bla
                  blah
                  <meta name="keywords" content="laptop, home, yellow, diamond"/>
                  
                  <title>American singer-songwriter. musician ; record producer  dancer | actor filmmaker</title>
                  
                  Dummy
                  text
                  
                  <meta name="keywords" content="desk, work, red, heart"/>
                  
                  <title>multi-instrumentalist & guitar virtuoso</title>
                  This is
                  the end
                  of the test
                  <meta name="keywords" content="knife, house, green, spade"/>
                  

                  The following S/R :

                  SEARCH (?s)<title>(.*?)<\/title>.*?<meta\x20name="keywords"\x20content="\K.*?(?="\/>)

                  REPLACE \1

                  would produce :

                  <title>My name is Prince | Always</title>
                  bla
                  bla
                  blah
                  <meta name="keywords" content="My name is Prince | Always"/>
                  
                  <title>American singer-songwriter. musician ; record producer  dancer | actor filmmaker</title>
                  
                  Dummy
                  text
                  
                  <meta name="keywords" content="American singer-songwriter. musician ; record producer  dancer | actor filmmaker"/>
                  
                  <title>multi-instrumentalist & guitar virtuoso</title>
                  This is
                  the end
                  of the test
                  <meta name="keywords" content="multi-instrumentalist & guitar virtuoso"/>
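The first S/R above can be reproduced in Python as a rough sketch. Python's `re` does not support `\K`, so this version keeps everything before the attribute value in a capture group instead; `(?s)` becomes `re.DOTALL`. The function name `copy_titles` is hypothetical.

```python
import re

# Group 1: everything from <title> up to and including content="
# Group 2: the title text. The old attribute value is matched but not kept.
TITLE_THEN_META = re.compile(
    r'(<title>(.*?)</title>.*?<meta\x20name="keywords"\x20content=").*?(?="/>)',
    re.DOTALL,
)


def copy_titles(html):
    """Copy each title text into the content attribute of the next keywords meta tag."""
    return TITLE_THEN_META.sub(lambda m: m.group(1) + m.group(2), html)


sample = (
    '<title>My name is Prince | Always</title>\n'
    'bla\n'
    '<meta name="keywords" content="laptop, home, yellow, diamond"/>\n'
)
print(copy_titles(sample))
```

Because every quantifier is lazy, each `<title>` pairs with the nearest following keywords `<meta>` tag, so several title/meta blocks in one file are handled in a single call.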
                  

                  Then, the final regex S/R :

                  SEARCH (?s)<title>.*?<\/title>.*?<meta\x20name="keywords"\x20content="\K(\w+)|\G[^\w\r\n]+(\w+)

                  REPLACE ?1\l\1:,\x20\l\2

                  would change, in one go, the text between content=" and "/> of the <meta> tag

                  <title>My name is Prince | Always</title>
                  bla
                  bla
                  blah
                  <meta name="keywords" content="my, name, is, prince, always"/>
                  
                  <title>American singer-songwriter. musician ; record producer  dancer | actor filmmaker</title>
                  
                  Dummy
                  text
                  
                  <meta name="keywords" content="american, singer, songwriter, musician, record, producer, dancer, actor, filmmaker"/>
                  
                  <title>multi-instrumentalist & guitar virtuoso</title>
                  This is
                  the end
                  of the test
                  <meta name="keywords" content="multi, instrumentalist, guitar, virtuoso"/>
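For readers who want the same result outside Notepad++, here is a hedged Python sketch. It is not a translation of the `\K` / `\G` regex, which Python's `re` cannot express; it swaps in a simpler technique: locate the attribute value with capture groups, then rebuild it from its `\w+` words. The names `commafy_keywords` and `lower_first` are hypothetical, and the sketch only touches text between `content="` and `"/>`.

```python
import re

# Only keywords meta tags that follow a <title> tag are considered,
# mirroring the longer search expression in the post.
META_AFTER_TITLE = re.compile(
    r'(<title>.*?</title>.*?<meta\x20name="keywords"\x20content=")(.*?)(?="/>)',
    re.DOTALL,
)


def lower_first(word):
    """Stand-in for \\l: lower-case only the first character."""
    return word[:1].lower() + word[1:]


def commafy_keywords(html):
    """Rewrite each attribute value as its words joined by a comma and a space."""
    def rebuild(m):
        words = re.findall(r"\w+", m.group(2))
        return m.group(1) + ", ".join(lower_first(w) for w in words)
    return META_AFTER_TITLE.sub(rebuild, html)


sample = (
    '<title>multi-instrumentalist & guitar virtuoso</title>\n'
    'This is\n'
    '<meta name="keywords" content="multi-instrumentalist & guitar virtuoso"/>\n'
)
print(commafy_keywords(sample))
```

As in the regex version, any run of non-word characters (spaces, `|`, `-`, `.`, `;`, `&`) is treated as a separator, and a `<meta>` tag with no preceding `<title>` is left alone.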
                  

                  Best Regards,

                  guy038

                  • V
                    Vasile Caraus
                    last edited by Aug 7, 2020, 2:52 PM

                    wonderful solution, thanks a lot @guy038 !!

                    • G
                      guy038
                      last edited by guy038 Aug 7, 2020, 3:25 PM Aug 7, 2020, 3:24 PM

                      Hi, @vasile-caraus, @peterjones, @alan-kilborn, and All,

                      Sorry ! I forgot to mention two things :

                      • Vasile, in your example, the text My name is Prince | Always ends up as my, name, is, prince, always. So I assumed that you wanted all the words in lower case! That’s why I added the \l modifier, which lower-cases the first letter of groups 1 and 2, containing the words. If that is not the case, omit these modifiers!

                      • Now, in the second S/R, I could have shortened the search regex as below:

                      SEARCH (?s)<meta\x20name="keywords"\x20content="\K(\w+)|\G[^\w\r\n]+(\w+)

                      But I preferred to use the longer regex because you may have other <meta ..../> tags that are not located near a <title>....</title> tag and so should not be affected by the replacement!

                      Cheers,

                      guy038

                      • V
                        Vasile Caraus
                        last edited by Vasile Caraus Aug 7, 2020, 3:46 PM Aug 7, 2020, 3:45 PM

                        @guy038 said in Regex: Put a comma on REPLACE html tags:

                        (?s)<meta\x20name="keywords"\x20content="\K(\w+)|\G[^\w\r\n]+(\w+)

                        This one is not working. It just deletes my whole <title> line. Maybe you should write out the entire regex.

                        • P
                          PeterJones @Vasile Caraus
                          last edited by Aug 7, 2020, 3:50 PM

                          @Vasile-Caraus said in Regex: Put a comma on REPLACE html tags:

                          This one is not working. It just deletes my whole <title> line.

                          If the edited one is not working but the previous solution was a “wonderful solution” (and thus presumably working), I recommend you happily use the one that you already called “wonderful”.

                          ----

                          Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as plain text using the </> toolbar button or manual Markdown syntax. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure your examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get… Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.

                          • V
                            Vasile Caraus
                            last edited by Aug 7, 2020, 3:55 PM

                            @guy038 said in Regex: Put a comma on REPLACE html tags:

                            (?s)<title>.*?<\/title>.*?

                            This is Guy038's latest regex update. Now I see that he wrote only the second part of the regex.

                            It should also include (?s)<title>.*?<\/title>.*? at the beginning.

                            With that added it works, now that I have tested it.

                            SEARCH: (?s)<title>.*?<\/title>.*?(?s)<meta\x20name="keywords"\x20content="\K(\w+)|\G[^\w\r\n]+(\w+)

                            REPLACE BY: ?1\l\1:,\x20\l\2

                            • G
                              guy038
                              last edited by guy038 Aug 7, 2020, 4:26 PM Aug 7, 2020, 4:06 PM

                              Hi, @vasile-caraus and All,

                              No, my simplified search regex, for the second S/R works fine, too ! But I also forgot to add that, due to the \G feature, the caret, before searching ( and replacing ), must NOT be located at beginning of a <meta...../> tag . Otherwise, the regex engine wrongly selects the second alternative !

                              EDIT: I’m wrong again -(( Just forget everything in this post! I’ll try to post a correct answer later!

                              Cheers,

                              guy038

                              • G
                                guy038
                                last edited by guy038 Aug 7, 2020, 9:50 PM Aug 7, 2020, 9:27 PM

                                Hello, @vasile-caraus and All,

                                In fact, I’m presently on holidays ! Thus, my concentration is not at top level ;-))

                                So, here is the true story ! When using, for the second regex S/R, the simplified form :

                                SEARCH (?s)<meta\x20name="keywords"\x20content="\K(\w+)|\G[^\w\r\n]+(\w+)

                                The first alternative <meta\x20name="keywords"\x20content="\K(\w+) should be used first, in order to detect the first word after the string content="

                                However, if, before running the S/R, the caret is located right before some non-word characters that are themselves followed by word characters, then the regex engine wrongly selects the second alternative, due to the \G syntax.

                                So, this simplified syntax can be used if the caret initially sits on a truly empty line. Indeed, as the second alternative ( \G[^\w\r\n]+...... ) cannot match when the \G location is immediately followed by \r or \n, the next match necessarily comes from the first alternative of the regex, which is the correct behavior!

                                But, of course, this simplified regex is then no longer tied to a preceding <title>......</title> tag!
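The caret dependence described above can be illustrated with a small Python sketch. Python's `re` has no `\G`, so the sketch emulates it under one assumption: `\G` is only true at the position where the search starts, which `pattern.match(text, pos)` approximates. The function `first_match_kind` and the sample text are hypothetical, and the sketch models only which alternative fires first, not the replacement itself.

```python
import re

# First alternative, without \K (only the choice of alternative is modeled here).
FIRST_ALT = re.compile(r'<meta\x20name="keywords"\x20content="(\w+)')
# Second alternative; the leading \G is emulated by anchoring match() at the caret.
SECOND_ALT = re.compile(r"[^\w\r\n]+(\w+)")


def first_match_kind(text, caret):
    """Return which alternative would produce the first match for a search from caret."""
    if FIRST_ALT.match(text, caret):
        return "first"
    if SECOND_ALT.match(text, caret):
        return "second"  # \G is true at the caret, so this alternative can fire
    if FIRST_ALT.search(text, caret):
        return "first"   # past the caret, \G is false; only alternative 1 remains
    return None


text = 'This is\r\n\r\n<meta name="keywords" content="My name"/>\r\n'
caret_before_space = text.index(" is")         # caret sits before " is"
caret_on_empty_line = text.index("\r\n\r\n") + 2  # caret sits on the empty line

print(first_match_kind(text, caret_before_space))
print(first_match_kind(text, caret_on_empty_line))
```

With the caret before ` is`, the emulated `\G` alternative matches immediately, which is the unwanted case; with the caret on the empty line it cannot match, so the first hit comes from the `<meta>` alternative.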

                                BR

                                guy038
