    Regex: How to remove a newline character from a particular HTML tag?

    9 Posts 4 Posters 488 Views
    • Robin Cruise

      I have this HTML tag, which is broken by a newline (\n) after the word masuri:

      <p class="mb-40px">Aceasta este o melodie alcatuita din patru masuri:
      reluata apoi de catre instrumentul solist cu un cintec popular.</p>
      

      THE OUTPUT must be:

      <p class="mb-40px">Aceasta este o melodie alcatuita din patru masuri: reluata apoi de catre instrumentul solist cu un cintec popular.</p>
      

      I tried this regex, but it doesn’t work well, because it also changes the rest of the HTML code, not just that particular tag.

      FIND: (?:<p class="mb-40px">|\G)(?:(?!</p>).)*?\K(\r\n|\r|\n)

      REPLACE BY: \x20

      I also found a solution by @neil-schipper elsewhere on this forum, but I don’t know how to adapt it to my HTML tag:

      FIND: (?<=[^\r\n])\R(?=[^\r\n])
      REPLACE BY: (LEAVE EMPTY)

      • Alan Kilborn @Robin Cruise

        @robin-cruise

        This is just a (by now) simple replace-but-only-between-delimiters problem; see HERE for the templatized solution.

        • Robin Cruise @Alan Kilborn

          @alan-kilborn THANKS, it works !!

          Find: (?-i:<p class="mb-40px">|(?!\A)\G)(?s:(?!</p>).)*?\K(?-i:(?<=[^\r\n])\R(?=[^\r\n]))

          Replace by: \x20

          • Robin Cruise @Robin Cruise

            Another solution, matching the line ending with (\r\n|\r|\n):

            FIND: (<p class="mb-40px">)+(.)+\K(\r\n|\r|\n)(?=.*<\/p>)

            REPLACE BY: \x20

            The GENERIC regex formula below can be made much simpler than the GENERIC regex formulas @guy038 has posted:

            (REGION-START)+(.)+\K(FIND REGEX)(?=.*REGION-FINAL)

            • Alan Kilborn @Robin Cruise

              @robin-cruise said:

              The GENERIC regex formula below can be made much simpler than the ones @guy038 has posted

              Why should you be believed over @guy038?

              • Hellena Crainicu

                @alan-kilborn @guy038

                Another alternative, a better version of Robin’s generic regex, could be:

                (REGION-START)+(.)+\K(FIND REGEX)(?s:(?=.*(REGION-FINAL)))

                • guy038

                  Hello, @robin-cruise, @alan-kilborn, @hellena-crainicu and All,

                  Referring to my first blog post about a generic regex:

                  https://community.notepad-plus-plus.org/post/75007

                  and since Robin wants to search for line-ending chars, we need, of course, to use the complete generic regex S/R:

                  SEARCH (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR)

                  REPLACE RR

                  and not the simplified single-line version.


                  So:

                  • The FR regex is just \R; the enclosing non-capturing group, beginning with (?-si:..., is unnecessary in this case

                  • The RR regex is \x20

                  • The BSR regex may be strictly the string <p class="mb-40px"> but may also be expressed as <p class=".+?">

                  • The ESR regex is, of course, the ending tag </p>, which must never occur before the next line ending to replace

                  giving the functional regex S/R :

                  SEARCH (?-si:<p class=".+?">|(?!\A)\G)(?s-i:(?!</p>).)*?\K\R

                  REPLACE \x20

                  Test it against this text:

                  <a href="https://www.w3schools.com/">We strongly suggest
                  to visit the
                  w3schools.com
                  site</a>
                  
                  <p class="mb-40px">Aceasta
                  este o melodie alcatuita
                  din patru masuri:
                  reluata apoi de catre instrumentul solist
                  cu un cintec popular.</p>
                  
                  <p class="Test">A SINGLE line</p>
                  
                  <h1>this is
                  my very
                  first heading
                  </h1>
                  
                  <p class="123-456 789">This is	
                  a quick
                  text to
                  verify if it
                  replaces line-endings
                  by a space char in <p>
                  tags ONLY</p>
                  

                  ONLY the <p class="...">...</p> blocks, multi-line or not, should be affected by the replacement!

                  Of course, these HTML fragments do not form a valid HTML file; they are only used to verify the regex S/R!
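For readers who want to check the expected result outside Notepad++, here is a minimal Python sketch. It is not the regex S/R above: Python's standard `re` module has no `\G`, `\K` or `\R`, so the sketch swaps in a different technique (match each whole `<p class="...">...</p>` block, then replace the line endings inside it in a callback). The names `P_BLOCK`, `NEWLINE` and `join_lines_in_p` are hypothetical, and the class pattern `[^"\n]+` is an assumption that is slightly stricter than `.+?`.

```python
import re

# Assumption: a "region" is a <p class="..."> ... </p> block; DOTALL lets .*? span lines.
P_BLOCK = re.compile(r'<p class="[^"\n]+">.*?</p>', re.DOTALL)
# Stand-in for \R: Windows, old Mac and Unix line endings.
NEWLINE = re.compile(r'\r\n|\r|\n')

def join_lines_in_p(html: str) -> str:
    """Replace every line ending inside a <p class="..."> block by one space."""
    return P_BLOCK.sub(lambda m: NEWLINE.sub(' ', m.group(0)), html)

sample = (
    '<a href="https://www.w3schools.com/">We strongly suggest\n'
    'to visit the\nw3schools.com\nsite</a>\n\n'
    '<p class="mb-40px">Aceasta\neste o melodie alcatuita\ndin patru masuri:\n'
    'reluata apoi de catre instrumentul solist\ncu un cintec popular.</p>\n\n'
    '<p class="Test">A SINGLE line</p>\n'
)

result = join_lines_in_p(sample)
print(result)
```

Only the multi-line `<p class="mb-40px">` block is joined into one line; the `<a>` block keeps its line endings, and so do the blank lines between blocks.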


                  Now, the generic variants proposed by @Robin-cruise and @hellena-crainicu, which contain the ESR only in a final look-ahead, will not work most of the time :-(

                  SEARCH (?-si:BSR|(?!\A)\G).*?\K(?-si:FR)(?=(?s-i:.*?ESR))

                  In our case, the functional regex S/R becomes :

                  SEARCH (?-si:<p class=".+?">|(?!\A)\G).*?\K\R(?=(?s-i:.*?</p>))

                  REPLACE \x20

                  But if you test it against, for instance :

                  
                  <p class="Test">Several
                  consecutive
                  lines</p>
                  
                  <h1>this is
                  my very
                  first heading
                  </h1>
                  
                  <p class="Test">A SINGLE line</p>
                  
                  <h2>this is
                  my second
                  heading
                  </h2>
                  

                  It would concatenate all the text up to the last </p> of the file, leaving only the final <h2> tag untouched. You could say: But I did add a final question mark in order to get a lazy range of chars before </p>!

                  You’re right! But remember that the regex engine tries, by all means, to find a match. So it matches the CRLF chars that follow lines</p>, because the look-ahead's lazy .*? range begins immediately after that line ending and can extend until right before the next </p> further down (here, the final one), which makes the look-ahead assertion succeed!

                  Thus, checking at every position that the ESR has not been reached, up to the NEXT FR match, seems to be the only method that works properly!
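The look-ahead pitfall can be reproduced in isolation with a small Python sketch. This is only an illustration under assumptions: it uses Python's `re` module rather than the Notepad++ engine, it drops the BSR and `\G` parts to isolate the look-ahead, and it uses `\n` in place of `\R`. The variable names are hypothetical.

```python
import re

text = (
    '<p class="Test">Several\nconsecutive\nlines</p>\n\n'
    '<h1>this is\nmy very\nfirst heading\n</h1>\n\n'
    '<p class="Test">A SINGLE line</p>\n\n'
    '<h2>this is\nmy second\nheading\n</h2>\n'
)

# Look-ahead only: "a </p> exists somewhere later in the text".
# It says nothing about whether we are still inside a <p> block.
lookahead_only = re.compile(r'\n(?=.*?</p>)', re.DOTALL)

matched = [m.start() for m in lookahead_only.finditer(text)]
last_p_end = text.rfind('</p>')

# Every line ending located before the final </p> is accepted,
# including those of the <h1> block and the blank lines.
print(len(matched), text.count('\n', 0, last_p_end))  # 9 9
```

Only two of those nine line endings sit inside a `<p>` block, which is why the `(?!</p>).` check at every position is needed.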

                  Best Regards

                  guy038

                  Reminder : Move to the very beginning of text before clicking on the Find Next or Replace All button !

                  • Alan Kilborn @Alan Kilborn

                    @alan-kilborn said:

                    Why should you be believed over @guy038?

                    @guy038 said:

                    Now, the generic variants proposed by @Robin-cruise and @hellena-crainicu, which contain the ESR only in a final look-ahead, will not work most of the time :-(


                    @Robin-cruise and @hellena-crainicu :

                    Be careful of posting simplifications.

                    Probably best to leave these things to the “Master”. :-)

                    • Hellena Crainicu

                      The best solution is this:

                      (?-si:<p class=".+?">|(?!\A)\G)(?s-i:(?!</p>).)*?\K\s+

                      General regex: (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\KFR
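One practical difference between using `\R` and `\s+` as the FR shows up when the continuation lines are indented. The Python sketch below illustrates that difference, assuming the same `\x20` replacement. It does not use the Notepad++ regexes themselves (Python's `re` lacks `\G`, `\K` and `\R`), so the region is matched as a whole block and the FR is applied in a callback. `P_BLOCK` and `replace_in_p` are hypothetical names.

```python
import re

# Assumption: a "region" is a <p class="..."> ... </p> block.
P_BLOCK = re.compile(r'<p class="[^"\n]+">.*?</p>', re.DOTALL)

def replace_in_p(html: str, find: str, repl: str = ' ') -> str:
    """Apply find -> repl only inside <p class="..."> ... </p> blocks."""
    return P_BLOCK.sub(lambda m: re.sub(find, repl, m.group(0)), html)

indented = '<p class="x">first line\n    second line</p>'

# Line endings only: the four indentation spaces survive next to the new space.
newline_only = replace_in_p(indented, r'\r\n|\r|\n')
# Whitespace runs: the line ending and the indentation collapse into one space.
whitespace_runs = replace_in_p(indented, r'\s+')

print(repr(newline_only))
print(repr(whitespace_runs))
```

With `\s+`, every run of whitespace in the block becomes a single space, so indentation disappears along with the line ending; with a line-ending-only FR, the indentation is kept.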
