
    Regex: How to remove newline characters from a particular HTML tag?

    Help wanted
    • Robin Cruise

      I have this HTML tag, which is broken by a newline (\n) right after the word masuri:

      <p class="mb-40px">Aceasta este o melodie alcatuita din patru masuri:
      reluata apoi de catre instrumentul solist cu un cintec popular.</p>
      

      THE OUTPUT must be:

      <p class="mb-40px">Aceasta este o melodie alcatuita din patru masuri: reluata apoi de catre instrumentul solist cu un cintec popular.</p>
      

      I tried this regex, but it doesn't work well, because it also changes the entire HTML code, not just that particular tag.

      FIND: (?:<p class="mb-40px">|\G)(?:(?!</p>).)*?\K(\r\n|\r|\n)

      REPLACE BY: \x20

      Also, I found a solution by @neil-schipper in another topic on this forum, but I don't know how to integrate it with my HTML tag:

      FIND: (?<=[^\r\n])\R(?=[^\r\n])
      REPLACE BY: (LEAVE EMPTY)
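A quick Python check shows what this pattern does on its own. Python's `re` has no `\R`, so `(?:\r\n|\r|\n)` stands in for it (an approximation), and the surrounding `<div>` is a hypothetical addition for illustration. The pattern joins every break between two non-empty lines anywhere in the text, not only inside the `<p>` tag, and with an empty replacement the words end up glued together:

```python
import re

# Stand-in for Notepad++'s \R (Python's re has no \R): CRLF, lone CR or lone LF
LINE_END = r"(?:\r\n|\r|\n)"

# @neil-schipper's pattern: a line ending with a non-line-ending char on both
# sides, i.e. a break between two non-empty lines (blank lines are kept)
joiner = re.compile(r"(?<=[^\r\n])" + LINE_END + r"(?=[^\r\n])")

# The question's paragraph, wrapped in a hypothetical <div> for illustration
text = (
    "<div>\n"
    '<p class="mb-40px">Aceasta este o melodie alcatuita din patru masuri:\n'
    "reluata apoi de catre instrumentul solist cu un cintec popular.</p>\n"
    "\n"
    "</div>"
)

# Empty replacement, as in the post: the <div> line is joined too, and
# "masuri:" and "reluata" end up glued together with no space
result = joiner.sub("", text)
print(result)
```

This is why the pattern has to be confined to the `<p class="mb-40px">` region, and why `\x20` rather than an empty replacement is needed here.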

      • Alan Kilborn @Robin Cruise

        @robin-cruise

        This is just a (by now) simple replace-but-only-between-delimiters problem; see HERE for the templatized solution.

        • Robin Cruise @Alan Kilborn

          @alan-kilborn THANKS, it works !!

          Find: (?-i:<p class="mb-40px">|(?!\A)\G)(?s:(?!</p>).)*?\K(?-i:(?<=[^\r\n])\R(?=[^\r\n]))

          Replace by: \x20

          • Robin Cruise @Robin Cruise
            last edited by Robin Cruise

            Another solution, matching the line ending with (\r\n|\r|\n):

            FIND: (<p class="mb-40px">)+(.)+\K(\r\n|\r|\n)(?=.*<\/p>)

            REPLACE BY: \x20

            The GENERIC regex formula below is much simpler than the GENERIC regex formulas @guy038 made in many of his other posts:

            (REGION-START)+(.)+\K(FIND REGEX)(?=.*REGION-FINAL)
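A rough Python check of this variant (a sketch, not Notepad++ itself). Python has no `\K`, so the text before the line ending is captured and written back; the redundant `+` on the groups is dropped; and `[^\r\n]` replaces `.`, assuming ". matches newline" is unchecked (Python's own `.` would also match `\r`). Under those assumptions the pattern handles a two-line paragraph but replaces nothing in a three-line one, because the look-ahead `.*</p>` cannot cross a line ending:

```python
import re

# "[^\r\n]" mimics Notepad++'s "." with ". matches newline" unchecked
# (an assumption about the setting); Python's own "." would also match "\r"
NOT_EOL = r"[^\r\n]"

# Robin's variant, with the redundant "+" on the groups dropped. Python has no
# \K, so the text before the line ending is captured in group 1 and put back.
variant = re.compile(
    r'(<p class="mb-40px">' + NOT_EOL + r"+)(\r\n|\r|\n)(?=" + NOT_EOL + r"*</p>)"
)


def join_lines(text):
    # Replace the matched line ending with a space (\x20), keeping group 1
    return variant.sub(r"\g<1> ", text)


two_lines = '<p class="mb-40px">patru masuri:\nreluata apoi.</p>'
three_lines = '<p class="mb-40px">Aceasta\ndin patru masuri:\nreluata apoi.</p>'

print(join_lines(two_lines))    # the single line break becomes a space
print(join_lines(three_lines))  # nothing replaced: </p> is not on the next line
```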

            • Alan Kilborn @Robin Cruise

              @robin-cruise said in Regex: How to remove newline characters from a particular HTML tag?:

              The GENERIC regex formula below is much simpler than the GENERIC regex formulas @guy038 made

              Why should you be believed over @guy038 ?

              • Hellena Crainicu

                @alan-kilborn @guy038

                Another alternative to Robin's generic formula, a better version, could be:

                (REGION-START)+(.)+\K(FIND REGEX)(?s:(?=.*(REGION-FINAL)))

                • guy038
                  last edited by guy038

                  Hello, @robin-cruise, @alan-kilborn, @hellena-crainicu and All,

                  Referring to my first blog post about a generic regex, below:

                  https://community.notepad-plus-plus.org/post/75007

                  and as Robin wants to search for line-ending chars, we of course need to use the complete generic regex S/R:

                  SEARCH (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR)

                  REPLACE RR

                  and not the simplified single-line version


                  So :

                  • The FR regex is just \R, as the non-capturing associated group, beginning with (?_si:..., is useless in this case

                  • The RR regex is \x20

                  • The BSR regex may be strictly the string <p class="mb-40px"> but may also be expressed as <p class=".+?">

                  • The ESR regex is, of course, the ending tag </p>, which must never occur before the next line-ending to replace

                  giving the functional regex S/R :

                  SEARCH (?-si:<p class=".+?">|(?!\A)\G)(?s-i:(?!</p>).)*?\K\R

                  REPLACE \x20

                  Test it against this text:

                  <a href="https://www.w3schools.com/">We strongly suggest
                  to visit the
                  w3schools.com
                  site</a>
                  
                  <p class="mb-40px">Aceasta
                  este o melodie alcatuita
                  din patru masuri:
                  reluata apoi de catre instrumentul solist
                  cu un cintec popular.</p>
                  
                  <p class="Test">A SINGLE line</p>
                  
                  <h1>this is
                  my very
                  first heading
                  </h1>
                  
                  <p class="123-456 789">This is	
                  a quick
                  text to
                  verify if it
                  replaces line-endings
                  by a space char in <p>
                  tags ONLY</p>
                  

                  ONLY the <p class.............</p> blocks, multi-line or not, should be affected by the replacement!

                  Of course, these HTML tags do not form a valid HTML file; they are only used to test the regex S/R!
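For experimenting outside Notepad++, here is a rough Python model of the complete generic S/R. Python's `re` has no `\G`, `\K` or `\R`, so this sketch emulates them: `\G` as an anchored match at the end of the previous match, `\K` by replacing only a named group, and `\R` as `\r\n|\r|\n`. The function name `generic_replace` and its `g_at_start` switch are hypothetical, the test text is a shortened copy of the one above with LF endings, and this models the idea rather than Boost's exact engine:

```python
import re

# Stand-in for Notepad++'s \R (CRLF, lone CR or lone LF); Boost's \R also
# accepts a few rarer separators, which this sketch ignores
R = r"(?:\r\n|\r|\n)"


def generic_replace(text, bsr, esr, fr, rr, g_at_start=False):
    r"""Emulate Replace All with (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR) -> RR.

    Assumes FR never matches an empty string (otherwise the loop would stall).
    g_at_start=True drops the (?!\A) guard (hypothetical switch for comparison).
    """
    # Tempered body: lazily take any char (DOTALL) that does not start ESR, then FR
    body = r"(?s:(?!" + esr + r").)*?(?P<fr>" + fr + r")"
    with_bsr = re.compile("(?:" + bsr + ")" + body)  # the BSR alternative
    chained = re.compile(body)                        # the \G alternative

    out, pos, last = [], 0, 0
    g_valid = g_at_start  # False mirrors (?!\A): no \G at offset 0
    while True:
        m = None
        if g_valid:
            # At the end of the previous match: BSR is tried first, then \G
            m = with_bsr.match(text, pos) or chained.match(text, pos)
        if m is None:
            # Elsewhere, only a fresh BSR can start a match
            m = with_bsr.search(text, pos)
        if m is None:
            break
        out.append(text[last:m.start("fr")])  # text before FR is kept (\K)
        out.append(rr)                        # FR itself is replaced by RR
        last = pos = m.end()
        g_valid = True
    out.append(text[last:])
    return "".join(out)


sample = (
    '<a href="https://www.w3schools.com/">We strongly suggest\n'
    "to visit the\n"
    "site</a>\n"
    "\n"
    '<p class="mb-40px">Aceasta\n'
    "este o melodie\n"
    "cu un cintec popular.</p>\n"
    "\n"
    '<p class="Test">A SINGLE line</p>\n'
    "\n"
    "<h1>this is\n"
    "my very\n"
    "</h1>\n"
    "\n"
    '<p class="123-456 789">This is\n'
    "text in <p>\n"
    "tags ONLY</p>\n"
)
result = generic_replace(sample, r'<p class=".+?">', "</p>", R, " ")
print(result)
```

Only the two multi-line `<p class=...>` blocks are joined. With `g_at_start=True` the `(?!\A)` guard is gone, the chain starts at the top of the text, and the `<a href>` lines are joined as well. Robin's FR, `(?<=[^\r\n])\R(?=[^\r\n])`, can be passed as `fr` in the same way.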


                  Now, the generic variants proposed by @robin-cruise and @hellena-crainicu, with only a final look-ahead containing the ESR region, will not work most of the time :-(

                  SEARCH (?-si:BSR|(?!\A)\G).*?\K(?-si:FR)(?=(?s-i:.*?ESR))

                  In our case, the functional regex S/R becomes :

                  SEARCH (?-si:<p class=".+?">|(?!\A)\G).*?\K\R(?=(?s-i:.*?</p>))

                  REPLACE \x20

                  But if you test it against, for instance :

                  
                  <p class="Test">Several
                  consecutive
                  lines</p>
                  
                  <h1>this is
                  my very
                  first heading
                  </h1>
                  
                  <p class="Test">A SINGLE line</p>
                  
                  <h2>this is
                  my second
                  heading
                  </h2>
                  

                  It would concatenate all the text up to the last </p> of the file, leaving only the final <h2> tag untouched. You could say: "But I did add a final question mark in order to get a lazy range of chars before </p>!"

                  You're right! But remember that the regex engine tries, by all means, to find a match. So it matches the CRLF chars which follow lines</p>, because the engine considers that the .*? lazy range of chars begins immediately after that line-ending and continues till right before the second and final </p>, which makes the look-ahead assertion succeed!

                  Thus, checking at every position, up to the NEXT FR match, that the ESR region has not been reached seems to be the only method that works properly!
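This explanation can be checked with a small Python sketch (assuming LF line endings, with Python's `re` standing in for Notepad++'s Boost engine). At the line ending right after `lines</p>`, which already lies outside the `<p>` region, the final look-ahead still succeeds because the lazy `.*?` simply runs on to the next `</p>`. The tempered range `(?s:(?!</p>).)*?` of the generic S/R cannot step over `</p>` at all:

```python
import re

# Second test text from the post, assuming LF line endings
sample = (
    '<p class="Test">Several\n'
    "consecutive\n"
    "lines</p>\n"
    "\n"
    "<h1>this is\n"
    "my very\n"
    "first heading\n"
    "</h1>\n"
    "\n"
    '<p class="Test">A SINGLE line</p>\n'
)

# Position right after the line ending that follows "lines</p>",
# i.e. a line ending that already lies OUTSIDE the <p> region
after_eol = sample.index("lines</p>\n") + len("lines</p>\n")

# The variant's final look-ahead, (?=(?s-i:.*?</p>)): lazy, yet it still
# succeeds, because it only asks whether some </p> exists further on
lookahead = re.compile(r"(?s:.*?</p>)")
m = lookahead.match(sample, after_eol)
print(m is not None, repr(sample[after_eol:m.end()]))

# The tempered range of the generic S/R, (?s:(?!</p>).)*?, followed by the
# line ending, cannot step over </p>, so no match starts on "lines</p>"
tempered = re.compile(r"(?s:(?!</p>).)*?\n")
print(tempered.match(sample, sample.index("lines</p>")))  # None
```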

                  Best Regards

                  guy038

                  Reminder: move to the very beginning of the text before clicking the Find Next or Replace All button!

                  • Alan Kilborn @Alan Kilborn
                    @alan-kilborn said in Regex: How to remove newline characters from a particular HTML tag?:

                    Why should you be believed over @guy038 ?

                    @guy038 said in Regex: How to remove newline characters from a particular HTML tag?:

                    Now, the generic variants proposed by @robin-cruise and @hellena-crainicu, with only a final look-ahead containing the ESR region, will not work most of the time :-(


                    @Robin-cruise and @hellena-crainicu :

                    Be careful about posting simplifications.

                    Probably best to leave these things to the “Master”. :-)

                    • Hellena Crainicu
                      last edited by Hellena Crainicu

                      The best solution is this:

                      (?-si:<p class=".+?">|(?!\A)\G)(?s-i:(?!</p>).)*?\K\s+

                      General regex: (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\KFR
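One difference to weigh before choosing `\s+` over `\R` as FR: `\s` also matches spaces and tabs, so runs of horizontal whitespace inside the region get collapsed too, not only line endings. A minimal Python sketch on a chunk shaped like the `This is` + TAB line of the earlier test text (the double space in the second line is a hypothetical addition, and the region restriction is left out for brevity):

```python
import re

R = r"(?:\r\n|\r|\n)"  # stand-in for Notepad++'s \R

# "This is" followed by a TAB and a line ending, plus a double space
# inside the next line (hypothetical input for illustration)
chunk = "This is\t\na quick  text"

with_r = re.sub(R, " ", chunk)       # only the line ending is replaced
with_s = re.sub(r"\s+", " ", chunk)  # every whitespace run is collapsed

print(repr(with_r))
print(repr(with_s))
```

With `\R` the trailing TAB and the double space survive; with `\s+` both are normalized to single spaces, which may or may not be wanted.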

                      The Community of users of the Notepad++ text editor.