    Regex: Delete empty lines inside an html tag, after .dot

    • Neculai I. Fantanaru

      In the example below, there are many empty lines between the first sentence and the second. I want to use a regex to join the two sentences after the dot (the period that ends the first one).

      <meta name="description" content=" I go home.
      
      
      
      But I cannot go to work "/>
      

      My regex is almost right: it deletes the empty lines, but it does not join the sentences.

      FIND: (?-si:<meta name="description" content="|(?!\A)\G)(?s-i:(?!"/>).)*?\K^\s+

      REPLACE BY: (leave empty)

      Can anyone help me with a better solution?

      • Mark Olson @Neculai I. Fantanaru

        @Neculai-I-Fantanaru
        If you want to remove all newlines (not just empty lines) within the content attributes, this should work:
        (?-i)(?:meta name="description" content="|(?!\A)\G)(?:(?!"/>).)*?\K\R+.

        The \R metacharacter matches all newlines, including \r, \n, and \r\n.

        My understanding (I could be wrong) is that newlines in general are disallowed inside XML (and by extension, HTML) attribute names.
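For readers who want to try the same idea outside Notepad++, here is a minimal Python sketch. It is not the regex above: Python's `re` module has no `\G`, `\K`, or `\R`, so the sketch first matches the whole attribute and then deletes the line breaks inside it. `LINE_BREAK` is only a rough stand-in for `\R`, and the names `META`, `LINE_BREAK`, and `strip_line_breaks` are made up for this illustration.

```python
import re

# Rough stand-in for \R: a CRLF pair or any single line-break character.
LINE_BREAK = re.compile(r"\r\n|[\n\v\f\r\x85\u2028\u2029]")

# The description attribute, from its opening quote to the closing "/>.
META = re.compile(r'(<meta name="description" content=")(.*?)("/>)', re.DOTALL)


def strip_line_breaks(html: str) -> str:
    """Delete every line break inside each description attribute."""
    return META.sub(
        lambda m: m.group(1) + LINE_BREAK.sub("", m.group(2)) + m.group(3),
        html,
    )


sample = '<meta name="description" content=" I go home.\r\n\r\n\nBut I cannot go to work "/>\n'
print(strip_line_breaks(sample))
```

Line breaks outside the attribute, such as the one after `"/>`, are left alone because the inner substitution only sees the text captured between the quotes.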

        • Mark Olson

          After some more thought, I came up with one that eliminates only empty lines and the last newline before the close quote if that’s really what you want:
          replace (?s-i)(?:meta name="description" content="|(?!\A)\G)(?:(?!"/>).)*?\K\R(?=$|[^"\r\n]*?") with nothing.

          This will convert

          <meta name="description" content=" I go home.
          
          should stay on own line.
          
          but this will collapse.
          
          But I cannot go to work "/>
          <meta name="description" content=" foo
          "/>
          <meta name="description" content=" I go home.
          
          
          
          But I cannot go to work "/>
          

          into

          <meta name="description" content=" I go home.
          should stay on own line.
          but this will collapse.But I cannot go to work "/>
          <meta name="description" content=" foo"/>
          <meta name="description" content=" I go home.But I cannot go to work "/>
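The same conversion can be approximated in Python, as a sketch under a few assumptions: line endings are LF or CRLF, the attribute always ends with `"/>`, and the helper names (`META`, `BREAK`, `drop_empty_lines`) are invented for this example. Since Python's `re` lacks `\G` and `\K`, the sketch matches the whole tag first and then applies the lookahead part of the regex inside it.

```python
import re

# One whole description tag, from <meta ... content=" to the closing "/>.
META = re.compile(r'<meta name="description" content=".*?"/>', re.DOTALL)

# A line break is dropped when the next line is empty, or when the rest of
# the next line reaches a double quote (that is, the closing quote).
BREAK = re.compile(r'\r?\n(?=\r?$|[^"\r\n]*")', re.MULTILINE)


def drop_empty_lines(html: str) -> str:
    """Remove empty lines and the last line break before the closing quote."""
    return META.sub(lambda m: BREAK.sub("", m.group(0)), html)


sample = (
    '<meta name="description" content=" I go home.\n'
    '\n'
    'should stay on own line.\n'
    '\n'
    'but this will collapse.\n'
    '\n'
    'But I cannot go to work "/>\n'
    '<meta name="description" content=" foo\n'
    '"/>\n'
    '<meta name="description" content=" I go home.\n'
    '\n\n\n'
    'But I cannot go to work "/>\n'
)
print(drop_empty_lines(sample))
```

A line break followed by ordinary text with no quote on that line is kept, which is why "should stay on own line." remains on its own line.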
          
          • Neculai I. Fantanaru @Mark Olson

            @Mark-Olson said in Regex: Delete empty lines inside an html tag, after .dot:

            (?s-i)(?:meta name="description" content="|(?!\A)\G)(?:(?!"/>).)*?\K\R(?=$|[^"\r\n]*?")

            Not quite, because your regex doesn’t put all the lines on a single line. After replacement, it must look like this:

            <meta name="description" content=" I go home. should stay on own line. but this will collapse.But I cannot go to work "/>

            • namx3249

              Read my thread here: https://community.notepad-plus-plus.org/topic/24369/regex-help-with-reverse-line/7

              Thanks to PeterJones, I think the second part of that topic can help you.

              You can also delete all empty lines with Edit - Line operations - Remove Empty Lines,
              then apply PeterJones' regex to put all the text on a single line.

              • Neculai I. Fantanaru

                I found a better solution and updated my regex:

                FIND: (?-si:<meta name="description" content="|(?!\A)\G)(?s-i:(?!"/>).)*?\K\s+\s+

                REPLACE BY: \x20

                So, the generic form will be:

                (?-si:FIRST-PART|(?!\A)\G)(?s-i:(?!SECOND-PART).)*?\KREGEX-REPLACE
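The generic form can also be sketched in Python. This is an illustration of the same idea, not a translation: Python's `re` has no `\G` or `\K`, so the hypothetical helper `replace_between` finds each region that starts at FIRST-PART and stops before the next SECOND-PART, then runs the inner replacement on that region only. All three parts are treated as regexes, and the parameter names are assumptions made for this example.

```python
import re


def replace_between(text: str, first: str, second: str, pattern: str, repl: str) -> str:
    """Apply pattern -> repl only between `first` and the next `second`."""
    region = re.compile(
        f"(?P<head>{first})(?P<body>.*?)(?={second})",
        re.DOTALL,
    )
    inner = re.compile(pattern)
    return region.sub(
        lambda m: m.group("head") + inner.sub(repl, m.group("body")),
        text,
    )


sample = (
    '<meta name="description" content=" I go home.\n\n\n\n'
    'But I cannot go to work "/>\n'
    'Text   outside\n'
)
result = replace_between(
    sample,
    first=re.escape('<meta name="description" content="'),
    second=re.escape('"/>'),
    pattern=r"\s+",
    repl=" ",
)
print(result)
```

Whitespace outside the tag, such as the three spaces in "Text   outside", is untouched because only the captured body is passed to the inner substitution. If SECOND-PART never appears after FIRST-PART, the region does not match and the text is left as it was.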

                • Mark Olson @Neculai I. Fantanaru

                  @Neculai-I-Fantanaru
                  Yes, my regex did that. If you looked at my data, you would see that your initial example was part of it and my regex did that.

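                  I’m glad you found a solution that works. However, I would note that the \s+\s+ in your regex can be simplified to \s+. The second \s+ only raises the minimum match from one whitespace character to two, so with \s+\s+ two lines separated by a single newline would not be joined.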
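A quick way to compare the two patterns is to run them side by side. The sketch below uses Python's `re` as a stand-in for the Notepad++ engine, on the assumption that `\s` and `+` behave the same way for this purpose. Both patterns collapse a run of two or more whitespace characters identically; they differ only when a single whitespace character separates the lines.

```python
import re

two_plus = re.compile(r"\s+\s+")  # needs at least two whitespace characters
one_plus = re.compile(r"\s+")     # needs at least one

single_break = "I go home.\nBut I cannot go to work"
multi_break = "I go home.\n\n\nBut I cannot go to work"

# On a run of several newlines the two patterns give the same result.
print(two_plus.sub(" ", multi_break) == one_plus.sub(" ", multi_break))

# On a lone newline only \s+ joins the lines.
print(repr(two_plus.sub(" ", single_break)))
print(repr(one_plus.sub(" ", single_break)))
```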

                  • Rufi Ma @Mark Olson

                    Thanks, it helps a lot.

                    The Community of users of the Notepad++ text editor.