    REGEX: Combine the two lines on a single line

    • Hellena Crainicu

      I am trying to combine the two lines into a single line, while keeping the tags.

      <meta property="og:description" content="Комедия по пьесе А.Н.Островского.

      «На бойком месте» не соскучишься. Тут тебе и кутёж, и грабёж, и коварство, и л"/>

      My regex works, but I would like a shorter version for the same tag:

      FIND: <meta property="og:description" content="(.*?)\s*(\n)(.*?)"/>
      REPLACE BY: <meta property="og:description" content="\1\3"/>

      • mkupper @Hellena Crainicu

        @Hellena-Crainicu said in REGEX: Combine the two lines on a single line:

        <meta property="og:description" content="Комедия по пьесе А.Н.Островского.

        «На бойком месте» не соскучишься. Тут тебе и кутёж, и грабёж, и коварство, и л"/>

        Why not do:
        Search: (\h+content=")([^\R"]*)\R*([^\R"]*)
        Replace: \1\2\3

        Try it and think.

        • Terry R @Hellena Crainicu

          @Hellena-Crainicu said in REGEX: Combine the two lines on a single line:

          I try to combine the two lines on a single line, but in such a way as to keep the tags.

          Another idea is:
          Find What: <((?!/>)[^\r\n])+\K\R
          Replace With: (leave this field empty)

          It looks for the start of a tag (<). If it reaches the end of the line without meeting a closing /> first, the line must be a continuation line, so the regex removes the CR and/or LF. If the next line needs a space between it and the preceding line, put that space character in the Replace With field.

          (?!/>)[^\r\n] means: the next two characters must not be /> (the closing tag); if they are not, consume one more character, as long as it is not an end-of-line character. The + that follows repeats this process. That way a / or a > occurring by itself in the text will not stop the regex.

          As it uses the \K you will need to click on “Replace All” for the regex to work.

          Terry
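
For readers working outside Notepad++, here is a minimal Python sketch of the same idea. Python's re module has no \K or \R, so the sketch assumes a capture group in place of \K and an explicit alternation in place of \R. The names LINE_BREAK, OPEN_TAG_TO_EOL and join_unclosed_tag_lines are made up for illustration and do not come from the thread.

```python
import re

# Stand-in for Notepad++'s \R (any line break); Python's re has no \R.
LINE_BREAK = r"(?:\r\n|\r|\n)"

# "<" followed by one or more characters that are neither a line break nor
# the start of "/>". The prefix is captured so it can be written back,
# because Python's re has no \K.
OPEN_TAG_TO_EOL = re.compile(r"(<(?:(?!/>)[^\r\n])+)" + LINE_BREAK)


def join_unclosed_tag_lines(text: str, separator: str = "") -> str:
    """Remove the line break after a line that opens a tag but lacks '/>'."""
    return OPEN_TAG_TO_EOL.sub(lambda m: m.group(1) + separator, text)


# Hypothetical two-line sample: the second tag continues onto the next line.
sample = (
    '<meta property="og:title" content="Title"/>\n'
    '<meta property="og:description" content="first half\n'
    'second half"/>\n'
)
joined = join_unclosed_tag_lines(sample, separator=" ")
print(joined)
```

The separator argument plays the role of the optional space in the Replace With field. Note that any line containing < with no /> after it on that line (for example <p>text</p>) is also joined to the following line, so this approach fits input in which every tag is self-closing.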

          • Hellena Crainicu @mkupper

            @mkupper said in REGEX: Combine the two lines on a single line:

            Why not do:
            Search: (\h+content=")([^\R"]*)\R*([^\R"]*)
            Replace: \1\2\3

            Try it and think.

            The replacement is not working; it doesn’t do anything…

            • Hellena Crainicu @Terry R

              @Terry-R said in REGEX: Combine the two lines on a single line:

              <((?!/>)[^\r\n])+\K\R

              Your regex is almost good, but the replacement only deletes the empty line between lines 1 and 3; it doesn’t combine the content of both lines.

              • Mark Olson @Hellena Crainicu

                @Hellena-Crainicu
                Assuming your goal is to delete all newlines inside of the content attribute of a meta tag, I believe I have a solution for you:
                Replace (?s-i)(?:<meta[^<>]*content\s*=\s*"|(?!\A)\G)[^"]*?\K\R with nothing.

                This is based on guy038’s now-classic “replace-all-instances-of-X-between-Y-and-Z” regex, and works as follows:

                1. start off by looking for the content attribute of a meta tag: <meta[^<>]*content\s*=\s*"
                2. Begin our match there OR wherever the last match ended UNLESS we wrapped around to the beginning of the file: (?!\A)\G
                3. Lazily consume characters that are not the closing quote of the content attribute (so the match never leaves the attribute), and then forget everything we matched: [^"]*?\K
                4. Find a newline character: \R
                5. and replace the newline character you found with nothing.

                It will convert

                <meta property="og:description" content="foo
                bar
                baz
                
                more foo"  />
                <notmeta content="ignore this
                tag"/>
                <meta property="og:description" content="also foo
                bar
                bar"/>    
                
                <meta content="keep fooing forever!
                foo
                foo" property="og:description" />
                

                into the following:

                <meta property="og:description" content="foobarbazmore foo"  />
                <notmeta content="ignore this
                tag"/>
                <meta property="og:description" content="also foobarbar"/>    
                
                <meta content="keep fooing forever!foofoo" property="og:description" />
                

                Note that all the newlines that aren’t in the content attribute of a meta tag have been preserved.
                Just for fun, I tried this on 8 thousand lines of that sample repeated, and it was reasonably performant.
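
Outside Notepad++, a similar result can be sketched in Python. The standard re module supports neither \G, \K nor \R, so this is not a port of the regex above. It swaps in a different technique: match each content attribute of a meta tag as a whole and strip the line breaks inside it with a replacement callback. The names META_CONTENT, LINE_BREAK and strip_breaks_in_meta_content are assumptions made for this sketch.

```python
import re

# Opening of a <meta ...> tag up to and including content=" (group 1),
# followed by the attribute value up to the closing quote (group 2).
META_CONTENT = re.compile(r'(<meta[^<>]*content\s*=\s*")([^"]*)')
LINE_BREAK = re.compile(r"\r\n|\r|\n")  # stand-in for \R


def strip_breaks_in_meta_content(text: str) -> str:
    """Delete line breaks inside the content attribute of <meta> tags only."""
    def _join(match: re.Match) -> str:
        # Keep the tag prefix untouched; remove breaks from the value only.
        return match.group(1) + LINE_BREAK.sub("", match.group(2))
    return META_CONTENT.sub(_join, text)


# Shortened version of the sample input from the post above.
sample = (
    '<meta property="og:description" content="foo\nbar\nbaz\n\nmore foo"  />\n'
    '<notmeta content="ignore this\ntag"/>\n'
    '<meta content="keep fooing forever!\nfoo\nfoo" property="og:description" />\n'
)
result = strip_breaks_in_meta_content(sample)
print(result)
```

As in the Notepad++ version, the notmeta tag keeps its line break, and so do the breaks between tags. The sketch assumes attribute values are double-quoted and contain no " or > characters.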

                • Terry R @Hellena Crainicu

                  @Hellena-Crainicu said in REGEX: Combine the two lines on a single line:

                  But doesn’t combine the content of both lines.

                  I see what the problem was. I was copying your example, and because your title was about combining the 2 lines, I was absentmindedly removing the (very important) empty line between. Then my testing was with just 2 lines.

                  So my regex just needs an additional \R at the end. It becomes:
                  <((?!/>)[^\r\n])+\K\R\R

                  Terry
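
The difference between one and two line breaks can be checked with a small Python sketch. Python's re module has no \K or \R, so the sketch assumes a capture group and an explicit alternation in their place. The sample string and the names LINE_BREAK, TAG_PREFIX, one_break and two_breaks are hypothetical.

```python
import re

LINE_BREAK = r"(?:\r\n|\r|\n)"          # stand-in for \R
TAG_PREFIX = r"(<(?:(?!/>)[^\r\n])+)"   # captured instead of using \K

one_break = re.compile(TAG_PREFIX + LINE_BREAK)            # original regex
two_breaks = re.compile(TAG_PREFIX + LINE_BREAK + r"{2}")  # revised regex

# Same shape as the question: text, an empty line, then the rest of the tag.
sample = '<meta property="og:description" content="Line one.\n\nLine two"/>\n'

after_one = one_break.sub(r"\1", sample)
after_two = two_breaks.sub(r"\1", sample)

print(repr(after_one))  # the empty line is gone, but two lines remain
print(repr(after_two))  # both halves now sit on a single line
```

With a single line break the empty line disappears but the content stays split, which matches the behaviour reported above. With two line breaks the halves are joined.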

                  • Hellena Crainicu @Terry R

                    @Terry-R said in REGEX: Combine the two lines on a single line:

                    FIND: <((?!/>)[^\r\n])+\K\R\R
                    REPLACE: (leave empty)

                    Thanks, it works beautifully.

                    • Hellena Crainicu @Mark Olson

                      @Mark-Olson said in REGEX: Combine the two lines on a single line:

                      FIND: (?s-i)(?:<meta[^<>]*content\s*=\s*"|(?!\A)\G)[^"]*?\K\R
                      REPLACE BY: (leave empty)

                      Super, thanks.

                      The Community of users of the Notepad++ text editor.