    Remove unwanted Carriage Return

    • Neil SchipperN
      Neil Schipper @Neil Schipper
      last edited by

      A perhaps more elegant find expression is (?<=[^\r\n])\R(?=[^\r\n]) in which case replacement text is nothing or a single space.

      To the regular crowd of regex experts: I was thrown by the fact that these are invalid expressions:

      (?<!\R)\R(?!\R)
      (?<[^\R])\R(?=[^\R])

      My best explanation is that \R can be either 1 or 2 bytes, but a look-behind must be fixed width. Any comment?
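For anyone who wants to try this outside Notepad++, here is a minimal sketch in Python. Python's re module has no \R, so the NEWLINE fragment below is a hand-written stand-in (an assumption, not Boost's definition), and the helper names are made up for illustration. It applies the find expression above and also checks that Python's engine rejects a look-behind whose alternatives differ in width.

```python
import re

# Hypothetical stand-in for \R: CRLF is listed first so it is tried before a lone CR.
NEWLINE = r"(?:\r\n|\r|\n)"

# A line break with a non-line-break character on both sides.
SINGLE_BREAK = re.compile(r"(?<=[^\r\n])" + NEWLINE + r"(?=[^\r\n])")


def join_wrapped_lines(text, replacement=" "):
    """Replace isolated line breaks; blank-line separators are left alone."""
    return SINGLE_BREAK.sub(replacement, text)


def compiles(pattern):
    """Return True if Python's re accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


sample = "first\r\nsecond\r\n\r\nthird\r\n"
joined = join_wrapped_lines(sample)

# Alternatives of width 2 and 1 inside a look-behind are refused by Python's re.
variable_width_ok = compiles(r"(?<!\r\n|\n)x")
# A one-character class is a fixed width, so this compiles.
fixed_width_ok = compiles(r"(?<![\r\n])x")
```

Only the break between "first" and "second" is replaced here; the paragraph gap and the final line break are not matched.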

      • Alexander HelA
        Alexander Hel @Neil Schipper
        last edited by

        @neil-schipper said in Remove unwanted Carriage Return:

        A perhaps more elegant find expression is (?<=[^\r\n])\R(?=[^\r\n]) in which case replacement text is nothing or a single space.

        To the regular crowd of regex experts: I was thrown by the fact that these are invalid expressions:

        (?<!\R)\R(?!\R)
        (?<[^\R])\R(?=[^\R])

        My best explanation is that \R can be either 1 or 2 bytes, but a look-behind must be fixed width. Any comment?

        Hi, thank you for your reply. I tried the expression (?<=[^\r\n])\R(?=[^\r\n]) and it worked flawlessly.

        Thank you so much

        • guy038G
          guy038
          last edited by guy038

          Hello, @alexander-hel, @neil-schipper, and All,

          Yes, Neil, you’re right : because look-behinds need a fixed length, the form (?<!\R)\R(?!\R) is invalid, as \R may match one or two consecutive chars

          As for your second attempt, each [^\R] part simply matches any character except :

          • The uppercase letter R, if the Match case option is set

          • The letters R and r, if the Match case option is not set


          Now, a solution is to use a fixed-length pattern in the look-behind part :

          SEARCH (?<!\n|\r)\R(?!\R)

          Note that, if the last line ends with a line-break, that line-break is also matched. Indeed, at the very end of the file, the part (?!\R) is followed by nothing, which is obviously different from \R

          Best Regards,

          guy038
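The end-of-file behaviour noted above can be checked with a small Python sketch. As an assumption, NEWLINE is a hand-written substitute for \R (which Python's re lacks), and the sample text is invented. The sketch compares the negative-lookaround form from this post with the positive-lookaround form quoted earlier in the thread.

```python
import re

# Hypothetical stand-in for \R; not Boost's exact definition.
NEWLINE = r"(?:\r\n|\r|\n)"

# Positive lookarounds: a real character is required on both sides.
positive = re.compile(r"(?<=[^\r\n])" + NEWLINE + r"(?=[^\r\n])")

# Negative lookarounds: only "no line break" is required on both sides,
# which is also true at the very start and very end of the text.
# \n|\r has two alternatives of width 1, so the look-behind is fixed width.
negative = re.compile(r"(?<!\n|\r)" + NEWLINE + r"(?!" + NEWLINE + r")")

text = "one\ntwo\n\nthree\n"
positive_hits = [m.start() for m in positive.finditer(text)]
negative_hits = [m.start() for m in negative.finditer(text)]
```

With this sample, both forms match the break after "one", and only the negative form also matches the final line break at offset 14.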

          • guy038G
            guy038
            last edited by

            Hi, @neil-schipper and All,

            Oh…, I didn’t see that you already gave the right solution to the OP :

            (?<=[^\r\n])\R(?=[^\r\n])

            Which is equivalent to my solution, except at the very start or end of the file, where only mine can match :

            (?<!\n|\r)\R(?!\R)

            BR

            guy038

            • Neil SchipperN
              Neil Schipper @guy038
              last edited by

              @guy038 Thanks, Guy.

              I’m noticing now why the 2nd regex I presented as invalid, (?<[^\R])\R(?=[^\R]), is invalid: bad syntax due to missing = after <.

              After that fix, it’s merely wrong (since the fancy \R construct is not decoded when appearing inside [] as you pointed out).

              Best,
              Neil

              • astrosofistaA
                astrosofista @Neil Schipper
                last edited by

                @neil-schipper said in Remove unwanted Carriage Return:

                (?<!\R)\R(?!\R)
                (?<[^\R])\R(?=[^\R])
                My best explanation is that \R can be either 1 or 2 bytes, but a look-behind must be fixed width. Any comment?

                Since a negative look-behind must be fixed width, the way I get a valid expression is to replace \R by a \v (any single vertical-whitespace character, not only the vertical tab), as follows:

                (?<!\v)\R(?!\R)
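A rough Python sketch of why a single-character class is acceptable in the look-behind while \R is not. Two assumptions are made here: VSPACE approximates what \v matches in Notepad++ (in Python, \v is only the vertical tab), and NEWLINE approximates \R. Both names are hypothetical.

```python
import re

# Assumed approximation of "any one vertical-whitespace character".
VSPACE = r"[\n\x0b\x0c\r\x85\u2028\u2029]"
# Assumed approximation of \R (one or two characters).
NEWLINE = r"(?:\r\n|\r|\n)"

# The class always consumes exactly one character, so it cannot cover CRLF alone.
vspace_covers_crlf = re.fullmatch(VSPACE, "\r\n") is not None
newline_covers_crlf = re.fullmatch(NEWLINE, "\r\n") is not None

# Look-behind on one character is fixed width; the look-ahead may vary in width.
pattern = re.compile(r"(?<!" + VSPACE + r")" + NEWLINE + r"(?!" + NEWLINE + r")")

text = "a\r\nb\r\n\r\nc"
spans = [m.span() for m in pattern.finditer(text)]
```

Checking only the one character before the break is enough here, because any line break ends in a character that belongs to the class.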

                • Neil SchipperN
                  Neil Schipper @astrosofista
                  last edited by

                  @astrosofista said:

                  replace \R by a \v

                  Confirmed, and good to know, thanks. It’s strange that both constructs encode the 1 or 2 byte newline sequences, but only \v is valid.

                  a negative look-behind must be fixed width

                  Both positive and negative look-behinds have this limitation (perhaps what you meant to say).

                  • PeterJonesP
                    PeterJones @Neil Schipper
                    last edited by

                    @neil-schipper said in Remove unwanted Carriage Return:

                    Both positive and negative look-behinds have this limitation (perhaps what you meant to say).

                    Nope.

                    https://www.boost.org/doc/libs/1_78_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.lookahead

                    vs

                    https://www.boost.org/doc/libs/1_78_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.lookbehind

                    Notice that only the lookbehind has the “pattern must be of fixed length” caveat; the lookahead can be variable width.

                    And it’s very easy to test:
                    lookbehind: [screenshot]
                    lookahead: [screenshot]

                    • Neil SchipperN
                      Neil Schipper @PeterJones
                      last edited by

                      @peterjones My comment pertains to the two kinds of look-behind, positive (?<= and negative (?<! (and I’ve tested both), and makes no mention of the two kinds of look-aheads.

                      • Alan KilbornA
                        Alan Kilborn
                        last edited by Alan Kilborn

                        replace \R by a \v

                        This doesn’t make sense in the desired usage above.
                        A \R can be one or two characters, thus not fixed length.
                        A \v is always only one character, when it matches.

                        • Neil SchipperN
                          Neil Schipper @Alan Kilborn
                          last edited by

                          @alan-kilborn said in Remove unwanted Carriage Return:

                          A \v is always only one character, when it matches.

                          Good point. But @astrosofista’s trick does work with conventional Windows \r\n line endings; it’s a more tolerant way of simply specifying \n, and would also handle other non-standard line ending formats.

                          Your comment reminds me that it’s risky to get too used to throwing \v around: if used in a matched text expression which will be replaced, it’s easy to inadvertently destroy a pristine file’s uniform line endings.

                          • PeterJonesP
                            PeterJones @Neil Schipper
                            last edited by

                            @neil-schipper ,

                            Sorry, I misread. Time to stop trying to think for the evening, apparently.

                            • Alan KilbornA
                              Alan Kilborn @Neil Schipper
                              last edited by

                              @neil-schipper said in Remove unwanted Carriage Return:

                              @astrosofista’s \v trick does work with conventional Windows \r\n line endings; it’s a more tolerant way of simply specifying \n, and would also handle other non-standard line ending formats.

                              I don’t think it has value. If you need to match \r\n then \v\v will match it, but it will also match other things (that maybe aren’t wanted), so…no real point in it.

                              • astrosofistaA
                                astrosofista @Neil Schipper
                                last edited by

                                @neil-schipper said in Remove unwanted Carriage Return:

                                Both positive and negative look-behinds have this limitation (perhaps what you meant to say).

                                Yes, that’s what I meant. My apologies for the possible misunderstanding.

                                Anyway, although both lookbehinds share this limitation, they differ: for the positive lookbehind we have the \K operator at our disposal as an alternative, but there is no such operator for the negative one.

                                It would be nice if this issue could be fixed sometime.

                                • Alan KilbornA
                                  Alan Kilborn @astrosofista
                                  last edited by

                                  @astrosofista said in Remove unwanted Carriage Return:

                                  It would be nice if this issue could be fixed sometime.

                                  It’s just a coincidence that \K can be used as a variable length positive lookbehind.
                                  Because there is no equivalent for a negative lookbehind, doesn’t mean there’s an issue that could be “fixed”.
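To make the \K point concrete for readers working outside Notepad++: Python's standard re module has no \K at all, so a common substitute for "variable-length positive lookbehind" there is a capture group that is put back by the replacement. This is only a sketch; the digits-then-px pattern and the sample text are invented for illustration.

```python
import re


def compiles(pattern):
    """Return True if Python's re accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# In Notepad++ one could search for \d+\Kpx and replace with nothing.
# Python's re treats \K as an unknown escape, so the pattern does not compile.
supports_keep_out = compiles(r"\d+\Kpx")

# Substitute: capture the variable-length prefix and re-insert it.
text = "width 120px, apx"
stripped = re.sub(r"(\d+)px", r"\1", text)
```

The capture-group approach only helps when replacing text; like \K, it offers nothing for the negative case.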

                                  • PeterJonesP
                                    PeterJones @Alan Kilborn
                                    last edited by

                                    @alan-kilborn said in Remove unwanted Carriage Return:

                                    @astrosofista said in Remove unwanted Carriage Return:

                                    It would be nice if this issue could be fixed sometime.

                                    Because there is no equivalent for a negative lookbehind, doesn’t mean there’s an issue that could be “fixed”.

                                    The lookbehind fixed-length restriction is caused by the Boost::regex library, not by anything Notepad++ does. So the issue would have to be fixed there.

                                    To find more of the history of Boost::regex, and how long they’ve known about that restriction, I went to https://www.boost.org/doc/libs/ (I kept cutting stuff out of the 1.78 URL until I found a page that listed what other versions were available), and looked at old versions until I found the earliest Boost::regex release in which I noticed lookbehind syntax documented: v1.33.1 from 2005, where they already note that restriction. If they’ve known about that limitation since 2005 and never gotten rid of that restriction, there’s probably a good technical reason that it’s too hard to implement, and it’s not likely to change anytime soon.

                                    • Alan KilbornA
                                      Alan Kilborn @PeterJones
                                      last edited by

                                      @peterjones said in Remove unwanted Carriage Return:

                                      there’s probably a good technical reason that it’s too hard to implement, and it’s not likely to change anytime soon.

                                      I think the limitation probably is rooted in runtime complexity for the engine that would provide a poor user experience (way too long for it to examine every possible match, potential for engine catastrophic overflow, etc.). Just my hunch.

                                      • Neil SchipperN
                                        Neil Schipper @Alan Kilborn
                                        last edited by

                                        @alan-kilborn @astrosofista @PeterJones

                                        Open-ended, variable-length negative look-behinds using subexpressions like hello.* would have huge performance implications.

                                        Upper-bounded variable-length expressions like dog|puppy or \d{1,500} would be “easy-peasy” (i.e., computers doing exactly what computers are good at doing), and extremely useful.

                                        • PeterJonesP
                                          PeterJones @Neil Schipper
                                          last edited by

                                          @neil-schipper said in Remove unwanted Carriage Return:

                                          would be “easy-peasy”

                                          I am sure if you can get that PR written and submitted to Boost::regex, they would be quite happy for your implementation. ;-)

                                          • PeterJonesP
                                            PeterJones @Neil Schipper
                                            last edited by PeterJones

                                            @neil-schipper said in Remove unwanted Carriage Return:

                                            dog|puppy

                                            Also, since Boost::regex implements a Perl-compatible syntax (the same family as PCRE), with roots in Perl, it still has the TIMTOWTDI philosophy:

                                            • (?<=dog|puppy)chow can be represented as ((?<=dog)|(?<=puppy))chow
                                            • (?<!dog|puppy)chow can be represented as ((?<!dog)(?<!puppy))chow
                                              • note that, because of De Morgan’s Laws, NOT(A OR B) becomes NOT(A) AND NOT(B)
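The two rewrites above can be tried in Python as well, whose re module has the same kind of fixed-width restriction on look-behinds. This is a minimal sketch with an invented sample string; it does not say anything about how Boost handles these patterns internally.

```python
import re


def compiles(pattern):
    """Return True if Python's re accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# "dog" and "puppy" differ in length, so the combined look-behind is refused.
combined_ok = compiles(r"(?<=dog|puppy)chow")

# Positive case: an alternation of two fixed-width look-behinds.
split_positive = re.compile(r"(?:(?<=dog)|(?<=puppy))chow")

# Negative case (De Morgan): NOT(dog OR puppy) == NOT(dog) AND NOT(puppy),
# written as two consecutive negative look-behinds.
split_negative = re.compile(r"(?<!dog)(?<!puppy)chow")

text = "dogchow puppychow catchow chow"
positive_hits = [m.start() for m in split_positive.finditer(text)]
negative_hits = [m.start() for m in split_negative.finditer(text)]
```

Here the positive form finds the "chow" after "dog" and after "puppy", and the negative form finds the other two.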

                                            For \d{1,500}, it is admittedly harder to come up with an equivalent lookbehind that will work. And by harder, I mean, I couldn’t in the last 5 minutes. (Specifically, we obviously don’t want to construct 500 alternatives manually, or fill up that space in the regex. That would have been the “easy” alternative, but not practical.)

                                            The Community of users of the Notepad++ text editor.