    Bug when a multi-lines regex is used in the 'Search', 'Replace' or 'Mark' dialog

    • Alan KilbornA
      Alan Kilborn
      last edited by Alan Kilborn

      Instead of Excel, why not use a bit of PythonScript to generate the “ruler” lines?:

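              # Build "ruler" lines whose total length equals the number they start and end with.
              accum = ''
              for j in range(1020, 1030 + 1):
                  desired_len = j
                  des_len_as_str = str(desired_len)
                  s = des_len_as_str  # each line starts with its own length
                  tens_count = 0
                  while True:
                      # The next character would land on a column that is a multiple of 10.
                      if (len(s) + 1) % 10 == 0:
                          # Write a marker only if at least 10 columns remain after its start,
                          # which leaves room for the closing length label.
                          if (tens_count + 2) * 10 <= desired_len:
                              s += str((tens_count + 1) * 10)
                              tens_count += 1
                      if len(s) >= desired_len: break
                      s += '_'
                  # Overwrite the tail with the length so the line ends at exactly desired_len.
                  s = s[:-len(des_len_as_str)] + des_len_as_str
                  accum += s + '\r\n'
              editor.copyText(accum)  # put all ruler lines on the clipboard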
      

      The example above generates ruler lines of length 1020 through 1030, inclusive. The ruler data ends up in the clipboard after the script runs.

      Note that mine might be different from the earlier ruler lines discussed – I chose that the intermediate numbers start in their indicated column, e.g. after you paste the output of the script into a new tab, if you put the caret just to the left of the 8 in 890, the status bar will indicate Col: 890.

      To select 890 characters from that same example line, put the caret between the 8 and the 9 and then press Shift+Home.
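The column alignment described above can be checked outside Notepad++ with a minimal sketch that wraps the same loop in a function. The name `ruler_line` is a hypothetical helper, not part of the original script, and the clipboard call is left out so that the code runs in plain Python.

```python
def ruler_line(desired_len):
    """Return one ruler line of exactly desired_len characters (hypothetical helper)."""
    label = str(desired_len)
    s = label
    tens_count = 0
    while True:
        # A marker is written so that its first digit sits on a multiple-of-10 column.
        if (len(s) + 1) % 10 == 0:
            if (tens_count + 2) * 10 <= desired_len:
                s += str((tens_count + 1) * 10)
                tens_count += 1
        if len(s) >= desired_len:
            break
        s += '_'
    return s[:-len(label)] + label


line = ruler_line(1020)

# Column numbers are 1-based, so column 890 is string index 889.
starts_in_column = line[889:892] == '890'

# Caret between the 8 and the 9, then Shift+Home: everything up to and including the 8.
selection = line[:890]
```

With this sketch, `starts_in_column` is true and `selection` is 890 characters long, which matches the status-bar and Shift+Home behaviour described in the post.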

      Here’s some output from the script:

      1020_____10________20________30________40________50________60________70________80________90________100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990_______1000______1010___1020
      1021_____10________20________30________40________50________60________70________80________90________100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990_______1000______1010____1021
      1022_____10________20________30________40________50________60________70________80________90________100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990_______1000______1010_____1022
      1023_____10________20________30________40________50________60________70________80________90________100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990_______1000______1010______1023
      1024_____10________20________30________40________50________60________70________80________90________100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990_______1000______1010_______1024
      1025_____10________20________30________40________50________60________70________80________90________100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990_______1000______1010________1025
      1026_____10________20________30________40________50________60________70________80________90________100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990_______1000______1010_________1026
      1027_____10________20________30________40________50________60________70________80________90________100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990_______1000______1010__________1027
      1028_____10________20________30________40________50________60________70________80________90________100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990_______1000______1010___________1028
      1029_____10________20________30________40________50________60________70________80________90________100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990_______1000______1010____________1029
      1030_____10________20________30________40________50________60________70________80________90________100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990_______1000______1010______1020___1030
      
      • guy038G
        guy038
        last edited by guy038

        Hello, @mkupper, @alan-kilborn and All,

        I did some tests and found a workaround for using multi-line regexes of more than 1,024 characters, as long as, of course, the total number of characters does not exceed 2,047. All the tests were done with the latest N++ release, v8.8.1.

        Here is my method, to be followed rigorously.


        In a new tab, paste, one after the other, the two multi-line regexes below:

        (?x-i)                  # Search SENSIBLE to CASE
        (?<=\x20)               # Preceded with SPACE
        (?:                     # Start NON-CAPTURING group
        0[0-2][0-9A-F][0-9A-F]  |
        03[7-9A-F][0-9A-F]      |
        04[0-9A-F][0-9A-F]      |
        05[0-8][0-9A-F]         |
        10[A-F][0-9A-F]         |
        1C[8-B][0-9A-F]         |
        1D[0-9AB][0-9A-F]       |
        1[EF][0-9A-F][0-9A-F]   |
        20[7-C][0-9A-F]         |
        21[0-8][0-9A-F]         |
        24[6-9A-F][0-9A-F]      |
        25[A-F][0-9A-F]         |
        27[0-B][0-9A-F]         |
        2C[6-9A-F][0-9A-F]      |
        2D[012][0-9A-F]         |
        A6[4-9][0-9A-F]         |
        A7[2-9A-F][0-9A-F]      |
        AB[3-6][0-9A-F]         |
        FB[01][0-9A-F]          |
        FF[0-5E][0-9A-F]        |
        102[EF][0-9A-F]         |
        105[0-2][0-9A-F]        |
        107[89AB][0-9A-F]       |
        1CC[DEF][0-9A-F]        |
        1D[4-7][0-9A-F][0-9A-F] |
        1DF[0-9A-F][0-9A-F]     |
        1E0[3-8][0-9A-F]        |
        1F1[0-9A-F][0-9A-F]     |
        1FB[0-9A-F][0-9A-F]
        )                     # End NON-CAPTURING group
        (?=\x20)              # Followed with SPACE
        

        And :

        (?x-si)
          (?:
            ^ ~~~ [\h\l]* \R (?s:.+?) ^ ~~~ \h* \R |
            ^ ``` [\h\l]* \R (?s:.+?) ^ ``` \h* \R |
        
            ^ \h* (?: - \h* )? (?i: FIND | SEARCH | REPLACE ) \x20 .+
          |
            (?<= \s )
            (?:
        #        Case *......* / Case **......** / Case `......`
        
             \*   [^`* \t\r\n] (?: [^*\r\n]* [^`* \t\r\n] )? \*   |
             \*\* [^`* \t\r\n] (?: [^*\r\n]* [^`* \t\r\n] )? \*\* |
             `    [^` \t\r\n]  (?: [^\r\n]*  [^` \t\r\n] )? `     |
        
        #        Case **`......`**
        
             \*\*                                          ` [^` \t\r\n] (?: [^`\r\n]* [^` \t\r\n] )? `                                        \*\* |
             \*\*                                          ` [^`\r\n]    (?: [^`\r\n]* [^`\r\n]    )? ` (?: [^*`\r\n] [^`\r\n]* )? [^` \t\r\n] \*\* |
             \*\* [^` \t\r\n] (?: [^`\r\n]*   [^*`\r\n] )? ` [^`\r\n]    (?: [^`\r\n]* [^`\r\n]    )? `                                        \*\* |
             \*\* [^` \t\r\n] (?: [^`\r\n]*   [^*`\r\n] )? ` [^`\r\n]    (?: [^`\r\n]* [^`\r\n]    )? ` (?: [^*`\r\n] [^`\r\n]* )? [^` \t\r\n] \*\* |
        
        #        Case *`......`*
        
             \*                                            ` [^` \t\r\n] (?: [^`\r\n]* [^` \t\r\n] )? `                                          \* |
             \*                                            ` [^`\r\n]    (?: [^`\r\n]* [^`\r\n]    )? ` (?: [^*`\r\n] [^*`\r\n]* )? [^` \t\r\n]  \* |
             \*   [^` \t\r\n] (?: [^*`\r\n]* [^`*\r\n] )?  ` [^`\r\n]    (?: [^`\r\n]* [^`\r\n]    )? `                                          \* |
             \*   [^` \t\r\n] (?: [^*`\r\n]* [^`*\r\n] )?  ` [^`\r\n]    (?: [^`\r\n]* [^`\r\n]    )? ` (?: [^*`\r\n] [^*`\r\n]* )? [^` \t\r\n]  \*
            ){1}+
            (?= [\s,;.-] | 's | \z )
          )
          (*SKIP) (*F)  # CORRECT cases are IGNORED
        |
          [*`]+         .+?  [*`]+  |
          [*`]+  \x20*  .+?  \x20+
        

        Note that the first regex contains 1,022 characters and the second contains 1,802 characters, so one regex is shorter than 1,024 characters and the other is longer.

        Add, in this new tab, the two lines below ( the first line should be matched by the first regex and the second line should be matched by the second regex ).

         2C63 | LATIN CAPITAL LETTER P WITH STROKE | Ᵽ
        
        *AB D *
        
        • To end, save this new tab as Test_RE.txt

        First manipulation :

        • Switch to the Test_RE.txt file

        • Select the contents of the first regex ( 1,022 chars )

        • Open the Find dialog ( Ctrl + F )

        => As you can see, the end of this multi-lines regex is visible and you should note the part # Followed with SPACE

        • Uncheck all the box options if any ( note that the In selection option was already not checked )

        • Note that the Wrap around option will stay unchecked during all the tests

        • Select the Regular expression search mode

        • Click on the Find Next button

        => As expected, it matches the 2C63 string

        • Close the Find dialog

        Second manipulation :

        • Switch to the Test_RE.txt file

        • This time, select the contents of the second regex ( 1,802 chars )

        • Open the Find dialog ( Ctrl + F )

        => You can observe two things :

        • The In selection option is checked ( Logical, as the amount of chars is greater than 1,024 )

        • Surprisingly, the regex, shown in the search field, is STILL the first regex and not the second expected multi-lines regex !

        • Hit the Find Next button : as expected, it returns, again, the 2C63 string ( Note that the In selection option has NO effect when you hit the Find Next button )

        • Close the Find dialog

        At this point, you can, either :

        • Re-open the Find dialog

        • Close and re-load the Test_RE.txt file

        • Close and re-start Notepad++

        Whichever you choose, if you re-open the Find dialog, the first regex, ending with the string # Followed with SPACE, is STILL present in the search field, although we previously selected the second regex ???!!! Why ?


        Third manipulation :

        • Switch to the Test_RE.txt file

        • Select, again, the contents of the second regex ( 1,802 chars )

        • Before opening the Find dialog, just hit the Ctrl + F3 shortcut ( This is the WORK-AROUND ! )

        • Open the Find dialog

        • This time, you can verify that the search field contains the correct regex : the second one

        • Hit the Find Next button => This time, it correctly matches the *AB D * string

        Note that sometimes, I needed to cancel the selection, right before opening the Find dialog, in order to get this match !


        It’s important to add that, modifying in Preferences... > Searching > When Find Dialog is Invoked the value 1,024 to the value 0, does NOT change the global behavior

        In conclusion, I would say that all that logic seems rather unclear. Can anyone reproduce these steps and see the problem ?

        Best Regards,

        guy038

        P.S. :

        Note that without this workaround, I would have had to use the v8.4.9 release, which is the last version before the auto-checking of the In selection option was introduced

        • Alan KilbornA
          Alan Kilborn @guy038
          last edited by Alan Kilborn

          @guy038 said:

          Second manipulation :
          …
          …select the contents of the second regex ( 1,802 chars )
          …
          The In selection option is checked ( Logical, as the amount of chars is greater than 1,024 )
          …
          Surprisingly, the regex, shown in the search field, is STILL the first regex and not the second expected multi-lines regex !
          …
          Whichever you choose, if you re-open the Find dialog, the first regex, ending with the string # Followed with SPACE, is STILL present in the search field, although we previously selected the second regex ???!!! Why ?

          The way it works (I think) is that if In selection is going to become checkmarked due to the N++ code doing it, i.e., if the number of bytes in the selected text is greater-than-or-equal-to¹ the setting “Minimum Size for Auto-Checking ‘In selection’”, then the selected text is NOT supposed to be copied to Find what, regardless of the setting for Fill Find Field with Selected Text.

          Why not?

          Well, if the code has determined that you are going to be searching within the selected text, copying the selected text to Find what doesn’t make sense. You already know how many matches that would generate (exactly one). So, it is waiting for you to put something different in Find what.

          ¹ : In attempting to verify this before posting, I found out that it has to be greater-than, not greater-than-or-equal-to. Thus, for the default case of 1024, if 1024 bytes are selected when Ctrl+f is invoked, the checkbox will become checkmarked AND the selected text will be copied to Find what. This seems wrong to me; it should be as I first stated, greater-than-or-equal-to.


          Third manipulation :
          …
          Select, again, the contents of the second regex ( 1,802 chars )
          Before opening the Find dialog, just hit the Ctrl + F3 shortcut ( This is the WORK-AROUND ! )

          Ctrl+F3 is Select and Find Next which is wholly different from a “select” followed by a Find Next. It’s not affected by, nor constrained by, the In selection setting.


          It’s important to add that, modifying in Preferences… > Searching > When Find Dialog is Invoked the value 1,024 to the value 0, does NOT change the global behavior

          I’m unclear on what you mean by this.

          • guy038G
            guy038
            last edited by guy038

            Hi, @mkupper, @alan-kilborn and All,

            Sorry, I was out these last three hours !

            When I said :

            It’s important to add that, modifying in Preferences... > Searching > When Find Dialog is Invoked the value 1,024 to the value 0, does NOT change the global behavior

            I meant that my three manipulations produce the same results if you previously chose the 1,024 value or the 0 value : NO change in behavior.

            BR

            guy038

            Thinking about it, I don’t know if this is judicious. If I choose the zero value, as auto-checking is disabled, if I select a large amount of text and immediately invoke the Find dialog, it should fill up the Find what field up to 2,046 characters !

            • Alan KilbornA
              Alan Kilborn @guy038
              last edited by

              @guy038 said:

              If I choose the zero value, as auto-checking is disabled, if I select a large amount of text and immediately invoke the Find dialog, it should fill up the Find what field up to 2,046 characters !

              I’d say that that sounds reasonable.

              • mkupperM
                mkupper @guy038
                last edited by

                @guy038, @Alan-Kilborn, and others

                Congratulations on discovering that Ctrl+F3 trick to bypass the 1024 character selection to the Find-what field limit.

                That prompted me to do a test.

                • I created a 5000 character long ruler line with ________10________20________30 … ______4980______4990______5000.
                • I duplicated that line a few times.
                • I put the caret on one of the lines and did Ctrl+F3. Notepad++ immediately jumped to the next line and there’s a bunch of text selected.
                • I did Ctrl+C to load the selected text into the copy/paste buffer, and pasted to a blank line in the area below my list of 5000 character rulers.
                • I saw that the new line is 2047 characters long and runs from ________10________20________30 … ______2030______2040______2
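As a rough sketch, the 5000-character ruler can be generated in plain Python rather than typed by hand. The layout is an assumption inferred from the excerpt shown in the post (each 10-character block holds its column number, right-justified and padded with underscores), and `block_ruler` is a hypothetical name.

```python
def block_ruler(total_len):
    """Ruler made of 10-character blocks, each ending with its own column number.

    Assumed layout, inferred from the '________10________20' excerpt in the post.
    """
    return ''.join(str(n).rjust(10, '_') for n in range(10, total_len + 1, 10))


ruler = block_ruler(5000)

# Ctrl+F3 reportedly selected only the first 2047 characters of such a line.
selected = ruler[:2047]
tail = selected[-27:]
```

Cutting the line at 2047 characters leaves `tail` equal to `______2030______2040______2`, the same ending quoted in the post.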

                My first thought was, “wait, I thought the search limit was 2046 characters. Apparently the quick search thing using Ctrl+F3 will search for up to 2047 character patterns.”

                I verified that a reverse quick search using Ctrl+Shift+F3 also allows for patterns of up to 2047 characters.

                • I moved the caret to a blank line (it can be any blank area) and did Ctrl+F to activate the normal find dialog.
                • I see that Find what is populated and that it has 2047 characters in the Find what field.
                • I click [Find Next] which selects some stuff, Esc to close the dialog, Ctrl+C to load the text I found into the copy/paste buffer, and paste that to a new line.

                The search using [Find Next] matched the first 2046 characters that were in the Find-what field.

                Anyway, that’s excellent that we can have regular expressions that are up to 2046 characters long and to get them into the Find-what field without using copy/paste. The procedure will be to:

                • Select the long regexp
                • Ctrl+F3 or Shift+Ctrl+F3
                • Move the caret to a blank area so that the caret is not within nor touching a word.
                • Ctrl+F and the Find-what field will be populated.

                This will be handy as something I frequently do is to load the replacement string into the copy/paste buffer, select the search pattern, Ctrl+H, tab down to the Replace with field, and Ctrl+V to fill in the replacement pattern.

                That works for search patterns for up to 1024 characters and now with @guy038’s Ctrl+F3 trick I can get around the 1024 character limit and don’t need to use the copy/paste buffer to do that.

                A few weeks ago I took a look at the Notepad++ source code to better understand the 1024 and 2046 character limits.

                • The 1024 character limit comes from the default value for the Minimum size for auto-checking in-selection setting. It’s a Notepad++ bug as that setting is unrelated to the limit for auto-loading the selection into the Find what field. Unfortunately, the fix is not easy as the internal constant that has the 1024 is used in several different ways.

                • The 2046 character limit seems to be either a different Notepad++ bug or a Scintilla limit or bug. The buffers that hold search and replace patterns are 2048 16-bit characters long. The pattern is NUL terminated meaning we should be able to have up to 2047 character long patterns. The code has an extra subtraction somewhere that causes 2047 to be 2046. The extra subtraction seems to be buried in the Notepad++ logic that’s dealing with Scintilla.

                While looking at the 2046 character issue I saw that the pattern buffer uses 16-bit wide characters. I verified that you can search for up to 2046 16-bit characters such as ⛱⛱⛱...⛱⛱⛱ (U+26F1). If you search for an extended Unicode character such as 🦎🦎🦎...🦎🦎🦎 (U+1F98E or the surrogate pair \x{D83E}\x{DD8E} ) then you will be limited to 1023 characters.
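The arithmetic behind the 2046 versus 1023 figures can be illustrated with a short sketch that counts UTF-16 code units. It assumes, as the post reports, that the limit is counted in 16-bit units; `utf16_units` and `OBSERVED_LIMIT` are names made up for this example.

```python
OBSERVED_LIMIT = 2046  # pattern limit in 16-bit units, as reported in the post


def utf16_units(text):
    """Number of 16-bit UTF-16 code units needed to store text."""
    return len(text.encode('utf-16-le')) // 2


umbrella = u'\u26f1'      # U+26F1, inside the BMP: one code unit
lizard = u'\U0001F98E'    # U+1F98E, outside the BMP: a surrogate pair, two code units

max_umbrellas = OBSERVED_LIMIT // utf16_units(umbrella)
max_lizards = OBSERVED_LIMIT // utf16_units(lizard)
```

Here `max_umbrellas` is 2046 and `max_lizards` is 1023, which is consistent with the limits observed for the two characters.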

                I did discover some weirdness with Ctrl+F3.

                • Using Ctrl+F3 when the caret is within a short string such as ⛱⛱⛱⛱⛱⛱⛱⛱ seems to do nothing. The text is not loaded into the Find what field. I suspect that ⛱ is not a word character.
                • ❽ (U+277D) is a word character but Ctrl+F3 gets confused by ❽❽❽...❽❽❽ as it seems to be selecting and searching for the entire line. I was able to do 5000 character search matches using long strings of ❽❽❽...❽❽❽. As I knew the buffers are 2048 words long I suspected there was a buffer overflow.
                • Alan KilbornA
                  Alan Kilborn @mkupper
                  last edited by

                  @mkupper said:

                  The 1024 character limit comes from the default value for the Minimum size for auto-checking in-selection setting. It’s a Notepad++ bug as that setting is unrelated to the limit for auto-loading the selection into the Find what field

                  It isn’t really a bug, it’s more of a historical vestige. Before the setting existed, 1024 was used for even more purposes. Think of the setting as always-existing, but at an unchangeable value: 1024.

                  • mkupperM
                    mkupper @Alan Kilborn
                    last edited by

                    @Alan-Kilborn said in Bug when a multi-lines regex is used in the 'Search', 'Replace' or 'Mark' dialog:

                    It isn’t really a bug, it’s more of a historical vestige. Before the setting existed, 1024 was used for even more purposes. Think of the setting as always-existing, but at an unchangeable value: 1024.

                    Agreed, but it’s not quite that bad at present. There is a constant within the npp source code that defines both the maximum value and default value for the Settings / Preferences / Searching / Minimum Size for Auto-Checking "In selection". It defaults to 1024. The length of the current selection is visible in the Sel: part of Notepad++'s status line. If the number is 0 to 1,023 and you do Ctrl+F, Ctrl+H, or Ctrl+M to bring up the Find, Replace, or Mark dialog boxes, then the In Selection field will not be enabled. If the number is 1024 or larger and you do Ctrl+F, Ctrl+H, or Ctrl+M, then the In Selection field will be enabled. You can change this threshold via Settings / Preferences.

                    That works well.

                    The same internal constant that defines the default and/or maximum value for the Minimum Size for Auto-Checking "In selection" thing I just mentioned is also used by the code for the Find, Replace, and Mark dialog boxes to decide if the current selection should auto-populate the Find what field. If the current selection is from 1 to 1024 characters then Find what gets populated with whatever is in the selection. If the current selection is zero or is more than 1024 characters then the selection is ignored and Find what contains whatever was in there before.

                    The preferences setting for Minimum Size for Auto-Checking "In selection" does not control whether the current selection auto-populates Find what. The auto-population limit is a constant: 1024.
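The two thresholds described in this thread can be modelled with a small sketch. This is only a model of the behaviour as reported by the posters, not Notepad++ code: the names `FIND_WHAT_FILL_LIMIT` and `find_dialog_state` are hypothetical, and it assumes the Fill Find Field with Selected Text option is enabled.

```python
FIND_WHAT_FILL_LIMIT = 1024  # fixed internal value per the posts (hypothetical name)


def find_dialog_state(sel_len, auto_check_threshold=1024):
    """Model what happens when the Find dialog opens with sel_len characters selected.

    Returns (in_selection_checked, find_what_filled).
    """
    # Per the thread, a threshold of 0 disables auto-checking of "In selection".
    in_selection = auto_check_threshold > 0 and sel_len >= auto_check_threshold
    # Auto-population ignores the preference and always uses the fixed limit.
    find_what_filled = 1 <= sel_len <= FIND_WHAT_FILL_LIMIT
    return in_selection, find_what_filled


first_regex = find_dialog_state(1022)        # shorter than both limits
boundary = find_dialog_state(1024)           # exactly 1024 characters selected
second_regex = find_dialog_state(1802)       # longer than both limits
second_regex_zero = find_dialog_state(1802, auto_check_threshold=0)
```

In this model the 1,802-character regex never reaches Find what, whatever the preference value, which matches the observation that changing 1,024 to 0 made no difference to that part of the behaviour.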

                    A few weeks ago I wrote up some notes to myself about the name of this internal constant and started teasing out how and where the constant gets used. I can’t find my notes at the moment. My plan at the time was to submit a feature request on github that adds a new constant and showed exactly how and where in the code it should be used so that we can separate out the current selection to In selection vs current selection to Find what. I’d still like to do that but at the time I realized the npp code is a marvelously tangled ball of yarn and so needed to move carefully with my nip-n-tuck.

                    I also realized I should probably work on being able to compile my own copies of Notepad++.exe, as there were areas where the current values of some internal variables are not obvious. I wanted to change the "current selection to Find what" limit from 1024 to 2047 characters, and to do that I should also fix whatever causes Notepad++'s 2046-character limit. Nearly all of Notepad++'s code thinks the limit for search patterns is 2047 characters, but something in there restricts searches to 2046 characters.

                    • guy038G
                      guy038
                      last edited by guy038

                      Hello, @mkupper, @alan-kilborn and All,

                      Here is another example where the search limits (1,024 and 2,046) prevent us from searching correctly with a multi-line regex!

                      Follow the link below and you’ll get the general multi-line regex for a URI (Uniform Resource Identifier):

                      https://jmrware.com/articles/2009/uri_regexp/URI_regex.html#uri-43

                      We get this section:

                      # RFC-3986 URI component: URI-reference
                      (?:                                                               # (
                        [A-Za-z][A-Za-z0-9+\-.]* :                                      # URI
                        (?: //
                          (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})* @)?
                          (?:
                            \[
                            (?:
                              (?:
                                (?:                                                    (?:[0-9A-Fa-f]{1,4}:){6}
                                |                                                   :: (?:[0-9A-Fa-f]{1,4}:){5}
                                | (?:                            [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4}
                                | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3}
                                | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2}
                                | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}:
                                | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? ::
                                ) (?:
                                    [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4}
                                  | (?: (?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]) \.){3}
                                        (?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])
                                  )
                              |   (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}
                              |   (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? ::
                              )
                            | [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+
                            )
                            \]
                          | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
                               (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
                          | (?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*
                          )
                          (?: : [0-9]* )?
                          (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
                        | /
                          (?:    (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+
                            (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
                          )?
                        |        (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+
                            (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
                        |
                        )
                        (?:\? (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )?
                        (?:\# (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )?
                      | (?: //                                                          # / relative-ref
                          (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})* @)?
                          (?:
                            \[
                            (?:
                              (?:
                                (?:                                                    (?:[0-9A-Fa-f]{1,4}:){6}
                                |                                                   :: (?:[0-9A-Fa-f]{1,4}:){5}
                                | (?:                            [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4}
                                | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3}
                                | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2}
                                | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}:
                                | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? ::
                                ) (?:
                                    [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4}
                                  | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3}
                                        (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
                                  )
                              |   (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}
                              |   (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? ::
                              )
                            | [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+
                            )
                            \]
                          | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
                               (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
                          | (?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*
                          )
                          (?: : [0-9]* )?
                          (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
                        | /
                          (?:    (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+
                            (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
                          )?
                        |        (?:[A-Za-z0-9\-._~!$&'()*+,;=@] |%[0-9A-Fa-f]{2})+
                            (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
                        |
                        )
                        (?:\? (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )?
                        (?:\# (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )?
                      )                                                                       # )
                      

                      which is a very long multi-line regex, 3,991 characters in size. Thus, it is not searchable with Notepad++!


                      You could say: if this multi-line regex is changed into a single-line regex, maybe it’ll be OK?

                      First, let’s find a way to transform any multi-line regex into an equivalent single-line regex:

                      • Open the Replace dialog ( Ctrl + H )

                      • Uncheck all the checkbox options

                      • Check the In selection option (IMPORTANT)

                      • FIND (?x-s) (?: \[ [^\x5B\x5D\r\n]+ \] | \\ [ #] | \\x20 | \\x23 ) (*SKIP) (*F) | \x20* [#] .* (?: \R | \z ) | \x20+ | \R

                      • REPLACE Leave EMPTY

                      • Select the Regular expression search mode

                      • Now, do a stream selection of all the lines of the multi-line regex that must be shortened

                      • Finally, click once only on the Replace All button

                      => You get the expected single-line regex, still selected
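
                      For anyone who prefers a script, the S/R above can be approximated in plain Python. This is only a sketch under a few assumptions: Python's re module has no (*SKIP)(*F), so the protected tokens are captured in a named group and put back by the replacement function; \R and \z are approximated by explicit line-break alternatives and \Z; and the names collapse and _TOKEN are made up for illustration.

```python
import re

# Tokens in the "keep" group are protected; everything else matched is deleted.
_TOKEN = re.compile(
    r"""
    (?P<keep>
        \[ [^\[\]\r\n]+ \]      # a character class such as [A-Za-z0-9+\-.]
      | \\ [ #]                 # an escaped space or an escaped hash
      | \\x20                   # hex escape for a space
      | \\x23                   # hex escape for a hash
    )
    | [ ]* [#] [^\r\n]* (?: \r\n | \r | \n | \Z )   # a comment, up to the end of its line
    | [ ]+                      # insignificant spaces
    | \r\n | \r | \n            # line breaks
    """,
    re.VERBOSE,
)


def collapse(pattern):
    """Turn a free-spacing (x-mode) regex into an equivalent single-line regex."""
    # Protected tokens are re-emitted unchanged; all other matches become "".
    return _TOKEN.sub(lambda m: m.group("keep") or "", pattern)


if __name__ == "__main__":
    sample = "(?:            # group\n  [A-Z a-z]+ \\# x   # trailing comment\n)\n"
    print(collapse(sample))
```

                      As in the S/R, spaces inside a character class and escaped # characters survive, while comments, indentation and line breaks are dropped.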

                      If we apply the above S/R to our regex example, it returns the following single-line regex:

                          (?:[A-Za-z][A-Za-z0-9+\-.]*:(?://(?:(?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})*@)?(?:\[(?:(?:(?:(?:[0-9A-Fa-f]{1,4}:){6}|::(?:[0-9A-Fa-f]{1,4}:){5}|(?:[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){4}|(?:(?:[0-9A-Fa-f]{1,4}:){0,1}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){3}|(?:(?:[0-9A-Fa-f]{1,4}:){0,2}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){2}|(?:(?:[0-9A-Fa-f]{1,4}:){0,3}[0-9A-Fa-f]{1,4})?::[0-9A-Fa-f]{1,4}:|(?:(?:[0-9A-Fa-f]{1,4}:){0,4}[0-9A-Fa-f]{1,4})?::)(?:[0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}|(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]))|(?:(?:[0-9A-Fa-f]{1,4}:){0,5}[0-9A-Fa-f]{1,4})?::[0-9A-Fa-f]{1,4}|(?:(?:[0-9A-Fa-f]{1,4}:){0,6}[0-9A-Fa-f]{1,4})?::)|[Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+)\]|(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)|(?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*)(?::[0-9]*)?(?:/(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})*)*|/(?:(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+(?:/(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})*)*)?|(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+(?:/(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})*)*|)(?:\?(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*)?(?:\#(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*)?|(?://(?:(?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})*@)?(?:\[(?:(?:(?:(?:[0-9A-Fa-f]{1,4}:){6}|::(?:[0-9A-Fa-f]{1,4}:){5}|(?:[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){4}|(?:(?:[0-9A-Fa-f]{1,4}:){0,1}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){3}|(?:(?:[0-9A-Fa-f]{1,4}:){0,2}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){2}|(?:(?:[0-9A-Fa-f]{1,4}:){0,3}[0-9A-Fa-f]{1,4})?::[0-9A-Fa-f]{1,4}:|(?:(?:[0-9A-Fa-f]{1,4}:){0,4}[0-9A-Fa-f]{1,4})?::)(?:[0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}|(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))|(?:(?:[0-9A-Fa-f]{1,4}:){0,5}[0-9A-Fa-f]{1,4})?::[0-9A-Fa-f]{1,4}|(?:(?:[0-9A-Fa-f]{1,4}:){0,6}[0-9A-Fa-f]{1,4})?::)|[Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+)\]|(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)|(?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*)(?::[0-9]*)?(?:/(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})*)*|/(?:(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+(?:/(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})*)*)?|(?:[A-Za-z0-9\-._~!$&'()*+,;=@]|%[0-9A-Fa-f]{2})+(?:/(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})*)*|)(?:\?(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*)?(?:\#(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*)?)
                      

                      This is still a very long regex, 2,609 characters in size! So, no luck: even in its contracted form, the regex is still over 2,046 characters and not searchable with Notepad++ :-((
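
                      The arithmetic is easy to check with a few lines of Python. This is a small sketch using only the sizes quoted in this thread; SEARCH_LIMIT and overshoot are names made up for illustration, not anything taken from Notepad++.

```python
SEARCH_LIMIT = 2046  # effective search-pattern limit reported in this thread


def overshoot(pattern_len, limit=SEARCH_LIMIT):
    """Return by how many characters a pattern exceeds the limit (0 if it fits)."""
    return max(0, pattern_len - limit)


if __name__ == "__main__":
    # Sizes quoted above for the two forms of the URI regex.
    for name, size in (("multi-line", 3991), ("single-line", 2609)):
        print(name, size, "over by", overshoot(size))
```

                      So the contracted form would still need to lose several hundred characters before it could be pasted into the Find what field.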


                      By the way, the site’s ability to highlight any sub-section of the regex in green is really awesome!

                      Best Regards,

                      guy038

                      • Alan KilbornA
                        Alan Kilborn @guy038
                        last edited by

                        @guy038 said:

                        the site’s ability to highlight any sub-section of the regex in green is really awesome

                        https://jmrware.com/articles/2010/dynregexhl/DynamicRegexHighlighter.html
