Bug when a multi-lines regex is used in the 'Search', 'Replace' or 'Mark' dialog
-
Hello, All,
I suppose that this bug was introduced when the Minimum size for checking "In Selection" value was added in Preferences > Searching > When Find Dialog is Invoked, in the v8.5.8 release (point #12).
- Do a stream selection of the multi-lines regex below:
(?x-i)           # Search SENSIBLE to the CASE
(?<=\x20)        # Preceded with a SPACE char
(?:              # BEGINNING of the NON-CAPTURING group
    LETTER
  | CAPITAL
  | SMALL
  | DIGIT
  | FRACTION
  | LIGATURE
  | SUPERSCRIPT
  | SUBSCRIPT
  | CIRCLED
  | PARENTHESIZED
  | MATHEMATICAL
  | FULL[ ]STOP
  | ROMAN
  | EPACT
)                # END of the NON-CAPTURING group
(?=\x20)         # Followed with a SPACE char
- Open the Mark dialog (Ctrl + M)
- Select the two options Purge for each search and Wrap around, only
- Choose the Regular expression mode
- Click on the Mark All button
Against the following text, you get the message "47 matches in entire file":
| 0030 | DIGIT ZERO | 0 |
| 0370 | GREEK CAPITAL LETTER HETA | Ͱ |
| 0400 | CYRILLIC CAPITAL LETTER IE WITH GRAVE | Ѐ |
| 0500 | CYRILLIC CAPITAL LETTER KOMI DE | Ԁ |
| 10A0 | GEORGIAN CAPITAL LETTER AN | Ⴀ |
| 1C80 | CYRILLIC SMALL LETTER ROUNDED VE | ᲀ |
| 1D00 | LATIN LETTER SMALL CAPITAL A | ᴀ |
| 1E00 | LATIN CAPITAL LETTER A WITH RING BELOW | Ḁ |
| 2070 | SUPERSCRIPT ZERO | ⁰ |
| 2C60 | LATIN CAPITAL LETTER L WITH DOUBLE BAR | Ⱡ |
| 2D00 | GEORGIAN SMALL LETTER AN | ⴀ |
| A640 | CYRILLIC CAPITAL LETTER ZEMLYA | Ꙁ |
| A722 | LATIN CAPITAL LETTER EGYPTOLOGICAL ALEF | Ꜣ |
| AB30 | LATIN SMALL LETTER BARRED ALPHA | ꬰ |
| FB00 | LATIN SMALL LIGATURE FF | ff |
| FF10 | FULLWIDTH DIGIT ZERO | 0 |
| 102E1 | COPTIC EPACT DIGIT ONE | 𐋡 |
| 10500 | ELBASAN LETTER A | 𐔀 |
| 10780 | MODIFIER LETTER SMALL CAPITAL AA | 𐞀 |
| 1CCD6 | OUTLINED LATIN CAPITAL LETTER A | |
| 1D400 | MATHEMATICAL BOLD CAPITAL A | 𝐀 |
| 1DF00 | LATIN SMALL LETTER FENG DIGRAPH WITH TRILL | 𝼀 |
| 1E030 | MODIFIER LETTER CYRILLIC SMALL A | 𞀰 |
| 1F100 | DIGIT ZERO FULL STOP | 🄀 |
| 1FBF0 | SEGMENTED DIGIT ZERO | 🯰 |
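To double-check the 47 matches outside Notepad++, here is a minimal Python sketch that runs the first regex over the table with the standard `re` module. It assumes each row is written as `| code | name | glyph |` (the glyph column is dropped, since no word of the regex can match it). Python's regex flavour is not the Boost engine Notepad++ uses, so this only confirms the expected count, not Notepad++'s behaviour.

```python
import re

# Rows of the test table: (code point, character name).
ROWS = [
    ("0030", "DIGIT ZERO"),
    ("0370", "GREEK CAPITAL LETTER HETA"),
    ("0400", "CYRILLIC CAPITAL LETTER IE WITH GRAVE"),
    ("0500", "CYRILLIC CAPITAL LETTER KOMI DE"),
    ("10A0", "GEORGIAN CAPITAL LETTER AN"),
    ("1C80", "CYRILLIC SMALL LETTER ROUNDED VE"),
    ("1D00", "LATIN LETTER SMALL CAPITAL A"),
    ("1E00", "LATIN CAPITAL LETTER A WITH RING BELOW"),
    ("2070", "SUPERSCRIPT ZERO"),
    ("2C60", "LATIN CAPITAL LETTER L WITH DOUBLE BAR"),
    ("2D00", "GEORGIAN SMALL LETTER AN"),
    ("A640", "CYRILLIC CAPITAL LETTER ZEMLYA"),
    ("A722", "LATIN CAPITAL LETTER EGYPTOLOGICAL ALEF"),
    ("AB30", "LATIN SMALL LETTER BARRED ALPHA"),
    ("FB00", "LATIN SMALL LIGATURE FF"),
    ("FF10", "FULLWIDTH DIGIT ZERO"),
    ("102E1", "COPTIC EPACT DIGIT ONE"),
    ("10500", "ELBASAN LETTER A"),
    ("10780", "MODIFIER LETTER SMALL CAPITAL AA"),
    ("1CCD6", "OUTLINED LATIN CAPITAL LETTER A"),
    ("1D400", "MATHEMATICAL BOLD CAPITAL A"),
    ("1DF00", "LATIN SMALL LETTER FENG DIGRAPH WITH TRILL"),
    ("1E030", "MODIFIER LETTER CYRILLIC SMALL A"),
    ("1F100", "DIGIT ZERO FULL STOP"),
    ("1FBF0", "SEGMENTED DIGIT ZERO"),
]

# Python's re rejects a leading "(?x-i)", so the flags are passed as an
# argument instead: re.VERBOSE is (?x), and leaving out re.IGNORECASE keeps
# the search case-sensitive, like (?-i).
WORDS = r"""
(?<=\x20)                        # Preceded with a SPACE char
(?:                              # BEGINNING of the NON-CAPTURING group
    LETTER | CAPITAL | SMALL | DIGIT | FRACTION | LIGATURE
  | SUPERSCRIPT | SUBSCRIPT | CIRCLED | PARENTHESIZED
  | MATHEMATICAL | FULL[ ]STOP | ROMAN | EPACT
)                                # END of the NON-CAPTURING group
(?=\x20)                         # Followed with a SPACE char
"""

text = "\n".join(f"| {code} | {name} |" for code, name in ROWS)
matches = re.findall(WORDS, text, re.VERBOSE)
print(len(matches))  # 47
```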
- Again, do a stream selection of the multi-lines regex below:
(?x-i)      # Search SENSIBLE to CASE
(?<=\x20)   # Preceded with SPACE
(?:         # Start NON-CAPTURING group
    0[0-2][0-9A-F][0-9A-F]
  | 03[7-9A-F][0-9A-F]
  | 04[0-9A-F][0-9A-F]
  | 05[0-8][0-9A-F]
  | 10[A-F][0-9A-F]
  | 1C[8-B][0-9A-F]
  | 1D[0-9AB][0-9A-F]
  | 1[EF][0-9A-F][0-9A-F]
  | 20[7-C][0-9A-F]
  | 21[0-8][0-9A-F]
  | 24[6-9A-F][0-9A-F]
  | 25[A-F][0-9A-F]
  | 27[0-B][0-9A-F]
  | 2C[6-9A-F][0-9A-F]
  | 2D[012][0-9A-F]
  | A6[4-9][0-9A-F]
  | A7[2-9A-F][0-9A-F]
  | AB[3-6][0-9A-F]
  | FB[01][0-9A-F]
  | FF[0-5E][0-9A-F]
  | 102[EF][0-9A-F]
  | 105[0-2][0-9A-F]
  | 107[89AB][0-9A-F]
  | 1CC[DEF][0-9A-F]
  | 1D[4-7][0-9A-F][0-9A-F]
  | 1DF[0-9A-F][0-9A-F]
  | 1E0[3-8][0-9A-F]
  | 1F1[0-9A-F][0-9A-F]
  | 1FB[0-9A-F][0-9A-F]
)           # End NON-CAPTURING group
(?=\x20)    # Followed with SPACE
# END MULTI-lines Regex
Note, in the status bar, that 1,047 characters have been selected.
- Open the Mark dialog (Ctrl + M)
Note that the last part of the combo box does NOT show the text # END MULTI-lines Regex but shows the previous text # Followed with a SPACE char
- Select the two options Purge for each search and Wrap around, only
- Choose the Regular expression mode
- Uncheck the In selection option
- Click on the Mark All button
The previous search is RE-run and the same matches occur!
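For reference, the second regex itself is not at fault: when it is actually used, it marks exactly the 25 hexadecimal codes. A minimal Python sketch with `re`, again assuming the `| code | name |` row layout; the name column here is a placeholder (`SOME NAME`), since only the codes matter for this pattern, and Python's engine is a stand-in for the Boost engine Notepad++ uses.

```python
import re

# Code points from the first column of the test table.
CODES = ["0030", "0370", "0400", "0500", "10A0", "1C80", "1D00", "1E00",
         "2070", "2C60", "2D00", "A640", "A722", "AB30", "FB00", "FF10",
         "102E1", "10500", "10780", "1CCD6", "1D400", "1DF00", "1E030",
         "1F100", "1FBF0"]

# Same pattern as the post, minus the "(?x-i)" prefix (flags passed below).
HEX = r"""
(?<=\x20)                      # Preceded with SPACE
(?:                            # Start NON-CAPTURING group
    0[0-2][0-9A-F][0-9A-F] | 03[7-9A-F][0-9A-F] | 04[0-9A-F][0-9A-F]
  | 05[0-8][0-9A-F]        | 10[A-F][0-9A-F]    | 1C[8-B][0-9A-F]
  | 1D[0-9AB][0-9A-F]      | 1[EF][0-9A-F][0-9A-F] | 20[7-C][0-9A-F]
  | 21[0-8][0-9A-F]        | 24[6-9A-F][0-9A-F] | 25[A-F][0-9A-F]
  | 27[0-B][0-9A-F]        | 2C[6-9A-F][0-9A-F] | 2D[012][0-9A-F]
  | A6[4-9][0-9A-F]        | A7[2-9A-F][0-9A-F] | AB[3-6][0-9A-F]
  | FB[01][0-9A-F]         | FF[0-5E][0-9A-F]   | 102[EF][0-9A-F]
  | 105[0-2][0-9A-F]       | 107[89AB][0-9A-F]  | 1CC[DEF][0-9A-F]
  | 1D[4-7][0-9A-F][0-9A-F] | 1DF[0-9A-F][0-9A-F] | 1E0[3-8][0-9A-F]
  | 1F1[0-9A-F][0-9A-F]    | 1FB[0-9A-F][0-9A-F]
)                              # End NON-CAPTURING group
(?=\x20)                       # Followed with SPACE
# END MULTI-lines Regex
"""

# Placeholder name column: it only has to avoid hex-looking words.
text = "\n".join(f"| {code} | SOME NAME |" for code in CODES)
found = re.findall(HEX, text, re.VERBOSE)
print(len(found))  # 25
```

Backtracking matters here: for `1D400`, the shorter alternative `1D[0-9AB][0-9A-F]` first matches `1D40`, the `(?=\x20)` lookahead fails on the trailing `0`, and the engine moves on to `1D[4-7][0-9A-F][0-9A-F]`.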
Now, RE-select the second multi-lines regex, without including the last line # END MULTI-lines Regex
Note that, this time, the indication 1,022 characters is shown in the status bar.
- Open the Mark dialog (Ctrl + M)
Note that, this time, the last part of the combo box shows the text # Followed with SPACE and NOT the previous text # Followed with a SPACE char. So, this second regex seems to be correctly taken into account!
- Select the two options Purge for each search and Wrap around, only
- Choose the Regular expression mode
- Click on the Mark All button
=> This time, the marked text is, as expected, all the hexadecimal values beginning each line, and the message says "25 matches in entire file"! The process is OK because it’s under the limit of 1,024 bytes.
To my mind, it would be best to increase the 1,024 value that triggers the automatic check of the In selection option to 2,048, which roughly corresponds to the maximum number of characters that the Find dialog may contain!
Best Regards,
guy038
P.S.: I came across this bug when preparing my post about the new Locale Order feature!
-
@guy038, I decided to try a different test.
I had three lines of text:
aaa... repeated 1024 times.
bbb... repeated 1025 times.
ccc... repeated 32768 times.
- No text is selected.
- Position the caret on the first line and do Ctrl+F. The Find dialog will pop up with the Find field populated with aaa...aaa.
- Close the dialog box and try again on line 2. The Find dialog pops up again, but the Find field still has aaa...aaa and not the expected bbb...bbb.
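The 1024/1025 behaviour in these steps can be written down as a small hypothetical model in Python. It is not taken from the Notepad++ source: `find_field_after_ctrl_f` and `AUTO_FILL_LIMIT` are illustrative names, and whether the threshold counts characters or bytes is not established in this thread.

```python
# Hypothetical model of the Ctrl+F behaviour observed in this thread;
# it is not taken from the Notepad++ source code.
AUTO_FILL_LIMIT = 1024  # observed threshold; chars vs bytes not confirmed


def find_field_after_ctrl_f(candidate: str, current: str,
                            limit: int = AUTO_FILL_LIMIT) -> str:
    """Return the text shown in the Find field after pressing Ctrl+F.

    candidate: the selected text, or the word at the caret when nothing
               is selected (each test line is a single "word").
    current:   whatever the Find field held before.
    """
    if 0 < len(candidate) <= limit:
        return candidate
    return current  # too long (or empty): the field keeps its old content


field = "random"
field = find_field_after_ctrl_f("a" * 1024, field)  # line 1 loads
field = find_field_after_ctrl_f("b" * 1025, field)  # line 2 is ignored
print(field == "a" * 1024)  # True
```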
The behavior changes from 1024 to 1025 characters in the selection. It’s a sort of well-known issue, and for a multi-line (?x) free-form search it is painful, as you can’t get around the 1024 character limit other than by copy/pasting the text into the search or find field. While the magic number is 1024, this is apparently unrelated to the magic number 1024 found in the Settings / Preferences / Searching (tab) setting for Minimum Size for Auto-Checking "In-selection".
As you can get around the 1024 limit by copy/pasting into the Find field, I used the ccc...ccc line to test this. I loaded line 3 into the copy/paste buffer, then did a Ctrl+F to bring up the search dialog and pasted into the Find field. I discovered that the Find field is limited to 2046 characters.
- Do a search for ccc...ccc and you will discover that it matches and selects 2046 characters from line 3. It does not matter if you use Normal, Extended, or Regular Expression mode.
- If you re-activate the Find dialog, you will discover that the Find field has 2046 characters in it.
There is another upper limit which is that the Find field allows for up to 30,000 characters. You can’t paste more than 30,000 characters into the field. FWIW, Microsoft Notepad’s Find field is limited to 127 characters.
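A hypothetical Python model of the two limits just described: a 30,000 character cap on what the Find field accepts and a 2046 character cap on what a search actually uses. `FIELD_CAP`, `SEARCH_LIMIT` and the function names are made up for illustration, and only a literal (Normal mode) search is modelled.

```python
from typing import Optional, Tuple

# Hypothetical model of the limits reported in this thread (not Notepad++ code).
FIELD_CAP = 30_000    # most characters that could be pasted into the Find field
SEARCH_LIMIT = 2_046  # characters of the field actually used by a search


def paste_into_find_field(clipboard: str) -> str:
    """The edit control keeps only the first FIELD_CAP characters."""
    return clipboard[:FIELD_CAP]


def search(field_text: str, document: str) -> Tuple[str, Optional[Tuple[int, int]]]:
    """Literal search using only the first SEARCH_LIMIT characters.

    Returns the truncated search text (what ends up in the find history)
    and the (start, end) span of the first match, or None if nothing matches.
    """
    pattern = field_text[:SEARCH_LIMIT]
    start = document.find(pattern)
    return pattern, (None if start < 0 else (start, start + len(pattern)))


line3 = "c" * 32768
field = paste_into_find_field(line3)
history_entry, span = search(field, line3)
print(len(field), len(history_entry), span)  # 30000 2046 (0, 2046)
```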
If you put in a feature request, then I’m inclined to vote for an upper limit of 30,000 characters for a selection that auto-populates the Find field. Hopefully, both the normal and extended search modes allow for 30,000 character searches.
We know that regular expression mode has a much lower limit but I don’t see a clean way to impose a smaller limit for that mode while allowing for switching modes within the dialog box.
-
Hello, @mkupper and All,
Really sorry, but I’m rather confused !
@mkupper, you said:
"I discovered that the Find field is limited to 2046 characters."
And two lines below, you said:
"There is another upper limit which is that the Find field allows for up to 30,000 characters"
To my mind, the former number is correct, no ??!!
I also verified, on my old Win-XP laptop, with the last XP version of N++ (v7.9.2):
- That it is possible to get a multi-lines regex of up to 2,046 characters
- That the automatic check of the In selection option, although NOT configurable in the Preferences... dialog at that time, is effective for values of 1,025 or higher
So, at the time of the v7.9.2 release, the regex character limit and the automatic In selection limit seemed unrelated! Not sure that it’s still the case nowadays?
Best Regards,
guy038
-
-
Note, in the status bar, that 1,047 characters have been selected
Trying to follow along, I don’t see how the above happens.
-
Hi, @alan-kilborn, and All,
Oh, yes, sorry @alan-kilborn, it’s a typo: I generally use the ~~~ string to start and end a text block. But, this time, I forgot one tilde at the end of a block :-(( I edited my first post and corrected it!
So, just retry and copy the third text section of my initial post to the clipboard with the upper-right corner button. It should be OK!
BR
guy038
-
@mkupper said:
There is another upper limit which is that the Find field allows for up to 30,000 characters. You can’t paste more than 30,000 characters into the field.
If you’re speaking slightly sloppily, then this makes sense. I’d guess that the limit is actually 32767, the default Windows value for an edit control, see HERE.
But isn’t it true that in Notepad++, even though you can put more than 2046 characters in the Find what box (e.g. via pasting), it ignores anything over 2046 when executing the search?
Side Note: Also, what Notepad++ uses as a limit might be 2046 bytes, not characters in a strict sense. I haven’t looked at this lately, but if memory serves if you use multibyte characters in the Find what box data, the limit is going to be less than 2046.
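If the limit really is 2046 bytes (Alan's recollection, not confirmed in this thread) and the search text is measured in UTF-8 (also an assumption), the number of characters that fit shrinks as the characters get wider. A small Python sketch; `SEARCH_LIMIT_BYTES` and `chars_within_limit` are illustrative names.

```python
# Assumptions (unconfirmed): the 2046 limit counts bytes, and the search
# text is measured as UTF-8.
SEARCH_LIMIT_BYTES = 2046


def chars_within_limit(text: str, limit_bytes: int = SEARCH_LIMIT_BYTES) -> int:
    """How many leading characters of `text` fit in `limit_bytes` of UTF-8."""
    used = 0
    for count, ch in enumerate(text):
        used += len(ch.encode("utf-8"))
        if used > limit_bytes:
            return count
    return len(text)


print(chars_within_limit("a" * 3000))       # 2046 (ASCII: 1 byte each)
print(chars_within_limit("\u00e9" * 3000))  # 1023 (é: 2 bytes each)
print(chars_within_limit("\u20ac" * 3000))  # 682  (€: 3 bytes each)
```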
-
@Alan-Kilborn said in Bug when a multi-lines regex is used in the 'Search', 'Replace' or 'Mark' dialog:
If you’re speaking slightly sloppily, then this makes sense. I’d guess that the limit is actually 32767, the default Windows value for an edit control, see HERE .
I hope it was not that sloppy. Here are the repro steps for what I did yesterday, though I suspect including the repro details makes this a TL;DR style post.
-
I am running v8.8.1 though I don’t think the version matters much as all of this repro also worked in v8.7.9.
-
I used Excel and Notepad++ to construct some “rulers” that start with the length and have markers. The rulers are 1024, 1025, and 70000 characters long.
I also included a line with the word random, which is a word that I use to pre-load the Find what field at times.

random

1024____10________20________30________40________50________60________70________80________90_______100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990______1000______1010______10201024
1025____10________20________30________40________50________60________70________80________90_______100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990______1000______1010______1020_1025 (70000 character test string removed as the forum does not allow for more than 16384 character long posts)
At times I’ll say to preload the random word into the Find what field. By this I mean:
- Select or put the caret on the word random (or anything of your choosing).
- Ctrl-F to bring up the Find dialog box.
- See that the desired word is in the Find what field.
- Press Esc to close the Find dialog box.
- Move the caret to a blank area of the document (that’s why I have a blank line above and below the word random in the test data).
- You may do another Ctrl-F to bring up the Find dialog box, and the Find what field should have the random word in it.
** The 30000 character Find what field limit **
1. Preload the random word into the Find what field.
2. Load the 70000 character ruler (without the end of line) into the copy/paste buffer.
3. Move the caret to a blank area so that it’s not sitting on the ruler or some other word.
4. Ctrl+F to bring up the Find dialog box and then Ctrl+V to paste the ruler into the Find what field.
5. You should see that the Find what field starts with 70000___10... and ends with ..._____29980_____29990_____30000
6. Ctrl+A to select all of the Find what field contents, Ctrl+C to load that into the copy/paste buffer, Esc to close the Find dialog box, and Ctrl+V to paste the results into the Notepad++ document.
7. You should see a 30000 character long line that starts with 70000___10 and ends with _____30000.
** The 1024 character automatic select and load into Find what limit **
This is the item that started this forum thread.
1. Preload the random word into the Find what field.
2. Put the caret on the 1025 character ruler and then do Ctrl-F to see what’s in Find what. You will see it’s still the pre-loaded random word.
3. Try various things, such as selecting the ruler and then doing Ctrl+F. You will still get the pre-loaded random word.
4. Repeat steps 1, 2 and 3 using the 1024 character long ruler. You will discover that this ruler loads into the Find what field.
** The 2046 character limit for Notepad++ searches **
- First do steps 1 to 5 of the "The 30000 character Find what field limit" repro above.
- Do Enter (or click Find Next) to search for whatever is in the Find what field and then press Esc.
- You will see that the first 2046 characters of the 70000 character ruler are selected. I did a Ctrl+C and pasted that to its own line to verify that it’s a 2046 character long line.
- Do F3 and Notepad++’s search will continue to find/select the first 2046 characters of the 70000 character ruler. (You can make extra copies of this ruler if desired.)
** Bonus on the 2046 character limit for Notepad++ searches **
I wondered if I could trick Notepad++ into using more than 2046 characters and so tried this:
- Exit Notepad++, edit the config.xml file, and add 2050______2060 to the <Find name="70000___10________20 line so that it ends with 2040______2050______2060" />
- I started Notepad++, went to a blank area, and did Ctrl+F. I discovered that Find what is pre-loaded with a 2060 character long value that starts with 70000___10________20 and ends with 2040______2050______2060.
- Searches, though, are still limited to 2046 characters.
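To check how long the stored find-history entries actually are, a Python sketch using `xml.etree.ElementTree` can list the length of every `<Find name="..."/>` element. It assumes only that config.xml is well-formed XML containing such elements, as in the step above; the `sample` string is a trimmed, hypothetical stand-in for the real file, and `find_history_lengths` is an illustrative name.

```python
import xml.etree.ElementTree as ET


def find_history_lengths(config_xml: str) -> list:
    """Length of the name attribute of every <Find .../> element."""
    root = ET.fromstring(config_xml)
    return [len(el.get("name", "")) for el in root.iter("Find")]


# Hypothetical, heavily trimmed config.xml; the real file has many more elements.
long_entry = "x" * 2060
sample = ('<NotepadPlus><FindHistory>'
          '<Find name="' + long_entry + '" />'
          '<Find name="random" />'
          '</FindHistory></NotepadPlus>')
print(find_history_lengths(sample))  # [2060, 6]
```

For the real file, `ET.parse(path).getroot()` yields the same kind of root element, where `path` points at your config.xml.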
** Bonus on the F3 search **
I discovered that Notepad++ must have a separate internal buffer that it uses for the F3 search. If you start Notepad++ and then do F3, nothing happens, even though the top of the find history is something that should be in the document. Related to this, if you preload something into the find history, it’s not available for an F3 search. For example, put the caret on the word Notepad, do Ctrl-F and then Esc. Tapping F3 will not search for Notepad; instead, it searches for whatever you had last searched for.
I ran into this as I was hoping to preload the 2060 character long string by editing config.xml, starting Notepad++, and then doing an F3 to see how long the resulting selection was. Nothing happened, as there was nothing in the ‘text to search for’ buffer. Thus I could not use this method to fool Notepad++ into searching for more than 2046 characters.
Also, when you either pre-load or copy/paste something into the Find what field and then exit Notepad++ without ever searching for that value, it’s not saved to the config.xml file.
- Thus, while you can copy/paste a 30000 character long string so that it shows up at the top of the search history, this will not get saved to the config.xml file.
- If you do a search for that 30000 character long string, it seems that it’s first truncated to 2046 characters and then the search is done. I believe it truncates first, as the Find what field is truncated on the spot when you click the [Find Next] button.
- Thus the dialog box does not let you search for more than 2046 characters.
- The truncated value will also now be at the top of the search history, and when you exit Notepad++ the 2046 character long string gets written to the config.xml file.
I did not do any testing with Notepad++ macros or PythonScript to see if I could use more than 2046 characters in a search pattern.
But isn’t it true that in Notepad++, even though you can put more than 2046 characters in the Find what box (e.g. via pasting), it ignores anything over 2046 when executing the search?
That seems to be true and it also truncates the field to 2046 characters when adding it to the search history.
Side Note: Also, what Notepad++ uses as a limit might be 2046 bytes, not characters in a strict sense. I haven’t looked at this lately, but if memory serves if you use multibyte characters in the Find what box data, the limit is going to be less than 2046.
That’s correct, which is why I used plain ASCII for these tests. I’ve forgotten whether the limits in this area are related to UTF-8 encoding and/or to some characters needing more bits than the seven needed for plain ASCII.
-
-
Follow-up on the previous post, as the forum software did not allow for a 70000 character long ruler-style test string. I had used Excel:
10        =REPT("_",10-LEN(A1))&TEXT(A1,"0")
=A1+10    =REPT("_",10-LEN(A2))&TEXT(A2,"0")
=A2+10    =REPT("_",10-LEN(A3))&TEXT(A3,"0")
=A3+10    =REPT("_",10-LEN(A4))&TEXT(A4,"0")
...
Repeat that for 7000 rows. Row 7000 has:
=A6999+10 =REPT("_",10-LEN(A7000))&TEXT(A7000,"0")
with the result being:
10        ________10
20        ________20
30        ________30
...
70000     _____70000
I then copy/pasted column B into Notepad++, verified 7000 lines, and then did a search/replace to remove the \R to generate:
________10________20________30 ... _____70000
-
Hello, @mkupper, @alan-kilborn and All,
@mkupper, I repeated your whole process and, indeed, your method and explanations were very instructive!
I just do NOT understand one point yet. You said, in a previous post:
"Hopefully, both the normal and extended search allow for 30,000 character searches."
Well, repeating points 1 to 5 of the "The 30000 character Find what field limit" section, with the Find dialog pre-configured in Normal or Extended mode, it only matches the first 2,046 characters of the 70000___10 string, although the Find field does contain the first 30,000 chars of the 70000___10 string ?!
Best regards,
guy038