Replace All can only replace up to 2046 characters

John Liudr

Use case:
I have many lines of compilation commands that I wanted to analyze. The commands were generated by the Arduino IDE and are very long, over 26KB per line (yep). I wanted to remove the parts of the commands that list all the included libraries and leave only the options and the name of the file being compiled. I realized that even if I select all of the text I want to replace with an empty space, only the first 2046 characters get replaced.

Cause: The “Find what” text entry only accepts 2046 (or 2047) bytes.

Recommendation: Make the text entry longer, or add a “Long find” check box that replaces the one-line text entry with a text box that can accommodate longer text.

I couldn’t post even sample lines for anyone to test because this post limits text to 16383 characters. So find yourself a long text and copy paste it several times to try.

Thanks.

PeterJones

@John-Liudr ,

you really shouldn’t need to keep the entire command in the Find what, especially if you use regular expressions. The regular expression mode was designed to work for strings where a simple search/replace wasn’t sufficient, by using wildcards and pattern matching to keep them shorter and able to be more generic.

I am assuming that Arduino compilation commands are using -Ic:\long\path\to\library (or something similar) to include libraries, and that you’ve got a bunch of those… With Search Mode set to Regular Expression, a Find What of -I[^ ]+ will match one of those, and replacing with empty will delete it. Doing a Replace All with that setup will remove all such instances. There is no reason for >2046 characters in the Find What.
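To make the idea concrete outside the editor, here is a minimal Python sketch of the same pattern. The sample command line is made up for illustration, and Python's `re` module is not the regex engine Notepad++ uses, although this simple pattern should behave the same way in both.

```python
import re

# Hypothetical command line; real Arduino commands are far longer.
cmd = r"g++ -c -Os -Ic:\long\path\to\library -Ic:\another\lib sketch.cpp"

# -I followed by one or more non-space characters, as in the Find What above.
stripped = re.sub(r"-I[^ ]+", "", cmd)

print(stripped)
```

Note that the spaces around each removed option are left behind, just as they would be in the editor, so a follow-up pass to collapse double spaces may be wanted.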

mkupper

@John-Liudr I suspect many users of text editors such as Notepad++ regularly deal with long strings of data. I, and I suspect many of us, deal with this by breaking the problem down into smaller chunks. For example, if I’m faced with 20K byte long command lines I’m likely to first break them down into multiple lines using
Search: (?<= )(?=-[a-z0-9])
Replace: \r\n\t

That matches the empty position between a space and a dash followed by a letter or digit, and inserts CR LF TAB there. The result is the space, then CR LF TAB, and then the dash and letter/digit.

As the -parameters are now one per line I can do things such as deleting all of the -I parameters using
Search: (?-i)^\t-I.+\R
Replace: (nothing)

Multiple shorter lines is mentally easier for me to both see and to do search/replaces on.

When I’m done I reassemble the thing into one long line again using
Search: (?<= )\r\n\t(?=-[a-z0-9])
Replace: (nothing)
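The split, delete, and rejoin steps above can be sketched in Python as follows. This is only an approximation under a few assumptions: the sample command is invented, Python's `re` has no `\R` so `\r\n` is written out, and `re.IGNORECASE` stands in for Notepad++'s default of ignoring case when "Match case" is unchecked.

```python
import re

# Hypothetical short command standing in for a 20K+ byte line.
cmd = r"g++ -c -g -Os -IC:\lib\a -IC:\lib\b -o out.o main.cpp"

# Step 1: insert CR LF TAB at each empty position between a space
# and a dash followed by a letter or digit.
split = re.sub(r"(?<= )(?=-[a-z0-9])", "\r\n\t", cmd, flags=re.IGNORECASE)

# Step 2: delete every line that holds a -I parameter (case-sensitive,
# which is what (?-i) asks for in the editor).
no_includes = re.sub(r"^\t-I.+\r\n", "", split, flags=re.MULTILINE)

# Step 3: remove the CR LF TAB anchors to get one long line again.
rejoined = re.sub(r"(?<= )\r\n\t(?=-[a-z0-9])", "", no_includes,
                  flags=re.IGNORECASE)

print(rejoined)
```

One thing this sketch makes visible: a -I parameter that happens to be the very last item on the line has no line break after it, so step 2 would not remove it.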

If I find myself needing to do a long search and/or replace expression then I tend to make it shorter by first inserting what I call anchors into the data. I pick a character such as ~ for my anchor and first make sure that character does not exist in the file. I can then insert anchors into the data and then use those anchors when doing search/replaces. The CR/LF/TABs I added and then removed in the above example are also anchors, with the side benefit that the editor and its search/replace system have many line-oriented features that I can take advantage of.
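As a rough illustration of the anchor idea, here is a small Python sketch. The helper name `pick_anchor`, the BEGIN/END markers, and the sample data are all hypothetical; the point is only that once anchors are in place, the search expression itself stays short no matter how long the text between the anchors is.

```python
import re

def pick_anchor(text, candidates="~`|^"):
    """Return the first candidate character that does not occur in text."""
    for ch in candidates:
        if ch not in text:
            return ch
    raise ValueError("no unused anchor character among the candidates")

# Hypothetical data with two long regions to replace.
data = "alpha BEGIN long stuff END beta BEGIN other stuff END gamma"

anchor = pick_anchor(data)

# Step 1: insert an anchor before each BEGIN and after each END.
marked = data.replace("BEGIN", anchor + "BEGIN").replace("END", "END" + anchor)

# Step 2: a short pattern "anchor, anything but anchor, anchor"
# now covers each region, however long it is.
a = re.escape(anchor)
result = re.sub(a + "[^" + a + "]*" + a, "X", marked)

print(result)
```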

guy038

Hello, @john-liudr, @peterjones, @mkupper and All,

@john-liudr, regarding the problem that the text accepted in both the Find what: and Replace with zones CANNOT exceed 2,046 characters, here is a general method to search for a long piece of text and replace it with another long piece of text, too!

In order to practically test this method, I’ll use the License.txt file of my N++ v8.7.1 version

First, I copy this file five times into another file that I call Replace_Test.txt
Then, I select, in another file, a large chunk of text which should be inserted as the replacement, five times, in the Replace_Test.txt file, and I copy it to the clipboard with a Ctrl + C action

REMARK : do NOT select the last line-break ( IMPORTANT )

Now, regarding the long text to search for, which may occur one or several times in the current file, and which must be replaced the same number of times, I use the following regex S/R to define and delete the range of text :
SEARCH (?s)^\QSTART of text\E.+?(?=^\QEND of text\E)
REPLACE Leave EMPTY

Of course, in most cases, we can do without the \Q and \E regex syntaxes. So, for our practical case, I’ll simply use the regex S/R :

SEARCH (?s-i)^TERMS AND CONDITIONS.+?(?=^END OF TERMS AND CONDITIONS)

REPLACE Leave EMPTY

Check the Wrap around option and the Regular expression search mode

=> As expected, after the Replace All action, it returns the message Replace All: 5 occurrences were replaced in entire file

Note that the END boundary, i.e. the string END OF TERMS AND CONDITIONS, is still present after this replacement
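For readers who want to see why the END boundary survives, here is a small Python sketch of the same range-delete regex. The sample text is a made-up stand-in for License.txt, and Python's `re` needs the `m` flag spelled out so that `^` matches at each line start; otherwise the pattern is the one shown above.

```python
import re

# Hypothetical stand-in for the license file, repeated twice.
doc = ("Preamble\n"
       "TERMS AND CONDITIONS\n"
       "1. Definitions.\n"
       "2. Grant.\n"
       "END OF TERMS AND CONDITIONS\n"
       "Appendix\n") * 2

# Lazy match from the START line up to, but not including, the END line.
# The lookahead (?=...) checks for the END line without consuming it.
pattern = re.compile(
    r"(?sm)^TERMS AND CONDITIONS.+?(?=^END OF TERMS AND CONDITIONS)")

result, count = pattern.subn("", doc)

print(count)
print(result)
```

Because the END line is only looked at and never matched, it stays in the text and can serve as the marker for the next step.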

Now, open the Mark dialog ( Ctrl + M )
Select the three options Bookmark line, Purge for each search and Wrap around
MARK (?-i)^END OF TERMS AND CONDITIONS

=> After a click on the Mark All button, we get the message Mark: 5 matches in entire file

Then, run the Search > Bookmark > Paste to (Replace) Bookmarked Lines menu option

=> At each of the five bookmarked locations in Replace_Test.txt, the line END OF TERMS AND CONDITIONS has been replaced by the clipboard contents !
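The mark-then-paste step can be imitated in Python with a sketch like the one below. The function name `replace_marked_lines` and the sample strings are hypothetical, and the sketch assumes that each marked line keeps its own line break while only its content is swapped, which is consistent with the remark above about not copying the last line-break.

```python
import re

def replace_marked_lines(text, marker, replacement):
    """Replace the content of every line matching marker; keep its line break."""
    marker_re = re.compile(marker)
    out = []
    count = 0
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        eol = line[len(body):]
        if marker_re.search(body):
            out.append(replacement + eol)  # swap content, keep the EOL
            count += 1
        else:
            out.append(line)
    return "".join(out), count

# Hypothetical text left over after the range delete, plus a long replacement
# that has no size limit because it never goes through a dialog field.
text = "Preamble\nEND OF TERMS AND CONDITIONS\nAppendix\n" * 2
clipboard = "NEW LICENSE TEXT\nline two"

new_text, replaced = replace_marked_lines(
    text, r"^END OF TERMS AND CONDITIONS", clipboard)

print(replaced)
print(new_text)
```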

Finally, save the new contents of the Replace_Test.txt file

Voila !

Best Regards,

guy038

Alan Kilborn

@guy038

An interesting technique, but I think it suffers from a limitation that may make it less useful than it at first appears.

That limitation is maybe hard to put into words, but it comes down to this: the replace point must be a whole line, and can’t be an arbitrary place in the data. This is because of the use of the Paste to (Replace) Bookmarked Lines command; it always replaces a line, never text in the middle of a line.

Consider OP’s statement of “over 26KB per line (yep)”. It is conceivable that someone with that data would want to replace inside that range, and without introducing any line “breaks”, in which case I don’t think your idea will satisfy.

Regardless, I think @PeterJones’s response is probably the right one for the OP’s problem. @mkupper also offers some solid advice. And who knows, applying a technique offered by @mkupper might get the data into good form to actually use your solution.

Jim Harris

As @mkupper said, anchors and regular expressions are the magic spells that I use to untangle lines - or even files, (like blobs of downloaded bank data) - that are hopelessly mangled together.

Depending on how mangled the data is, it may take several tries - with a number of mistakes - before you get things the way you want them.

It can be a process, so don’t forget to make periodic backup saves in case you have to “undo” a search-and-replace.

In my case I use Notepad++ to untangle downloaded bank data for tax purposes and, (depending on the size of the file and how mangled it is), it can take days to complete.

Be patient. Print out a regular expression cheat-sheet if you need one. (I did) And don’t give up even if it seems like you’re going backwards.