• Sort numbers in ascending order with Regex

    6
    0 Votes
    6 Posts
    1k Views
    Bebee BebrtoB

    The idea, perhaps, is to have every Edit option apply its replacement to all files, not just one.

  • First line replacement

    6
    0 Votes
    6 Posts
    3k Views
    GS MusicG

    Thanks guys. Both solutions worked fine. Thank you for your help.

  • Ctrl + Tab doesn't switch documents

    3
    0 Votes
    3 Posts
    2k Views
    Albert BihlerA

    Thanks to your comment I found the culprit. It’s “Multi PuTTY Manager”. If I close it, Ctrl+Tab in Notepad++ works as desired. By the way, “Multi PuTTY Manager” also blocks Ctrl+Tab in Firefox.
    Thanks for helping!

  • Space Before and After a Equals (=) character?

    2
    0 Votes
    2 Posts
    638 Views
    PeterJonesP

    The user-defined-language settings will not automatically add spaces. However, you could search for \x20*=\x20* and replace with \x20=\x20 in regular expression mode. That will replace any number of spaces (including zero), followed by =, followed by any number of spaces (including zero), with exactly one space, one equals sign, and one space.
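
    For anyone who wants to check the effect of that search/replace outside Notepad++, here is a minimal sketch using Python's re module. The function name normalize_equals is made up for illustration, and the replacement is written as a literal " = " because, as far as I know, Python does not accept \x20 in a replacement string.

```python
import re

# Zero or more spaces, an equals sign, zero or more spaces (same search pattern as above).
SPACED_EQUALS = re.compile(r"\x20*=\x20*")


def normalize_equals(text: str) -> str:
    """Put exactly one space on each side of every '=' in text (hypothetical helper)."""
    return SPACED_EQUALS.sub(" = ", text)


print(normalize_equals("a=b\nc   =d\ne =   f"))
```

    Note that a doubled sign such as == is treated as two separate matches, so it would be pulled apart; that is also true of the Notepad++ version.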

  • auto completition

    2
    0 Votes
    2 Posts
    247 Views
    PeterJonesP

    Yes – well, depending on what you mean by “on demand”. If my explanation doesn’t fully solve your problem, you will have to describe it better.

    In your Notepad++ installation folder (often c:\program files\Notepad++) there will be an autoCompletion folder with a bunch of XML files, one per language¹. You can add new auto-completion terms to that file². If you have a UDL³ that you’d like to have auto-completion for, you can create a new XML file in that folder that will hold the auto-completion definitions.

    ¹ See docs: https://npp-user-manual.org/docs/auto-completion/#create-auto-completion-definition-files
    ² Notepad++ tends to save configuration files as it exits if you’ve made any changes in the GUI. Thus, when manually editing this or any config file using Notepad++, follow the sequence: close all instances of Notepad++; open one single instance; edit the file; save; exit Notepad++; re-open. At this point, the changed settings should be in effect.
    ³ User-defined language: https://npp-user-manual.org/docs/user-defined-language-system/

  • mark and remove

    8
    0 Votes
    8 Posts
    852 Views
    Gulab BorkarG

    @guy038

    thanks

  • Removing duplicated lines out of log file.

    9
    0 Votes
    9 Posts
    2k Views
    Sarah DuongS

    @PeterJones If it is attached with instructions send questions here. This will help you understand the question, and support it correctly. Perhaps I have not found this post yet.

  • convert .txt file to pdf - missing last char

    4
    0 Votes
    4 Posts
    426 Views
    PeterJonesP

    If sodapdf.com is doing the conversion from .txt to .pdf, why do you think it’s a Notepad++ problem?

    If you can show that sodapdf.com works with a .txt file with more than 80 characters on a line that was created in some other application, but does not work with a .txt file with more than 80 characters on a line that was created in Notepad++, then we might be able to give some ideas. But if the same thing happens no matter what creates the >80-character text line, then you might want to ask the folks at sodapdf.

  • Hide/Unhide Non-Bookmarked Lines

    7
    0 Votes
    7 Posts
    6k Views
    Mike SmithM

    @Alan-Kilborn

    It seems like you could run your bookmarking operation, then do a “Inverse Bookmark” command, and then run the script @Ekopalypse provided…to get what you want?

    Yes, that’s a good idea! I did just that, and it hid everything I didn’t need to edit. Thank you.

    24000 things to examine and edit is a huge manual task.

    I don’t think it’s something that can be automated though. The issue is that I have all the download links in one database (which I’m editing through Notepad++), and the files are stored on a separate server. The task I am currently working on is to randomize all filenames:

    https://www.example.com/74547854787fileone.zip
    https://www.example.com/56647979548filetwo.zip
    https://www.example.com/01324679462filethree.zip
    https://www.example.com/64647452105filefour.zip

    So not only do I need to complete the task of randomizing the filenames (A task I’m achieving using Bulk Rename Utility), I have to then ensure the download links are changed to represent the relevant filename.

  • Find&Replace pairs of capitalised words

    4
    1 Votes
    4 Posts
    368 Views
    Joaquin BuenoJ

    Thanks to both, @Ekopalypse and @guy038 :) :) :)

    Both codes worked like a charm.

    For anyone interested in these codes, @Ekopalypse’s code finds every space-separated pair of words starting with a capital letter, and @guy038’s code finds every space between words starting with a capital letter. Both were good for what I needed to do!

  • Colors from stylers.xml are wrong for JSON

    2
    1 Votes
    2 Posts
    298 Views
    raul1roR

    I also tried with other themes and same problem. Wrong colors.

  • marked and copy

    2
    0 Votes
    2 Posts
    160 Views
  • Regex - Positive Look Behind With *

    7
    1 Votes
    7 Posts
    5k Views
    Ray-HR

    Hello @guy038,

    Thank you for the detailed breakdown of the problem. There is a lot of useful knowledge here. I especially appreciate you pointing out the subtleties brought upon the look-arounds by their atomic structure.

    Best,
    Ray

  • Set file type to default to ".txt" file

    7
    0 Votes
    7 Posts
    913 Views
    EkopalypseE

    @Alan-Kilborn

    In the end, it depends on what one is doing.
    If most of the time I create a new file by cutting/posting/pasting, then
    this might be the solution; for other tasks it might not be.

  • one sentence per line

    5
    0 Votes
    5 Posts
    1k Views
    Dragoon 35D

    Thank you @guy038

  • Windows 10 Speech Recognition does not work well with Notepad++

    1
    0 Votes
    1 Posts
    286 Views
    No one has replied
  • shortening url

    8
    0 Votes
    8 Posts
    2k Views
    guy038G

    Hi, @wessel-bogers, @peterjones and All,

    @wessel-bogers, and All, when you ask for modifications of data, with regular expressions, please, please …

    Give us a fairly large amount of your  INPUT  text ( as you did, @wessel-bogers, in your previous post )

    Give us, also, the expected  OUTPUT  text, from your specific  INPUT  text. That’s the  KEY  point !

    Then, from the differences between these two texts, even if you cannot express simply yourself, regarding your goal, we should be able to guess your needs, most of the time ;-))

    Remember : If you clearly define the rules to process, in your  INPUT  text, half the job is already done ;-))

    See you later,

    Best Regards,

    guy038

  • UDL code folding

    5
    0 Votes
    5 Posts
    796 Views
    EkopalypseE

    This example doesn’t seem to be appropriate to show your issue with code folding.
    If I put ( and ) in Folding in code 1 style, then I get the same result that Lisp seems to give. The left one is UDL, the right one is LISP

    b8ebe726-8822-4082-9458-46da11fa680d-image.png

    The link provided doesn’t explain code workflow.
    At the moment I have to assume that a code block is, indeed,
    identified as the part between an open bracket and a closing bracket.

  • File Too Big Inconsistency

    2
    0 Votes
    2 Posts
    231 Views
    PeterJonesP

    Memory usage varies – both in Notepad++ and in the other applications and background tasks running on your machine. The closing of Notepad++ and reloading will free all the memory except that used by the active file, which gives you the biggest chance. For files in the hundreds of megabytes, it’s probably better to use the sequence of exiting all NPP instances, then looping on “start new NPP instance with no files open; open huge file; process; close file; exit NPP”

    You don’t say which version of Notepad++ you use, or whether you’re 32-bit or 64-bit (? > Debug Info would be able to easily tell us), but 64-bit should be able to handle bigger files than 32-bit (unfortunately, because of the way the Scintilla editor component is written, it cannot handle files as big as 64-bit addressing would imply).

    Other than that, you might try closing as many other applications as you can, and doing anything you can to limit memory usage. You might try running without plugins ("c:\program files\Notepad++\notepad++.exe" -noPlugin "superhugefile.txt") to limit Notepad++'s memory usage.

    If all your processing involves is running one or more regex inside the file, you might want to try getting a windows copy of sed or windows copy of gawk, or use a full programming language like Perl or Python – all of which may (or may not) be able to make the transformations in your text that you need without requiring the entire file to be held in memory – it depends on the processing you need to do. (Using the PythonScript plugin as a Python interpreter, you could write your script and run it inside Notepad++, even if you aren’t using the PythonScript-specific access to the currently-open-files: you’d basically be using the PythonScript.dll instance of Python instead of installing your own python.exe. Note: it wouldn’t be enough to use PythonScript to load the huge file into Notepad++, because that would still have the full memory requirement, which you are running up against; you’d have to just use standard Python text-processing, trying to do it line-at-a-time or chunk-at-a-time, rather than whole-file-in-memory.)

    (edit: in case I wasn’t clear, the sed/gawk/perl/python solution would be outside of Notepad++; you could use NPP to write your script for one of those tools, and even use its run menu and similar functionality to launch the process… but it wouldn’t be doing the processing to the open files inside Notepad++, and wouldn’t be a solution that we’re really equipped to help you with, since this isn’t a general-programming forum.)
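
    To make the “line-at-a-time” idea concrete, here is a rough sketch in standard Python (run outside Notepad++, or through the PythonScript interpreter without touching the editor buffers). The function name stream_substitute and its parameters are invented for this example; it assumes the regex never needs to match across a line break.

```python
import re


def stream_substitute(src_path, dst_path, pattern, replacement, encoding="utf-8"):
    """Apply a regex substitution line by line, writing the result to a new file.

    Only one line is held in memory at a time, so the size of the file does
    not matter, but the pattern cannot match across line boundaries.
    Returns the number of substitutions made.
    """
    regex = re.compile(pattern)
    changed = 0
    # newline="" keeps the original line endings (CRLF or LF) untouched.
    with open(src_path, "r", encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in src:
            new_line, count = regex.subn(replacement, line)
            changed += count
            dst.write(new_line)
    return changed
```

    A call such as stream_substitute("superhugefile.txt", "superhugefile.out.txt", r"foo", "bar") would then write a transformed copy next to the original; the output file name here is only a placeholder.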

  • Regex single dot character in group behaves differently than not in group

    3
    0 Votes
    3 Posts
    1k Views
    guy038G

    Hi, @matthews-dylan and All,

    I apologize for my very late reply, but I needed to do numerous verifications and tests ! I’m going to start with some general topics, and, then, I’ll come back to your specific problem to tell you why your second regex ^(.)*$ matches empty lines only and I’ll give you a solution in order to delete any line which does not contain any Emoji character. Take your time and have a drink : this post is quite long ;-))

    First, I would say that most of the monospaced fonts, used in code editors, can display the glyphs of traditional characters only ! So, you need to get a more robust font, which could display most of Unicode symbols properly ;-))

    So, refer to the last section of my other post, below :

    https://community.notepad-plus-plus.org/post/50673

    Now, after pasting the input line of your post, with my current N++ Courier New font, I get the line, below, where your character, not handled with that font, is simply replaced with a small white square box :

    Input line: □

    To get information on that character, refer, again, to the last section of this other post, which speaks about a very handy on-line UTF-8 tool :

    https://community.notepad-plus-plus.org/post/50983

    With the help of this tool, we deduce that your special char has the following characteristics :

    Character name                             SPLASHING SWEAT SYMBOL
    Hex code point                             1F4A6
    Decimal code point                         128166
    Hex UTF-8 bytes                            F0 9F 92 A6
    Octal UTF-8 bytes                          360 237 222 246
    UTF-8 bytes as Latin-1 characters bytes    ð <9F> <92> ¦
    Hex UTF-16 Surrogates                      D83D DCA6
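
    If the on-line tool is not at hand, most of these characteristics can be recomputed locally. The sketch below uses only Python's standard library; the helper name describe and the dictionary keys are made up for this illustration.

```python
import unicodedata


def describe(char: str) -> dict:
    """Return a few Unicode characteristics of a single character (hypothetical helper)."""
    utf8 = char.encode("utf-8")
    return {
        "name": unicodedata.name(char),
        "hex_code_point": f"{ord(char):04X}",
        "decimal_code_point": ord(char),
        "hex_utf8": " ".join(f"{b:02X}" for b in utf8),
        "octal_utf8": " ".join(f"{b:o}" for b in utf8),
    }


# U+1F4A6 is the character discussed in this thread.
for key, value in describe("\U0001F4A6").items():
    print(f"{key:20} {value}")
```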

    Refer to the link, below, to see all the characters of the Unicode Miscellaneous Symbols and Pictographs block :

    http://www.unicode.org/charts/PDF/U1F300.pdf

    Note that the Unicode code-point of this character is 1F4A6, which is beyond the first 65536 characters, i.e. outside the Basic Multilingual Plane ( BMP ). Therefore, this means that :

    It is correctly encoded in a UTF-8 encoded file. So, you must use the N++ UTF-8 or UTF-8 BOM encodings, which can handle all Unicode characters, from \x{0000} to \x{10FFFF}

    It cannot be inserted in an ANSI encoded file, which handles 256 characters, only, from \x{00} to \x{FF}

    It cannot be inserted in an N++ UCS-2 BE BOM or UCS-2 LE BOM encoded file, which can handle only the 65536 characters of the BMP, from \x{0000} to \x{FFFF}

    Moreover, as the code-point of your character is over \x{FFFF} :

    It cannot be represented with the regex syntax \x{1F4A6}, due to a bug of the present Boost regex engine, which does not handle all characters in true 32-bits encoding :-(( Also, searching for \x{1F4A6} results in the error message Find: Invalid regular expression

    The simple regex dot symbol . cannot match a character, with Unicode code-point > \x{FFFF}, too !

    Luckily, if you paste your character in the Find what: zone, it does find all occurrences of the SPLASHING SWEAT SYMBOL character !

    Now, the surrogates mechanism allows the UTF-16 encoding ( not used in Notepad++ ) to be able to code all characters with code-point over \x{FFFF}. Refer below :

    https://en.wikipedia.org/wiki/UTF-16#Description

    And I found out that if I write a regex, involving the surrogates pair ( 2 16-bit units ) of a character, which is over the BMP, the regex engine is able to match this character. For instance, as the surrogates pair of your character is : D83D DCA6, the regex \x{D83D}\x{DCA6} does find all occurrences of your SPLASHING SWEAT SYMBOL character !
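
    For those who would rather compute the surrogates pair than look it up, the standard UTF-16 arithmetic can be sketched in a few lines of Python. The names surrogate_pair and npp_search_regex are invented here; the second one simply formats the result in the \x{....} syntax used in this post.

```python
def surrogate_pair(code_point: int):
    """Return the (high, low) UTF-16 surrogates of a code point above U+FFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF have a surrogate pair")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


def npp_search_regex(char: str) -> str:
    """Format one character as \\x{....} or as a \\x{high}\\x{low} pair (hypothetical helper)."""
    cp = ord(char)
    if cp <= 0xFFFF:
        return f"\\x{{{cp:04X}}}"
    high, low = surrogate_pair(cp)
    return f"\\x{{{high:04X}}}\\x{{{low:04X}}}"


print(npp_search_regex("\U0001F4A6"))  # the SPLASHING SWEAT SYMBOL
```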

    I’ve done a lot of tests and, unfortunately, using a similar syntax, to get any char, with code over \x{FFFF}, most of the regexes do not work.

    Indeed, as the high 16-bits surrogate belongs to the [\x{D800}-\x{DBFF}] range and the low 16-bits surrogate belongs to the [\x{DC00}-\x{DFFF}] range :

    The regex [\x{D800}-\x{DBFF}][\x{DC00}-\x{DFFF}] does not find any match

    The regex [\x{D800}-\x{DBFF}]\x{DCA6} does not find any match, too

    Luckily, the regex \x{D83D}[\x{DC00}-\x{DFFF}] does match your special 💦 character :-))

    So, in summary, because of the wrong handling of characters, in the present implementation of the Boost Regex library, within Notepad++ :

    To match any standard character, from \x{0000} to \x{FFFF} ( NOT EOL chars and the Form Feed char \x0c ), use the simple regex .

    To match any standard character from \x{10000} to \x{10FFFF}, use the regex .[\x{DC00}-\x{DFFF}] OR the shorter syntax ..

    To match all standard characters, from \x{0000} to \x{10FFFF}, use the regex .[\x{DC00}-\x{DFFF}]? OR the shorter syntax ..?

    And :

    To match a specific character of the BMP, from \x{0000} to \x{FFFF} use the regex syntax \x{....}, with four hexadecimal digits

    To match a specific character over the BMP, from \x{10000} to \x{10FFFF}, use the high and low surrogates equivalent pair, with the regex syntax \x{<high>}\x{<low>}, replacing the <high> and <low> values with their exact hexadecimal values, using 4 hexadecimal digits

    First example :

    From the list of chars, below :

    •----------------------------------•------------•-------•-------------------------•-------------------•--------------------------•
    | Character NAME                   | Code-Point | Char  | In a UTF-8 encoded file | Hex-16 Surrogates | SEARCH Regex             |
    •----------------------------------•------------•-------•-------------------------•-------------------•--------------------------•
    | LATIN CAPITAL LETTER A           | 0041       | A     | 41                      | N/A               | \x{0041} or .            |
    | MATHEMATICAL BOLD CAPITAL A      | 1D400      | 𝐀     | F0 9D 90 80             | D835 + DC00       | \x{D835}\x{DC00} or ..   |
    | COMBINING GRAVE ACCENT BELOW     | 0316       | ̖      | CC 96                   | N/A               | \x{0316} or .            |
    | COMBINING LEFT ANGLE ABOVE       | 031A       | ̚      | CC 9A                   | N/A               | \x{031A} or .            |
    | MUSICAL SYMBOL COMBINING MARCATO | 1D17F      | 𝅿      | F0 9D 85 BF             | D834 + DD7F       | \x{D834}\x{DD7F} or ..   |
    •----------------------------------•------------•-------•-------------------------•-------------------•--------------------------•

    We may build up some COMPOSED characters, as below :

    •-----------------------•-------•-------------------------•----------------------------•--------------------------------------------•
    | Code-Points           | Chars | In a UTF-8 encoded file | Hex-16 Surrogates          | SEARCH Regex                               |
    •-----------------------•-------•-------------------------•----------------------------•--------------------------------------------•
    | 0041 + 031A           | A̚     | 41 CC 9A                | NO                         | \x{0041}\x{031A} or ..                     |
    | 0041 + 1D17F          | A𝅿     | 41 F0 9D 85 BF          | D834 + DD7F ( on 2nd char) | \x{0041}\x{D834}\x{DD7F} or ...            |
    | 1D400 + 031A          | 𝐀̚     | F0 9D 90 80 CC 9A       | D835 + DC00 ( on 1st char) | \x{D835}\x{DC00}\x{031A} or ...            |
    | 1D400 + 1D17F         | 𝐀𝅿     | F0 9D 90 80 F0 9D 85 BF | D835 + DC00 + D834 + DD7F  | \x{D835}\x{DC00}\x{D834}\x{DD7F} or ....   |
    | 0041 + 1D17F + 031A   | A𝅿̚     | 41 F0 9D 85 BF CC 9A    | D834 + DD7F ( on 2nd char) | \x{0041}\x{D834}\x{DD7F}\x{031A} or ....   |
    | 0041 + 031A + 1D17F   | A𝅿̚     | 41 CC 9A F0 9D 85 BF    | D834 + DD7F ( on 3rd char) | \x{0041}\x{031A}\x{D834}\x{DD7F} or ....   |
    | 1D400 + 031A + 0316   | 𝐀̖̚     | F0 9D 90 80 CC 9A CC 96 | D835 + DC00 ( on 1st char) | \x{D835}\x{DC00}\x{031A}\x{0316} or ....   |
    •-----------------------•-------•-------------------------•----------------------------•--------------------------------------------•
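
    The number of dots in the last column appears to be simply the number of UTF-16 code units of each sequence: one per BMP character, two per character over the BMP. A small Python sketch, with the invented helper name utf16_units, lets you check that count for any string:

```python
def utf16_units(text: str) -> int:
    """Count the 16-bit code units needed to encode text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


samples = {
    "0041 + 031A": "A\u031A",
    "0041 + 1D17F": "A\U0001D17F",
    "1D400 + 031A": "\U0001D400\u031A",
    "1D400 + 1D17F": "\U0001D400\U0001D17F",
    "1D400 + 031A + 0316": "\U0001D400\u031A\u0316",
}

for label, text in samples.items():
    # One dot per UTF-16 code unit, as in the SEARCH Regex column.
    print(f"{label:22} {'.' * utf16_units(text)}")
```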

    Second example: If we use any of the 3 following regex S/R :

    SEARCH (?-s)^.+(.[\x{DC00}-\x{DFFF}]).+

    or :

    SEARCH (?-s)^.+\x20(..)\x20.+

    or :

    SEARCH (?-s)^.+(\x{D83D}\x{DCA6}).+

    and :

    REPLACE A necklace of the SPLASHING SWEAT SYMBOL ––\1––\1––\1––\1––\1––\1––\1––\1––\1––

    against the text This is the 💦 character, at the beginning of a line, we get the resulting text :

    A necklace of the SPLASHING SWEAT SYMBOL ––💦––💦––💦––💦––💦––💦––💦––💦––💦––

    Now, let’s go back to your problem :

    Fundamentally, the problem arises because your special 💦 character can be matched with the regex .., only, regarding our present regex engine. It looks like, for these characters, the regex engine doesn’t see the character itself, but the two surrogate 16-bits code units !

    When you process the regex ^.*$ against your text : Input line: 💦, it does match the entire line, as the regex syntax .* means any number of chars ( . or .. or ..., and so on )

    Now, let’s consider the following regex syntaxes, with a capturing group 1, against this 4-lines text, pasted in a new tab :

    Line 1 : ( empty )
    Line 2 : 💦
    Line 3 : ( empty )
    Line 4 : Input line: 💦

    Note that the 1st and 3rd line are empty, the 2nd line contains your 💦 special char, only and the 4th line ends with that special char

    Regarding the following regex examples, below, you may test them, using the -->\1<-- Replace zone

    Before, a quick reminder :

    The INPUT text :

    167844894321 16784 4566499

    with the regex S/R :

    SEARCH (\d)+

    REPLACE -->\1<--

    would result in :

    -->1<-- -->4<-- -->9<--

    As you can see, group 1 always contains the last stored value of the group. So, the regex could also have been rewritten as \d*(\d)
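
    This “last stored value” behaviour of a repeated group is not specific to Notepad++. As a quick cross-check, Python's re module keeps only the last iteration of a quantified group too; the function name below is made up for the example.

```python
import re


def mark_last_digit(text: str) -> str:
    """Replace each run of digits with the last digit captured by the repeated group."""
    # (\d)+ repeats group 1 once per digit; only the final iteration is kept.
    return re.sub(r"(\d)+", r"-->\1<--", text)


print(mark_last_digit("167844894321 16784 4566499"))
# prints: -->1<-- -->4<-- -->9<--
```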

    The regex ^(.)$ cannot find anything, as no character, with code <= \x{FFFF}, exists between beginning and end of line

    The regex ^(..)$ does find, in line 2, your 💦 special character, with code > \x{FFFF}, between beginning and end of line

    Your regex ^(.)*$ simply matches the true empty lines 1 and 3. WHY ?
    Well, as the group contains only one dot ., it cannot match your last 💦 special character, in lines 2 and 4, which needs to be considered as a pseudo two-chars entity. So the overall regex fails, in these lines !

    The regex ^(..)*$ does match all the lines of the subject text, because, luckily, the part Input line:, followed with a space char, is exactly 12 chars long, so an even number ! And the last value of group 1 is your 2-chars 💦 special char, right before the end of the line

    Notes :

    The regex ^.*(..)$ would match all the non-empty lines 2 and 4, because group 1, .., represents your 💦 special char, ending these lines

    And the regex ^(?:..){6}(..)$ would match the line 4, only

    The regex ^.............(.)$ does not work properly, because group 1 does not contain the 💦 special character ( See after the replacement ! )

    On the contrary, the regex ^............(..)$ does find all contents of line 4, as the group 1, .., contains, exactly, the 💦 special character

    On the other hand :

    The regex ^(.)* selects as many standard characters, with code-point <= \x{FFFF}, as possible, so the following strings, but NOT your LAST 💦 special character :

    The null string before your 💦 special char, in line 2

    The string Input line:, followed with a space char, in line 4

    And, finally :

    The two regexes (.*)$ and (.*), with group 1 selecting all line contents, would match the four lines

    Now, your last goal : let’s suppose that you would like to delete any line, which does not contain any Unicode Emojis character :

    First, from that link :

    http://www.unicode.org/charts/PDF/U1F600.pdf

    We learn that the Unicode Emoticons block has code-points between \x{1F600} and \x{1F64F}

    With the on-line UTF-8 tool, we verify that the two Hex UTF-16 surrogates are :

    D83D DE00, for the \x{1F600} emoticon

    D83D DE4F, for the \x{1F64F} emoticon

    So, we should match all the characters of the Unicode Emoticons block, with the search regex :

    SEARCH \x{D83D}[\x{DE00}-\x{DE4F}]

    And, yes, it does work as expected. In that case, deleting any non-empty line which does not contain any Emoticon character(s) is easy with the following regex S/R :

    SEARCH (?-s)^(?!.*\x{D83D}[\x{DE00}-\x{DE4F}]).+\R

    REPLACE Leave EMPTY

    In contrast, the regex S/R :

    SEARCH (?-s)^(?=.*\x{D83D}[\x{DE00}-\x{DE4F}]).+\R

    REPLACE Leave EMPTY

    would delete any non-empty line containing one or more emoticon character(s) !
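
    For comparison, an engine that handles code-points over \x{FFFF} natively can use the plain character class directly. The sketch below is a Python approximation of the first S/R ( delete non-empty lines without any Emoticon ); the function name is invented, and, like the regex, it only removes lines that end with a line-break.

```python
import re

# The Unicode Emoticons block, written as an ordinary character class.
EMOTICON = re.compile("[\U0001F600-\U0001F64F]")


def drop_lines_without_emoticon(text: str) -> str:
    """Remove every non-empty, newline-terminated line that contains no Emoticon."""
    kept = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        has_eol = len(body) < len(line)
        if body and has_eol and not EMOTICON.search(body):
            continue  # non-empty line without any Emoticon: delete it
        kept.append(line)
    return "".join(kept)


print(drop_lines_without_emoticon("abc\n\U0001F600 hi\n\nxyz\n"))
```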

    Not asleep yet ? That’s good news :-))

    Best Regards,

    guy038

    P.S. :

    Let’s suppose that, instead of the small Unicode Emoticons block, containing 80 characters, we would like to search for any character belonging to the Unicode Miscellaneous Symbols and Pictographs block, which contains 768 characters and to which your special 💦 char belongs

    Right now, it’s getting really inextricable ! The Unicode range of that block is from \x{1F300} to \x{1F5FF}, but, because of the surrogates mechanism, it must be split in two parts :

    The range of chars between \x{1F300} and \x{1F3FF}, so with surrogates pairs D83C DF00 to D83C DFFF

    The range of chars between \x{1F400} and \x{1F5FF}, so with surrogates pairs D83D DC00 to D83D DDFF

    Therefore, the correct regex to match all the characters of this block is, indeed :

    \x{D83C}[\x{DF00}-\x{DFFF}]|\x{D83D}[\x{DC00}-\x{DDFF}]

    with an alternative between two regexes, in order to match each subset !
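
    The splitting by high surrogate can be automated. Here is a hedged Python sketch that builds such an alternation for any range of code-points over \x{FFFF}; the function names are hypothetical, and the output only follows the \x{high}[\x{low}-\x{low}] form tested in this post, so other ranges would still need to be verified in Notepad++.

```python
def _surrogates(code_point: int):
    """Return the (high, low) UTF-16 surrogates of a code point above U+FFFF."""
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def npp_range_regex(first: int, last: int) -> str:
    """Build one alternative per high surrogate covering first..last (hypothetical helper)."""
    if not 0x10000 <= first <= last <= 0x10FFFF:
        raise ValueError("both ends of the range must be above U+FFFF, with first <= last")
    first_high, first_low = _surrogates(first)
    last_high, last_low = _surrogates(last)
    parts = []
    for high in range(first_high, last_high + 1):
        # Inner high surrogates span their whole low-surrogate range.
        low_start = first_low if high == first_high else 0xDC00
        low_end = last_low if high == last_high else 0xDFFF
        parts.append(
            f"\\x{{{high:04X}}}[\\x{{{low_start:04X}}}-\\x{{{low_end:04X}}}]"
        )
    return "|".join(parts)


print(npp_range_regex(0x1F300, 0x1F5FF))
```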

    I confirm that this regex does find the 768 characters of the Unicode Miscellaneous Symbols and Pictographs block, with code-point over \x{FFFF} !

    It’s really a pity that the N++ regex engine does not correctly handle all the characters outside the BMP. If it did, we would simply have to use the classical [\x{1F300}-\x{1F5FF}] character class !!