TextFX 64-bit binary / source code unavailability?
-
Thanks for the info. I downloaded the 0.26 but didn’t realize that the older version had source code in it.
Anyway, I’ll give this a go.
Regards,
Jari Turkia
-
Is TextFX all that valuable anymore? Looking through its functions, I don’t see a lot there that can’t be done in other, more intuitive ways. And TextFX is rather clunky, starting with how it seems to feel it is too good to be installed with the other plugins under the plugin menu. I guess I’m asking, do people really use this plugin still, and if so what does it provide that you can’t get in other ways?
-
TextFX has several functions I haven’t been able to find elsewhere:
- Sentence case
- Rot13
- Delete (surplus) blank lines (both ways)
- Unwrap text (could be improved to handle hyphenation and detecting paragraph breaks)
- Strip HTML tags table (non)tabs (I never noticed a difference here between tabs and nontabs)
- Tools -> Sort lines case (in)sensitive at column
- Tools -> Base64 Decode
If you know of any (easy, quick, intuitive) way of doing these things I would be happy to know.
-
Hello, @mathnerd314, and All
I agree, Mathnerd314, that the old TextFX plugin may, still, be helpful, for specific actions.
However, among your list of some ( beloved ) features, some may be achieved with native N++ features and regexes, of course ;-))
Note : For all the described S/R, below, you must check the Regular expression search mode, in the Replace dialog
A) Delete (surplus) blank lines (both ways) : Very easy, with simple regexes :
- To delete all PURE blank lines : SEARCH ^\R , REPLACE Leave EMPTY
- To delete all blank lines, containing, only, Horizontal Space characters : SEARCH ^\h+\R , REPLACE Leave EMPTY
- To delete all Horizontal Space characters, at END of lines : SEARCH \h+$ , REPLACE Leave EMPTY
- To delete all surplus PURE blank lines : SEARCH (^\R)\1+ , REPLACE \1
- To delete all surplus blank lines, with possible Horizontal Space characters : SEARCH (^\h*\R)(?1)+ , REPLACE \1
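For anyone who would rather script these clean-ups than type them into the Replace dialog, here is a minimal Python sketch of the same five operations. It is an approximation, not a copy of the regexes above: Python's `re` module has no `\R` or `\h`, so the sketch assumes LF or CRLF line endings and treats only space and tab as horizontal space. All function names are made up for illustration.

```python
import re

# Assumptions: line endings are LF or CRLF; horizontal space is space or tab.
EOL = r"\r?\n"
HSPACE = r"[ \t]"

def delete_pure_blank_lines(text):
    # Remove lines that contain nothing at all.
    return re.sub(rf"(?m)^{EOL}", "", text)

def delete_whitespace_only_lines(text):
    # Remove lines that contain only spaces and/or tabs.
    return re.sub(rf"(?m)^{HSPACE}+{EOL}", "", text)

def strip_trailing_space(text):
    # Remove spaces and tabs at the end of every line.
    return re.sub(rf"(?m){HSPACE}+(?=\r?$)", "", text)

def squeeze_pure_blank_lines(text):
    # Collapse each run of empty lines into a single empty line.
    return re.sub(rf"(?m)(^{EOL})(?:^{EOL})+", r"\1", text)

def squeeze_blank_lines(text):
    # Same, but lines holding only spaces/tabs also count as blank;
    # the first blank line of each run is the one that is kept.
    return re.sub(rf"(?m)(^{HSPACE}*{EOL})(?:^{HSPACE}*{EOL})+", r"\1", text)
```

Each function takes the whole text as one string and returns the cleaned-up string, so they can be chained in whatever order suits the file.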
B) Sentence case :
After numerous tests, it appears that TextFX changes, to UPPER-case, any lower-case letter matched by the regexes below :
- SEARCH (?-i)\.[\t\n\x0b\f\r\x20\x85\x91]+\K\l , REPLACE \u$0 , for an ANSI encoded file, and
- SEARCH (?-i)\.[\t\n\x0b\f\r\x20\x{2026}\x{2018}]+\K\l , REPLACE \u$0 , for a UNICODE encoded file
Personally, I think that a better algorithm would be :
- SEARCH (?-i)(\.\s+|\v+)\K\l , REPLACE \u$0 , whatever the encoding
So, if my regex finds, either :
- A literal dot, followed by any non-empty range of horizontal / vertical space characters, preceding a lower-case letter
- Any non-empty range of vertical space characters, preceding a lower-case letter
it changes that lower-case letter to the corresponding UPPER-case letter.
Remark : For a correct replacement, due to the \K syntax, you MUST click on the Replace All button, exclusively ! ( Don’t use the Replace button ! )
Note that with the original text, with lower-case letters, only :
this is a test. beginning of test
this is a test. bla bla bla
this is a test. end of the text
Applying the option TextFX > TextFX Characters > Sentence case., it becomes :
This is a test. Beginning of test
this is a test. Bla bla bla
this is a test. End of the text
To my mind, with my regex, the text, below, with upper-case letters, at beginning of all lines, looks nicer !
This is a test. Beginning of test
This is a test. Bla bla bla
This is a test. End of the text
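The "dot or line break, then a lower-case letter" rule can also be sketched in Python. This is an approximation under stated assumptions: Python's `re` has neither `\K` nor `\l`, so the sketch captures the leading part in a group and puts it back, handles ASCII letters only, and adds `\A` so that the very first letter of the text is capitalised as well. The name `sentence_case` is hypothetical.

```python
import re

# Vertical whitespace spelled out, because \v in Python's re is only the vertical tab.
VSPACE = "\n\x0b\x0c\r\x85\u2028\u2029"

# Group 1: start of text, OR a dot plus whitespace, OR a run of line breaks.
# Group 2: the lower-case ASCII letter to capitalise (assumption: ASCII only).
PATTERN = re.compile(r"(\A|\.\s+|[" + VSPACE + r"]+)([a-z])")

def sentence_case(text):
    # Put group 1 back unchanged and upper-case the letter that follows it.
    return PATTERN.sub(lambda m: m.group(1) + m.group(2).upper(), text)

sample = (
    "this is a test. beginning of test\n"
    "this is a test. bla bla bla\n"
    "this is a test. end of the text"
)
print(sentence_case(sample))
```

Because the replacement is done in one call over the whole string, there is no equivalent of the Replace / Replace All distinction to worry about here.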
C) Unwrap text (could be improved to handle hyphenation and detecting paragraph breaks)
So, given the original text, below :
The licenses for most software are designed to take away your freedom to share and chan-
ge it. By contrast, the GNU General Public License is intended to guarantee your free-
dom to share and change free software.
With the option TextFX > TextFX Edit > Unwrap text, you get :
The licenses for most software are designed to take away your freedom to share and chan- ge it. By contrast, the GNU General Public License is intended to guarantee your free- dom to share and change free software.
But, with the regex S/R :
SEARCH
-\R|(\R+)
REPLACE
?1\x20
You would get, the expected text ( with handling of hyphens and line-breaks ! )
The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software.
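The same unwrap rule (hyphen plus line break disappears, any other run of line breaks becomes one space) can be sketched in Python as follows. The function name is hypothetical, and the line-break alternatives are spelled out because Python's `re` has no `\R`.

```python
import re

EOL = r"(?:\r\n|\n|\r)"
# Alternative 1: a hyphen right before a line break (no group is set).
# Alternative 2: any run of line breaks, captured as group 1.
PATTERN = re.compile("-" + EOL + "|(" + EOL + "+)")

def unwrap(text):
    # Group 1 set -> plain line break(s) -> one space; otherwise delete the hyphen + break.
    return PATTERN.sub(lambda m: " " if m.group(1) else "", text)

wrapped = "freedom to share and chan-\nge it. By contrast,\nthe GNU free-\r\ndom"
print(unwrap(wrapped))
```

Two caveats apply to this sketch just as they do to the regex: a genuine hyphen that happens to sit at the end of a line is removed too, and blank lines between paragraphs are collapsed into a single space, so paragraph breaks are not preserved.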
D) Tools -> Sort lines case (in)sensitive at column
OK, let’s start with the table, below, listing some properties of some Unicode characters, between \x0000 and \x007F. I shortened that list, getting rid of letters between N and Z, digits from 5 to 9 and some control characters, in order to get a complete post, of less than 16,284 bytes ( the limit ! )
| 0000 | <control> | Cc | BN | N |
| 0001 | <control> | Cc | BN | N |
| 0009 | <control> | Cc | S | N |
| 000A | <control> | Cc | B | N |
| 000B | <control> | Cc | S | N |
| 000C | <control> | Cc | WS | N |
| 000D | <control> | Cc | B | N |
| 001A | <control> | Cc | BN | N |
| 001B | <control> | Cc | BN | N |
| 001C | <control> | Cc | B | N |
| 001E | <control> | Cc | B | N |
| 001F | <control> | Cc | S | N |
| 0020 | SPACE | Zs | WS | N |
| 0021 | EXCLAMATION MARK | Po | ON | N |
| 0022 | QUOTATION MARK | Po | ON | N |
| 0023 | NUMBER SIGN | Po | ET | N |
| 0024 | DOLLAR SIGN | Sc | ET | N |
| 0025 | PERCENT SIGN | Po | ET | N |
| 0026 | AMPERSAND | Po | ON | N |
| 0027 | APOSTROPHE | Po | ON | N |
| 0028 | LEFT PARENTHESIS | Ps | ON | Y |
| 0029 | RIGHT PARENTHESIS | Pe | ON | Y |
| 002A | ASTERISK | Po | ON | N |
| 002B | PLUS SIGN | Sm | ES | N |
| 002C | COMMA | Po | CS | N |
| 002D | HYPHEN-MINUS | Pd | ES | N |
| 002E | FULL STOP | Po | CS | N |
| 002F | SOLIDUS | Po | CS | N |
| 0030 | DIGIT ZERO | Nd | EN | N |
| 0031 | DIGIT ONE | Nd | EN | N |
| 0032 | DIGIT TWO | Nd | EN | N |
| 0033 | DIGIT THREE | Nd | EN | N |
| 0034 | DIGIT FOUR | Nd | EN | N |
| 003A | COLON | Po | CS | N |
| 003B | SEMICOLON | Po | ON | N |
| 003C | LESS-THAN SIGN | Sm | ON | Y |
| 003D | EQUALS SIGN | Sm | ON | N |
| 003E | GREATER-THAN SIGN | Sm | ON | Y |
| 003F | QUESTION MARK | Po | ON | N |
| 0040 | COMMERCIAL AT | Po | ON | N |
| 0041 | LATIN CAPITAL LETTER A | Lu | L | N |
| 0042 | LATIN CAPITAL LETTER B | Lu | L | N |
| 0043 | LATIN CAPITAL LETTER C | Lu | L | N |
| 0044 | LATIN CAPITAL LETTER D | Lu | L | N |
| 0045 | LATIN CAPITAL LETTER E | Lu | L | N |
| 0046 | LATIN CAPITAL LETTER F | Lu | L | N |
| 0047 | LATIN CAPITAL LETTER G | Lu | L | N |
| 0048 | LATIN CAPITAL LETTER H | Lu | L | N |
| 0049 | LATIN CAPITAL LETTER I | Lu | L | N |
| 004A | LATIN CAPITAL LETTER J | Lu | L | N |
| 004B | LATIN CAPITAL LETTER K | Lu | L | N |
| 004C | LATIN CAPITAL LETTER L | Lu | L | N |
| 004D | LATIN CAPITAL LETTER M | Lu | L | N |
| 005B | LEFT SQUARE BRACKET | Ps | ON | Y |
| 005C | REVERSE SOLIDUS | Po | ON | N |
| 005D | RIGHT SQUARE BRACKET | Pe | ON | Y |
| 005E | CIRCUMFLEX ACCENT | Sk | ON | N |
| 005F | LOW LINE | Pc | ON | N |
| 0060 | GRAVE ACCENT | Sk | ON | N |
| 0061 | LATIN SMALL LETTER A | Ll | L | N |
| 0062 | LATIN SMALL LETTER B | Ll | L | N |
| 0063 | LATIN SMALL LETTER C | Ll | L | N |
| 0064 | LATIN SMALL LETTER D | Ll | L | N |
| 0065 | LATIN SMALL LETTER E | Ll | L | N |
| 0066 | LATIN SMALL LETTER F | Ll | L | N |
| 0067 | LATIN SMALL LETTER G | Ll | L | N |
| 0068 | LATIN SMALL LETTER H | Ll | L | N |
| 0069 | LATIN SMALL LETTER I | Ll | L | N |
| 006A | LATIN SMALL LETTER J | Ll | L | N |
| 006B | LATIN SMALL LETTER K | Ll | L | N |
| 006C | LATIN SMALL LETTER L | Ll | L | N |
| 006D | LATIN SMALL LETTER M | Ll | L | N |
| 007B | LEFT CURLY BRACKET | Ps | ON | Y |
| 007C | VERTICAL LINE | Sm | ON | N |
| 007D | RIGHT CURLY BRACKET | Pe | ON | Y |
| 007E | TILDE | Sm | ON | N |
| 007F | <control> | Cc | BN | N |
Using the TextFX option TextFX > TextFX Tools > Sort lines case sensitive ( at column 53 ), we get :
| 001E | <control> | Cc | B | N |
| 000D | <control> | Cc | B | N |
| 001C | <control> | Cc | B | N |
| 000A | <control> | Cc | B | N |
| 0001 | <control> | Cc | BN | N |
| 0000 | <control> | Cc | BN | N |
| 007F | <control> | Cc | BN | N |
| 001A | <control> | Cc | BN | N |
| 001B | <control> | Cc | BN | N |
| 000B | <control> | Cc | S | N |
| 0009 | <control> | Cc | S | N |
| 001F | <control> | Cc | S | N |
| 000C | <control> | Cc | WS | N |
| 006B | LATIN SMALL LETTER K | Ll | L | N |
| 006C | LATIN SMALL LETTER L | Ll | L | N |
| 006A | LATIN SMALL LETTER J | Ll | L | N |
| 0068 | LATIN SMALL LETTER H | Ll | L | N |
| 0067 | LATIN SMALL LETTER G | Ll | L | N |
| 0066 | LATIN SMALL LETTER F | Ll | L | N |
| 0065 | LATIN SMALL LETTER E | Ll | L | N |
| 0064 | LATIN SMALL LETTER D | Ll | L | N |
| 0063 | LATIN SMALL LETTER C | Ll | L | N |
| 0062 | LATIN SMALL LETTER B | Ll | L | N |
| 0061 | LATIN SMALL LETTER A | Ll | L | N |
| 006D | LATIN SMALL LETTER M | Ll | L | N |
| 0069 | LATIN SMALL LETTER I | Ll | L | N |
| 0042 | LATIN CAPITAL LETTER B | Lu | L | N |
| 004A | LATIN CAPITAL LETTER J | Lu | L | N |
| 0048 | LATIN CAPITAL LETTER H | Lu | L | N |
| 0047 | LATIN CAPITAL LETTER G | Lu | L | N |
| 0043 | LATIN CAPITAL LETTER C | Lu | L | N |
| 0041 | LATIN CAPITAL LETTER A | Lu | L | N |
| 004C | LATIN CAPITAL LETTER L | Lu | L | N |
| 0049 | LATIN CAPITAL LETTER I | Lu | L | N |
| 0046 | LATIN CAPITAL LETTER F | Lu | L | N |
| 0045 | LATIN CAPITAL LETTER E | Lu | L | N |
| 0044 | LATIN CAPITAL LETTER D | Lu | L | N |
| 004B | LATIN CAPITAL LETTER K | Lu | L | N |
| 004D | LATIN CAPITAL LETTER M | Lu | L | N |
| 0034 | DIGIT FOUR | Nd | EN | N |
| 0033 | DIGIT THREE | Nd | EN | N |
| 0032 | DIGIT TWO | Nd | EN | N |
| 0031 | DIGIT ONE | Nd | EN | N |
| 0030 | DIGIT ZERO | Nd | EN | N |
| 005F | LOW LINE | Pc | ON | N |
| 002D | HYPHEN-MINUS | Pd | ES | N |
| 0029 | RIGHT PARENTHESIS | Pe | ON | Y |
| 005D | RIGHT SQUARE BRACKET | Pe | ON | Y |
| 007D | RIGHT CURLY BRACKET | Pe | ON | Y |
| 003A | COLON | Po | CS | N |
| 002E | FULL STOP | Po | CS | N |
| 002F | SOLIDUS | Po | CS | N |
| 002C | COMMA | Po | CS | N |
| 0025 | PERCENT SIGN | Po | ET | N |
| 0023 | NUMBER SIGN | Po | ET | N |
| 005C | REVERSE SOLIDUS | Po | ON | N |
| 0040 | COMMERCIAL AT | Po | ON | N |
| 003B | SEMICOLON | Po | ON | N |
| 0021 | EXCLAMATION MARK | Po | ON | N |
| 003F | QUESTION MARK | Po | ON | N |
| 0022 | QUOTATION MARK | Po | ON | N |
| 002A | ASTERISK | Po | ON | N |
| 0026 | AMPERSAND | Po | ON | N |
| 0027 | APOSTROPHE | Po | ON | N |
| 0028 | LEFT PARENTHESIS | Ps | ON | Y |
| 007B | LEFT CURLY BRACKET | Ps | ON | Y |
| 005B | LEFT SQUARE BRACKET | Ps | ON | Y |
| 0024 | DOLLAR SIGN | Sc | ET | N |
| 005E | CIRCUMFLEX ACCENT | Sk | ON | N |
| 0060 | GRAVE ACCENT | Sk | ON | N |
| 002B | PLUS SIGN | Sm | ES | N |
| 003D | EQUALS SIGN | Sm | ON | N |
| 007E | TILDE | Sm | ON | N |
| 007C | VERTICAL LINE | Sm | ON | N |
| 003C | LESS-THAN SIGN | Sm | ON | Y |
| 003E | GREATER-THAN SIGN | Sm | ON | Y |
| 0020 | SPACE | Zs | WS | N |
Remark : You, certainly, noticed that, despite the correct sort of the text after column 53, the original order of the list is NOT preserved, unfortunately :-((
Continuation, in next post !
guy038
-
-
Hello, @mathnerd314, and All
Now, let’s imagine that we copy the original text, after column 53, at the very beginning of each line, with the regex S/R, below :
SEARCH
(?-s)^.{52}(.+)(?=\|)
REPLACE
\1$0
We obtain the changed text :
| Cc | BN | N | 0000 | <control> | Cc | BN | N |
| Cc | BN | N | 0001 | <control> | Cc | BN | N |
| Cc | S | N | 0009 | <control> | Cc | S | N |
| Cc | B | N | 000A | <control> | Cc | B | N |
| Cc | S | N | 000B | <control> | Cc | S | N |
| Cc | WS | N | 000C | <control> | Cc | WS | N |
| Cc | B | N | 000D | <control> | Cc | B | N |
| Cc | BN | N | 001A | <control> | Cc | BN | N |
| Cc | BN | N | 001B | <control> | Cc | BN | N |
| Cc | B | N | 001C | <control> | Cc | B | N |
| Cc | B | N | 001E | <control> | Cc | B | N |
| Cc | S | N | 001F | <control> | Cc | S | N |
| Zs | WS | N | 0020 | SPACE | Zs | WS | N |
| Po | ON | N | 0021 | EXCLAMATION MARK | Po | ON | N |
| Po | ON | N | 0022 | QUOTATION MARK | Po | ON | N |
| Po | ET | N | 0023 | NUMBER SIGN | Po | ET | N |
| Sc | ET | N | 0024 | DOLLAR SIGN | Sc | ET | N |
| Po | ET | N | 0025 | PERCENT SIGN | Po | ET | N |
| Po | ON | N | 0026 | AMPERSAND | Po | ON | N |
| Po | ON | N | 0027 | APOSTROPHE | Po | ON | N |
| Ps | ON | Y | 0028 | LEFT PARENTHESIS | Ps | ON | Y |
| Pe | ON | Y | 0029 | RIGHT PARENTHESIS | Pe | ON | Y |
| Po | ON | N | 002A | ASTERISK | Po | ON | N |
| Sm | ES | N | 002B | PLUS SIGN | Sm | ES | N |
| Po | CS | N | 002C | COMMA | Po | CS | N |
| Pd | ES | N | 002D | HYPHEN-MINUS | Pd | ES | N |
| Po | CS | N | 002E | FULL STOP | Po | CS | N |
| Po | CS | N | 002F | SOLIDUS | Po | CS | N |
| Nd | EN | N | 0030 | DIGIT ZERO | Nd | EN | N |
| Nd | EN | N | 0031 | DIGIT ONE | Nd | EN | N |
| Nd | EN | N | 0032 | DIGIT TWO | Nd | EN | N |
| Nd | EN | N | 0033 | DIGIT THREE | Nd | EN | N |
| Nd | EN | N | 0034 | DIGIT FOUR | Nd | EN | N |
| Po | CS | N | 003A | COLON | Po | CS | N |
| Po | ON | N | 003B | SEMICOLON | Po | ON | N |
| Sm | ON | Y | 003C | LESS-THAN SIGN | Sm | ON | Y |
| Sm | ON | N | 003D | EQUALS SIGN | Sm | ON | N |
| Sm | ON | Y | 003E | GREATER-THAN SIGN | Sm | ON | Y |
| Po | ON | N | 003F | QUESTION MARK | Po | ON | N |
| Po | ON | N | 0040 | COMMERCIAL AT | Po | ON | N |
| Lu | L | N | 0041 | LATIN CAPITAL LETTER A | Lu | L | N |
| Lu | L | N | 0042 | LATIN CAPITAL LETTER B | Lu | L | N |
| Lu | L | N | 0043 | LATIN CAPITAL LETTER C | Lu | L | N |
| Lu | L | N | 0044 | LATIN CAPITAL LETTER D | Lu | L | N |
| Lu | L | N | 0045 | LATIN CAPITAL LETTER E | Lu | L | N |
| Lu | L | N | 0046 | LATIN CAPITAL LETTER F | Lu | L | N |
| Lu | L | N | 0047 | LATIN CAPITAL LETTER G | Lu | L | N |
| Lu | L | N | 0048 | LATIN CAPITAL LETTER H | Lu | L | N |
| Lu | L | N | 0049 | LATIN CAPITAL LETTER I | Lu | L | N |
| Lu | L | N | 004A | LATIN CAPITAL LETTER J | Lu | L | N |
| Lu | L | N | 004B | LATIN CAPITAL LETTER K | Lu | L | N |
| Lu | L | N | 004C | LATIN CAPITAL LETTER L | Lu | L | N |
| Lu | L | N | 004D | LATIN CAPITAL LETTER M | Lu | L | N |
| Ps | ON | Y | 005B | LEFT SQUARE BRACKET | Ps | ON | Y |
| Po | ON | N | 005C | REVERSE SOLIDUS | Po | ON | N |
| Pe | ON | Y | 005D | RIGHT SQUARE BRACKET | Pe | ON | Y |
| Sk | ON | N | 005E | CIRCUMFLEX ACCENT | Sk | ON | N |
| Pc | ON | N | 005F | LOW LINE | Pc | ON | N |
| Sk | ON | N | 0060 | GRAVE ACCENT | Sk | ON | N |
| Ll | L | N | 0061 | LATIN SMALL LETTER A | Ll | L | N |
| Ll | L | N | 0062 | LATIN SMALL LETTER B | Ll | L | N |
| Ll | L | N | 0063 | LATIN SMALL LETTER C | Ll | L | N |
| Ll | L | N | 0064 | LATIN SMALL LETTER D | Ll | L | N |
| Ll | L | N | 0065 | LATIN SMALL LETTER E | Ll | L | N |
| Ll | L | N | 0066 | LATIN SMALL LETTER F | Ll | L | N |
| Ll | L | N | 0067 | LATIN SMALL LETTER G | Ll | L | N |
| Ll | L | N | 0068 | LATIN SMALL LETTER H | Ll | L | N |
| Ll | L | N | 0069 | LATIN SMALL LETTER I | Ll | L | N |
| Ll | L | N | 006A | LATIN SMALL LETTER J | Ll | L | N |
| Ll | L | N | 006B | LATIN SMALL LETTER K | Ll | L | N |
| Ll | L | N | 006C | LATIN SMALL LETTER L | Ll | L | N |
| Ll | L | N | 006D | LATIN SMALL LETTER M | Ll | L | N |
| Ps | ON | Y | 007B | LEFT CURLY BRACKET | Ps | ON | Y |
| Sm | ON | N | 007C | VERTICAL LINE | Sm | ON | N |
| Pe | ON | Y | 007D | RIGHT CURLY BRACKET | Pe | ON | Y |
| Sm | ON | N | 007E | TILDE | Sm | ON | N |
| Cc | BN | N | 007F | <control> | Cc | BN | N |
Then, selecting the above table and using the N++ option Edit > Line Operations > Sort Lines Lexicographically Ascending, this table becomes :
| Cc | B | N | 000A | <control> | Cc | B | N |
| Cc | B | N | 000D | <control> | Cc | B | N |
| Cc | B | N | 001C | <control> | Cc | B | N |
| Cc | B | N | 001E | <control> | Cc | B | N |
| Cc | BN | N | 0000 | <control> | Cc | BN | N |
| Cc | BN | N | 0001 | <control> | Cc | BN | N |
| Cc | BN | N | 001A | <control> | Cc | BN | N |
| Cc | BN | N | 001B | <control> | Cc | BN | N |
| Cc | BN | N | 007F | <control> | Cc | BN | N |
| Cc | S | N | 0009 | <control> | Cc | S | N |
| Cc | S | N | 000B | <control> | Cc | S | N |
| Cc | S | N | 001F | <control> | Cc | S | N |
| Cc | WS | N | 000C | <control> | Cc | WS | N |
| Ll | L | N | 0061 | LATIN SMALL LETTER A | Ll | L | N |
| Ll | L | N | 0062 | LATIN SMALL LETTER B | Ll | L | N |
| Ll | L | N | 0063 | LATIN SMALL LETTER C | Ll | L | N |
| Ll | L | N | 0064 | LATIN SMALL LETTER D | Ll | L | N |
| Ll | L | N | 0065 | LATIN SMALL LETTER E | Ll | L | N |
| Ll | L | N | 0066 | LATIN SMALL LETTER F | Ll | L | N |
| Ll | L | N | 0067 | LATIN SMALL LETTER G | Ll | L | N |
| Ll | L | N | 0068 | LATIN SMALL LETTER H | Ll | L | N |
| Ll | L | N | 0069 | LATIN SMALL LETTER I | Ll | L | N |
| Ll | L | N | 006A | LATIN SMALL LETTER J | Ll | L | N |
| Ll | L | N | 006B | LATIN SMALL LETTER K | Ll | L | N |
| Ll | L | N | 006C | LATIN SMALL LETTER L | Ll | L | N |
| Ll | L | N | 006D | LATIN SMALL LETTER M | Ll | L | N |
| Lu | L | N | 0041 | LATIN CAPITAL LETTER A | Lu | L | N |
| Lu | L | N | 0042 | LATIN CAPITAL LETTER B | Lu | L | N |
| Lu | L | N | 0043 | LATIN CAPITAL LETTER C | Lu | L | N |
| Lu | L | N | 0044 | LATIN CAPITAL LETTER D | Lu | L | N |
| Lu | L | N | 0045 | LATIN CAPITAL LETTER E | Lu | L | N |
| Lu | L | N | 0046 | LATIN CAPITAL LETTER F | Lu | L | N |
| Lu | L | N | 0047 | LATIN CAPITAL LETTER G | Lu | L | N |
| Lu | L | N | 0048 | LATIN CAPITAL LETTER H | Lu | L | N |
| Lu | L | N | 0049 | LATIN CAPITAL LETTER I | Lu | L | N |
| Lu | L | N | 004A | LATIN CAPITAL LETTER J | Lu | L | N |
| Lu | L | N | 004B | LATIN CAPITAL LETTER K | Lu | L | N |
| Lu | L | N | 004C | LATIN CAPITAL LETTER L | Lu | L | N |
| Lu | L | N | 004D | LATIN CAPITAL LETTER M | Lu | L | N |
| Nd | EN | N | 0030 | DIGIT ZERO | Nd | EN | N |
| Nd | EN | N | 0031 | DIGIT ONE | Nd | EN | N |
| Nd | EN | N | 0032 | DIGIT TWO | Nd | EN | N |
| Nd | EN | N | 0033 | DIGIT THREE | Nd | EN | N |
| Nd | EN | N | 0034 | DIGIT FOUR | Nd | EN | N |
| Pc | ON | N | 005F | LOW LINE | Pc | ON | N |
| Pd | ES | N | 002D | HYPHEN-MINUS | Pd | ES | N |
| Pe | ON | Y | 0029 | RIGHT PARENTHESIS | Pe | ON | Y |
| Pe | ON | Y | 005D | RIGHT SQUARE BRACKET | Pe | ON | Y |
| Pe | ON | Y | 007D | RIGHT CURLY BRACKET | Pe | ON | Y |
| Po | CS | N | 002C | COMMA | Po | CS | N |
| Po | CS | N | 002E | FULL STOP | Po | CS | N |
| Po | CS | N | 002F | SOLIDUS | Po | CS | N |
| Po | CS | N | 003A | COLON | Po | CS | N |
| Po | ET | N | 0023 | NUMBER SIGN | Po | ET | N |
| Po | ET | N | 0025 | PERCENT SIGN | Po | ET | N |
| Po | ON | N | 0021 | EXCLAMATION MARK | Po | ON | N |
| Po | ON | N | 0022 | QUOTATION MARK | Po | ON | N |
| Po | ON | N | 0026 | AMPERSAND | Po | ON | N |
| Po | ON | N | 0027 | APOSTROPHE | Po | ON | N |
| Po | ON | N | 002A | ASTERISK | Po | ON | N |
| Po | ON | N | 003B | SEMICOLON | Po | ON | N |
| Po | ON | N | 003F | QUESTION MARK | Po | ON | N |
| Po | ON | N | 0040 | COMMERCIAL AT | Po | ON | N |
| Po | ON | N | 005C | REVERSE SOLIDUS | Po | ON | N |
| Ps | ON | Y | 0028 | LEFT PARENTHESIS | Ps | ON | Y |
| Ps | ON | Y | 005B | LEFT SQUARE BRACKET | Ps | ON | Y |
| Ps | ON | Y | 007B | LEFT CURLY BRACKET | Ps | ON | Y |
| Sc | ET | N | 0024 | DOLLAR SIGN | Sc | ET | N |
| Sk | ON | N | 005E | CIRCUMFLEX ACCENT | Sk | ON | N |
| Sk | ON | N | 0060 | GRAVE ACCENT | Sk | ON | N |
| Sm | ES | N | 002B | PLUS SIGN | Sm | ES | N |
| Sm | ON | N | 003D | EQUALS SIGN | Sm | ON | N |
| Sm | ON | N | 007C | VERTICAL LINE | Sm | ON | N |
| Sm | ON | N | 007E | TILDE | Sm | ON | N |
| Sm | ON | Y | 003C | LESS-THAN SIGN | Sm | ON | Y |
| Sm | ON | Y | 003E | GREATER-THAN SIGN | Sm | ON | Y |
| Zs | WS | N | 0020 | SPACE | Zs | WS | N |
Note : This time, when the first three sorted columns are identical, there’s an implicit sort on the fourth column ( the code-point, which is unique ). So, this Unicode list keeps its original order :-))
Continuation, in next post !
guy038
-
Hello, @mathnerd314, and All
To end with, we just have to delete these three temporary columns, at beginning of each line. This can be, easily, done with the regex S/R, below :
SEARCH
(?-s)^.{14}(?=\|)
REPLACE
EMPTY
| 000A | <control> | Cc | B | N |
| 000D | <control> | Cc | B | N |
| 001C | <control> | Cc | B | N |
| 001E | <control> | Cc | B | N |
| 0000 | <control> | Cc | BN | N |
| 0001 | <control> | Cc | BN | N |
| 001A | <control> | Cc | BN | N |
| 001B | <control> | Cc | BN | N |
| 007F | <control> | Cc | BN | N |
| 0009 | <control> | Cc | S | N |
| 000B | <control> | Cc | S | N |
| 001F | <control> | Cc | S | N |
| 000C | <control> | Cc | WS | N |
| 0061 | LATIN SMALL LETTER A | Ll | L | N |
| 0062 | LATIN SMALL LETTER B | Ll | L | N |
| 0063 | LATIN SMALL LETTER C | Ll | L | N |
| 0064 | LATIN SMALL LETTER D | Ll | L | N |
| 0065 | LATIN SMALL LETTER E | Ll | L | N |
| 0066 | LATIN SMALL LETTER F | Ll | L | N |
| 0067 | LATIN SMALL LETTER G | Ll | L | N |
| 0068 | LATIN SMALL LETTER H | Ll | L | N |
| 0069 | LATIN SMALL LETTER I | Ll | L | N |
| 006A | LATIN SMALL LETTER J | Ll | L | N |
| 006B | LATIN SMALL LETTER K | Ll | L | N |
| 006C | LATIN SMALL LETTER L | Ll | L | N |
| 006D | LATIN SMALL LETTER M | Ll | L | N |
| 0041 | LATIN CAPITAL LETTER A | Lu | L | N |
| 0042 | LATIN CAPITAL LETTER B | Lu | L | N |
| 0043 | LATIN CAPITAL LETTER C | Lu | L | N |
| 0044 | LATIN CAPITAL LETTER D | Lu | L | N |
| 0045 | LATIN CAPITAL LETTER E | Lu | L | N |
| 0046 | LATIN CAPITAL LETTER F | Lu | L | N |
| 0047 | LATIN CAPITAL LETTER G | Lu | L | N |
| 0048 | LATIN CAPITAL LETTER H | Lu | L | N |
| 0049 | LATIN CAPITAL LETTER I | Lu | L | N |
| 004A | LATIN CAPITAL LETTER J | Lu | L | N |
| 004B | LATIN CAPITAL LETTER K | Lu | L | N |
| 004C | LATIN CAPITAL LETTER L | Lu | L | N |
| 004D | LATIN CAPITAL LETTER M | Lu | L | N |
| 0030 | DIGIT ZERO | Nd | EN | N |
| 0031 | DIGIT ONE | Nd | EN | N |
| 0032 | DIGIT TWO | Nd | EN | N |
| 0033 | DIGIT THREE | Nd | EN | N |
| 0034 | DIGIT FOUR | Nd | EN | N |
| 005F | LOW LINE | Pc | ON | N |
| 002D | HYPHEN-MINUS | Pd | ES | N |
| 0029 | RIGHT PARENTHESIS | Pe | ON | Y |
| 005D | RIGHT SQUARE BRACKET | Pe | ON | Y |
| 007D | RIGHT CURLY BRACKET | Pe | ON | Y |
| 002C | COMMA | Po | CS | N |
| 002E | FULL STOP | Po | CS | N |
| 002F | SOLIDUS | Po | CS | N |
| 003A | COLON | Po | CS | N |
| 0023 | NUMBER SIGN | Po | ET | N |
| 0025 | PERCENT SIGN | Po | ET | N |
| 0021 | EXCLAMATION MARK | Po | ON | N |
| 0022 | QUOTATION MARK | Po | ON | N |
| 0026 | AMPERSAND | Po | ON | N |
| 0027 | APOSTROPHE | Po | ON | N |
| 002A | ASTERISK | Po | ON | N |
| 003B | SEMICOLON | Po | ON | N |
| 003F | QUESTION MARK | Po | ON | N |
| 0040 | COMMERCIAL AT | Po | ON | N |
| 005C | REVERSE SOLIDUS | Po | ON | N |
| 0028 | LEFT PARENTHESIS | Ps | ON | Y |
| 005B | LEFT SQUARE BRACKET | Ps | ON | Y |
| 007B | LEFT CURLY BRACKET | Ps | ON | Y |
| 0024 | DOLLAR SIGN | Sc | ET | N |
| 005E | CIRCUMFLEX ACCENT | Sk | ON | N |
| 0060 | GRAVE ACCENT | Sk | ON | N |
| 002B | PLUS SIGN | Sm | ES | N |
| 003D | EQUALS SIGN | Sm | ON | N |
| 007C | VERTICAL LINE | Sm | ON | N |
| 007E | TILDE | Sm | ON | N |
| 003C | LESS-THAN SIGN | Sm | ON | Y |
| 003E | GREATER-THAN SIGN | Sm | ON | Y |
| 0020 | SPACE | Zs | WS | N |
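The copy / sort / delete sequence works because it turns the wanted columns into the sort key. In a script the same effect can be had without temporary columns, since Python's `sorted()` is stable: rows whose keys compare equal keep their original relative order. A minimal sketch, with hypothetical helper names and the assumption that every row looks like `| a | b | c |` :

```python
def cells(row):
    # Split '| a | b | c |' into ['a', 'b', 'c'].
    return [c.strip() for c in row.strip().strip("|").split("|")]

def sort_by_columns(rows, first_col):
    # Sort on the cells from index first_col onwards (0-based).
    # sorted() is stable, so rows with identical keys stay in their original order.
    return sorted(rows, key=lambda row: cells(row)[first_col:])

rows = [
    "| 0041 | LATIN CAPITAL LETTER A | Lu | L | N |",
    "| 000A | <control> | Cc | B | N |",
    "| 0000 | <control> | Cc | BN | N |",
    "| 0061 | LATIN SMALL LETTER A | Ll | L | N |",
    "| 000D | <control> | Cc | B | N |",
]
for row in sort_by_columns(rows, 2):
    print(row)
```

Note the small difference in mechanism: here ties are resolved by stability alone, whereas in the regex approach they are resolved by the code-point column that follows the copied columns. On a list that starts out in code-point order, both give the same result.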
Best Regards,
guy038
-
@Mathnerd314 , @guy038 :
Guy must have gotten tired typing that last series of posts and needed a long sleep, otherwise I have no explanation for why he did not tackle the regex-based solution for “Rot13”. :-D
As a good application of the technique discussed by Guy here, and if I haven’t made any typos, here is how to find-and-replace to do a Rot13 substitution cypher:
Find-what:
(?-i)(A)|(B)|(C)|(D)|(E)|(F)|(G)|(H)|(I)|(J)|(K)|(L)|(M)|(N)|(O)|(P)|(Q)|(R)|(S)|(T)|(U)|(V)|(W)|(X)|(Y)|(Z)|(a)|(b)|(c)|(d)|(e)|(f)|(g)|(h)|(i)|(j)|(k)|(l)|(m)|(n)|(o)|(p)|(q)|(r)|(s)|(t)|(u)|(v)|(w)|(x)|(y)|(z)
Replace-with:
(?1N)(?2O)(?3P)(?4Q)(?5R)(?6S)(?7T)(?8U)(?9V)(?10W)(?11X)(?12Y)(?13Z)(?14A)(?15B)(?16C)(?17D)(?18E)(?19F)(?20G)(?21H)(?22I)(?23J)(?24K)(?25L)(?26M)(?27n)(?28o)(?29p)(?30q)(?31r)(?32s)(?33t)(?34u)(?35v)(?36w)(?37x)(?38y)(?39z)(?40a)(?41b)(?42c)(?43d)(?44e)(?45f)(?46g)(?47h)(?48i)(?49j)(?50k)(?51l)(?52m)
Search mode:
☑ Regular expression
-
Hi @scott-sumner, @mathnerd314, and All,
Ah, yes ! I should have found the appropriate regex S/R ! Indeed, I was just afraid of that name ROT13 and I didn’t even try to search on Wikipedia :-((( But, just looking at your regex, I, immediately, understood that it was a simple letter substitution cipher, with a shift of 13 positions !
So, refer to that complete and interesting article on that topic :
https://en.wikipedia.org/wiki/ROT13
Quite funny, because after executing this regex a second time, you get back to your original text !
Generally speaking, this kind of regex should work for any letter substitution cipher, as long as the mapping between the two sets of letters is a real bijection !
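The bijection remark can be illustrated outside the Replace dialog with Python's translation tables. This is only a sketch: `make_cipher` is a hypothetical helper, and it checks the bijection by requiring equal lengths and no repeated letter on either side.

```python
import string

def make_cipher(plain, cipher):
    # A bijection needs equal lengths and no repeated letter on either side.
    if (len(plain) != len(cipher)
            or len(set(plain)) != len(plain)
            or len(set(cipher)) != len(cipher)):
        raise ValueError("the two alphabets do not define a bijection")
    return str.maketrans(plain, cipher)

UPPER, LOWER = string.ascii_uppercase, string.ascii_lowercase
# ROT13: each alphabet rotated by 13 positions.
ROT13 = make_cipher(UPPER + LOWER,
                    UPPER[13:] + UPPER[:13] + LOWER[13:] + LOWER[:13])

once = "Hello, World!".translate(ROT13)
twice = once.translate(ROT13)
print(once)
print(twice)
```

ROT13 undoes itself on the second pass because 13 + 13 = 26, the length of the alphabet. An arbitrary bijection is still reversible, but in general it needs the inverse table (swap the two arguments of `make_cipher`) rather than a second run of the same one.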
Cheers,
guy038
P.S. : I began to have a look at the Strip HTML tags table (non)tabs option. But I do not understand, yet, the role of the possible space characters, added before and after the text parts !!
-
I like regular expressions and I use them quite frequently. It is amazing what one can do with them. BUT: they are not suitable for every problem and not for every user. Before one can use them to solve a problem he has to learn regular expressions - so he has got two problems. ;)
The above (partially monstrous) regular expressions are not a real alternative for a plugin which provides simply menu entries (which can be assigned to a keyboard shortcut) for some common and often needed tasks.
But OK, you had fun coding these regular expressions and it showed what is possible with this technique.
I use TextFX too and I think it is still a useful plugin. And because of its many functions, IMHO it is OK that it installs its own entry in the menu bar.
-
> not suitable for every problem and not for every user
I think it plays to how much effort one wants to go to in order to accomplish one’s goals. Not everyone will invest what it takes to learn regular expressions (regex), and not everyone will invest in Pythonscript (or LuaScript)–two key things (regex search-and-replace / scripting) that are likely to power workarounds to some “missing” Notepad++ functionalities.
I have been known to lead off a post where I propose a Pythonscript solution to a problem with “If you’re willing to go to the effort to get the Pythonscript plugin going…”, because I realize that not everyone will.
BUT…it seems reasonable that if someone posts a question about how to do something, they probably really want to do it, and they may actually be willing to go above-and-beyond. And when someone HANDS you a solution (e.g. provides a regex search-and-replace setup, or says “here’s a script…”), is it really that much effort? :-)
I also think it has to do with “information exchange”, which is the lifeblood of this “Community”. One of the reasons I read and contribute to this Community is to (a) get insights on how others use Notepad++ because maybe knowing that improves my own usage, and (b) by sharing some of my own knowledge maybe I do the same for others (perhaps a dubious assumption!). Often I read a posting and my first thought is “not applicable to me”…but then I think more about it…and usually I end up learning something I didn’t know, or didn’t realize that I could put into use.
So an example of that is I didn’t know about rot13 until I saw this thread. I read up on it, and then forgot about it. It was only when @guy038 replied with regex solutions for the other functionalities that I thought “Hey, regex can do rot13, too”, and I provided that in the hope that it benefits someone else. Thus, idea exchange!
> (partially monstrous) regular expressions are not a real alternative for a plugin which provides simply menu entries (which can be assigned to a keyboard shortcut) for some common and often needed tasks
Okay, so sure, it would be tough to remember the regexes for the rot13 thing. Nobody would do that. However, for someone that really needed it (and for some reason couldn’t use 32-bit Notepad++ where TextFX is available), maybe they would copy the text of those regexes to a file for future use via copy and paste to the Find dialog. That could work for them, but again, clunky…
…Or maybe they would think, how could I use this more easily? If they couldn’t figure out a way, perhaps they’d post here and ask. And then a response might be: “Well, one could ‘script it’ and then it WOULD BE available via a menu item and COULD BE assigned a shortcut key combination…”…hmmm:
Rot13SubstitutionCypher.py:
# capture a letter into a group with the same name as the letter:
f = '(?-i)' + '(?<A>A)|(?<B>B)|(?<C>C)|(?<D>D)|(?<E>E)|(?<F>F)|' + \
    '(?<G>G)|(?<H>H)|(?<I>I)|(?<J>J)|(?<K>K)|(?<L>L)|' + \
    '(?<M>M)|(?<N>N)|(?<O>O)|(?<P>P)|(?<Q>Q)|(?<R>R)|' + \
    '(?<S>S)|(?<T>T)|(?<U>U)|(?<V>V)|(?<W>W)|(?<X>X)|' + \
    '(?<Y>Y)|(?<Z>Z)|(?<a>a)|(?<b>b)|(?<c>c)|(?<d>d)|' + \
    '(?<e>e)|(?<f>f)|(?<g>g)|(?<h>h)|(?<i>i)|(?<j>j)|' + \
    '(?<k>k)|(?<l>l)|(?<m>m)|(?<n>n)|(?<o>o)|(?<p>p)|' + \
    '(?<q>q)|(?<r>r)|(?<s>s)|(?<t>t)|(?<u>u)|(?<v>v)|' + \
    '(?<w>w)|(?<x>x)|(?<y>y)|(?<z>z)'
# self-documenting substitution!: A->N, B->O, C->P, etc
r = '(?{A}N)(?{B}O)(?{C}P)(?{D}Q)(?{E}R)(?{F}S)(?{G}T)' + \
    '(?{H}U)(?{I}V)(?{J}W)(?{K}X)(?{L}Y)(?{M}Z)(?{N}A)' + \
    '(?{O}B)(?{P}C)(?{Q}D)(?{R}E)(?{S}F)(?{T}G)(?{U}H)' + \
    '(?{V}I)(?{W}J)(?{X}K)(?{Y}L)(?{Z}M)(?{a}n)(?{b}o)' + \
    '(?{c}p)(?{d}q)(?{e}r)(?{f}s)(?{g}t)(?{h}u)(?{i}v)' + \
    '(?{j}w)(?{k}x)(?{l}y)(?{m}z)(?{n}a)(?{o}b)(?{p}c)' + \
    '(?{q}d)(?{r}e)(?{s}f)(?{t}g)(?{u}h)(?{v}i)(?{w}j)' + \
    '(?{x}k)(?{y}l)(?{z}m)'
editor.rereplace(f, r)
Note that I changed the regexes from the original to use “named group conditional replacement” because I think it makes the transformation more self-documenting…see, I learned something as I thought about this some more! :-D
> so he has got two problems
I get the reference!! Excellent usage of it!
:-D
-
@Scott-Sumner
Of course I appreciate it when somebody shares their knowledge and shows how to solve a problem, even in an uncommon or “the hard” way. My intention was only to point out that it would not be a good idea to abandon the TextFX plugin only because it is possible to solve some of its tasks with regular expressions.
No one would throw away their screwdrivers just because it is mostly possible to unscrew screws with a knife or a coin.
-
To hopefully put the rot13 thing to sleep as everyone is tired of it, a really smart person told me yesterday that the following might be the most succinct way to use Pythonscript to perform the rot13 conversion on a file:
editor.replaceSel(editor.getSelText().encode('rot13'))
Again worth pointing out: easy to run from a menu or assigned to a keyboard shortcut.
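One caveat, stated as an assumption about the environment: that one-liner relies on Python 2, which is what the PythonScript plugin has traditionally embedded. Under Python 3, `str.encode('rot13')` raises a LookupError because rot13 is not a text encoding there, and the conversion goes through the `codecs` module instead. A minimal sketch, with a hypothetical wrapper name:

```python
import codecs

def rot13(text):
    # 'rot_13' is a text-to-text codec, so it is reached via codecs.encode()
    # rather than str.encode() on Python 3.
    return codecs.encode(text, "rot_13")

print(rot13("Hello, World!"))

# Inside a Python 3 based PythonScript this would presumably become:
# editor.replaceSel(rot13(editor.getSelText()))
```

The commented-out last line is untested and only mirrors the one-liner above with the wrapper substituted in.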
Forget the nasty regular expressions! :-D
-
hi Jari,
you seem quite aware of this FX plugin - I’m looking for a plugin which would be able to :
find all the identical strings, and keep only one of each different string.
example :
birds are here
…
birds are here
…
kkskoreeoo
dogs are barking
birds are here
dogs are barking
and i would like to have after the plugin action :
birds are here
…
kkskoreeoo
dogs are barking
Is that possible ?
thanks for your help
--
stephane
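Assuming each string sits on its own line, keeping only the first occurrence of every distinct line is a short job for a script. A minimal Python sketch (the function name is hypothetical, and it works on a plain string rather than on the editor buffer):

```python
def keep_first_occurrences(text):
    lines = text.splitlines()
    # dict keys keep insertion order (Python 3.7+), so every distinct line
    # survives exactly once, at the position of its first occurrence.
    return "\n".join(dict.fromkeys(lines))

sample = "\n".join([
    "birds are here",
    "...",
    "birds are here",
    "...",
    "kkskoreeoo",
    "dogs are barking",
    "birds are here",
    "dogs are barking",
])
print(keep_first_occurrences(sample))
```

Unlike a sort-then-unique approach, this does not reorder the lines, and unlike a find-and-replace of duplicates it never removes the original along with its copies.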
-
Just replying since I wanted to put in my $0.02 on making a 64-bit version of TextFX. I don’t know how to do regex and can’t seem to wrap my head around any of the random regex statements I find. When searching Google for “notepad++ find duplicates”, the suggestion Google comes up with involves TextFX, as do the first and second most valid links. It is a similar situation with “notepad++ sort text”. There are 5300 results for “notepad++ find duplicates textfx” and 38,400 results for “notepad++ textfx”. Like it or not, TextFX was very well documented and I feel it should be maintained. I’m stuck with running both versions, as the 64-bit one handles large files better, but I can’t find a way to remove only duplicates: I can find-and-replace duplicates, but that removes the original value as well as the duplicate. I want to keep all distinct instances, which I have not found any way to do in the 64-bit version (but TextFX does it extremely well).
The native sort feature works extremely well too, but unless a native “find duplicates” option gets put into the main code, I really think TextFX is still a valid/needed plugin.
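For what it’s worth, “keep the first copy of every line and drop the later repeats” can be sketched in a few lines of Python. This is only an illustration under the assumption that duplicates means whole identical lines; it is not how TextFX does it, and the function name `remove_duplicate_lines` is made up here:

```python
def remove_duplicate_lines(text):
    seen = set()
    kept = []
    # splitlines(True) keeps each line's own ending (\n or \r\n)
    for line in text.splitlines(True):
        key = line.rstrip("\r\n")  # compare lines without their endings
        if key not in seen:
            seen.add(key)
            kept.append(line)  # first occurrence stays, in original order
    return "".join(kept)

print(remove_duplicate_lines("a\nb\na\nc\nb\n"))
```

With the PythonScript plugin installed, something like `editor.setText(remove_duplicate_lines(editor.getText()))` should apply it to the current document, though I have not tested that on very large files.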
-
The tips posted so far have been helpful in showing how to manage without TextFX. I have not found out how to “line up text in columns separated by clipboard character” yet. Does anyone know how to do that one without TextFX?
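One possible approach, as a rough sketch only: pad every field except the last to the width of the widest field in its column, so that the separators end up underneath each other. The function name `align_columns` and the default separator are assumptions for illustration, and the output is not guaranteed to match what TextFX produces:

```python
def align_columns(text, sep=","):
    rows = [line.split(sep) for line in text.splitlines()]
    # Widest cell per column, ignoring each row's last cell (never padded)
    widths = {}
    for row in rows:
        for i, cell in enumerate(row[:-1]):
            widths[i] = max(widths.get(i, 0), len(cell))
    out = []
    for row in rows:
        padded = [cell.ljust(widths[i]) for i, cell in enumerate(row[:-1])]
        out.append(sep.join(padded + row[-1:]))
    return "\n".join(out)

print(align_columns("a,bb,c\nlong,b,cc"))
```

The separator is passed in as a plain argument here; reading it from the clipboard, as TextFX does, is left out of the sketch.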
-
At first I thought the replies along the lines of “Just use regexes” were some sort of joke. It’s a little sad that guy038 actually typed all that out thinking he was being helpful. If you cannot perceive a difference in ease of use between pressing a button and using one’s acquired technical knowledge to write code that replicates that same button’s inner design, I don’t even know what to tell you - except that you might want to look up the phrase “opportunity cost”.
The concept of just scrapping a plugin because all users “ought to” learn search parsing language to do it themselves is a duck-brained monkey-man idea, evincing an unfamiliarity with very simple challenges during collaboration between humans. Imagine I want to have a temp or a teenager reformat things, and I want it to take one minute instead of sixty.
Imagine I’m a new user wanting to test drive Notepad++ with a specific use case, only to find that as a matter of fact that was removed and will never be re-implemented. Now years and years of documentation and reasons for people to start using Notepad++ are transformed into actively harmful and obfuscating misinformation claiming that the program has features it doesn’t and never will.
Imagine I want to “line up text in columns separated by clipboard character”, but I don’t know the regular expression for it. Whoops - Looks like instead of the one second it would have taken, the guy looking for the answer to that question has been waiting for 18 hours, as of this post. In that time he might’ve done it by hand. It’s piteous that people on the official notepad++ community would be seriously suggesting he do it by hand, then.
-
@Bodacious-Crumb I’m unclear on what you mean by “scrapping a plugin.” Do you mean by the user or its developer? The TextFX plugin is 3rd party and was abandoned by its developer many, many years ago. Unless that fact changes or it’s forked by someone else later on, one may have to accept that it may never work on notepad++ 64-bit and it could potentially stop working in future versions of notepad++ 32-bit. So if you mean “by the developer” then understand that these are open source projects and no one’s obligated to either provide these plugins or to maintain them. And if you mean by the “end-user”, well, if this abandoned plugin stops working entirely, then the solutions provided here today may be the only stop-gap ones available until something better comes along.
Furthermore, guy038’s regexes are useful for people who A) must use Notepad++ 64-bit for some reason, or B) want to implement these features easily with a potentially simple copy-and-paste solution. In either case, his posts are helpful to some people, but not necessarily everyone.
-
@Bodacious-Crumb, I agree with you. People who don’t understand this may use vim+grep+regex. Why do you need Notepad?
And Notepad++ without TextFX is useless for me. MS Word is a good deal.
-
After reading through this thread, I wonder if there could one day be a configurable plugin/menu where we can add and save these kinds of regex functions for later use, as in the TextFX plugin, which was convenient because you didn’t have to store or remember all these regexes…
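Until something like that exists, a scripting plugin can get part of the way there. A minimal sketch, assuming Python: keep the regexes in a dictionary under readable names and apply them by name. The names `SAVED_OPERATIONS` and `run_operation` are invented for this example, and the patterns are adapted for Python’s `re` module, which does not support the `\h` and `\R` escapes used in the Notepad++ regexes earlier in the thread:

```python
import re

# name -> (search pattern, replacement); patterns adapted for Python's re
SAVED_OPERATIONS = {
    "Trim trailing whitespace": (r"[ \t]+(?=\r?$)", ""),
    "Delete blank lines": (r"^[ \t]*\r?\n", ""),
}

def run_operation(name, text):
    pattern, replacement = SAVED_OPERATIONS[name]
    # MULTILINE makes ^ and $ match at every line, not just the whole text
    return re.sub(pattern, replacement, text, flags=re.MULTILINE)

print(run_operation("Delete blank lines", "a  \n\n \nb\n"))
```

New entries only need a name and a pattern, so nothing has to be remembered beyond the name.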
-
As a workaround, you could record them as macros and save each one under a meaningful name.
Cheers
Claudia