Base64 Encode: Generate mail-ready blobs by breaking lines at 76 characters

DRSpalding

I may be missing a way to do this automatically in Notepad++ and/or the MIME plugin, but that said…

I do a lot of spam reporting and, as such, I have to decode base64-encoded MIME blobs in emails to pull out URLs and to mask email addresses. Encoding them back into base64 leaves me with one long line of base64 text that I then have to split, manually or with a macro, into 76-character chunks, like the quoted-printable blocks sans the ‘=’ at the end of the lines. Could you add an option to do that automatically when encoding? It would streamline my use of it and lessen the amount of time it takes to report such spam.

PeterJones

@DRSpalding said:

Could you add in an option to do that automatically

Welcome to the Notepad++ Community. Unfortunately, we are fellow users, and not the developer. The MIME Tools plugin is maintained by the Notepad++ developer, and while he occasionally posts on the forums, he prefers bug fixes and feature requests to be made through the issue tracker (as he doesn’t notice all posts here, and this Forum is not good for tracking such things). For the MIME Tools plugin, the issue tracker is at https://github.com/npp-plugins/mimetools/issues , so you could post your request there (and then reply here with a link to the issue you create, so that others who find this discussion will be able to easily check the status of the issue in the tracker).

As a workaround until it gets fixed, a single regular expression search-and-replace will do it

Select All, then Plugins > MIME Tools > Base64 Encode
Search > Replace (or Ctrl+H), then
- Find What: .{76} – find each group of 76 characters
- Replace With: $0=\r\n - replace with those 76 characters, followed by an equals sign =, followed by a windows CRLF newline sequence
- mode = regular expression

I guess you also said you had a macro, which is another workaround… but this regex may or may not be faster for you to use.

DRSpalding

Thank you for the RE help. It’s been years since I had to attempt to grok REs at that level. :) That is superior to the macro I was using because I didn’t want to put a lot of work into it if there was an option somewhere to do it for me. Selecting it and doing that RE should probably do the trick in one pull instead of a macro invocation for each line.

I put it in as a feature request on the github site for issues (so I thought anyway) but it was punted and closed nearly immediately and I was asked to come to the community forum instead.

Meta Chuh

@DRSpalding

i apologise for that.
we currently have to keep the core notepad++ tracker clean of requests or issues that do not concern the base notepad++ source code, and your issue was not filed at the mimetools issue tracker as suggested by @PeterJones .

even if mimetools is a closely related plugin, it’s another team that handles this.

here at the community you also have the advantage to reach all internal and external devs and core members for discussion.
from experience, this often results in a more sophisticated, cross-thinking solution, and most of these are immediate ad-hoc workarounds that any user can apply without the need for a base code change.

you will also be able to discuss with us in length, which would not be possible at the issue tracker.
an issue report or feature request also has a higher chance of being accepted if we work it out here and agree on the wording with our whole community before filing it.

thanks again for your help and best regards.

guy038

Hello, @DRSpalding, @peterjones, @meta-chuh and All,

I’ve had a look at the Base-64 encoding/decoding process in the Wikipedia article below:

https://en.wikipedia.org/wiki/Base64

Seemingly, during the Base-64 encoding process, each group of 3 consecutive characters (bytes) of the original text is encoded into a 4-character string, according to the Base-64 table below:

    •-------•------•-------•------•-------•------•-------•------•
    | Index | Char | Index | Char | Index | Char | Index | Char |
    •-------•------•-------•------•-------•------•-------•------•
    |   0   |   A  |   16  |  Q   |   32  |   g  |   48  |   w  |
    |   1   |   B  |   17  |  R   |   33  |   h  |   49  |   x  |
    |   2   |   C  |   18  |  S   |   34  |   i  |   50  |   y  |
    |   3   |   D  |   19  |  T   |   35  |   j  |   51  |   z  |
    |   4   |   E  |   20  |  U   |   36  |   k  |   52  |   0  |
    |   5   |   F  |   21  |  V   |   37  |   l  |   53  |   1  |
    |   6   |   G  |   22  |  W   |   38  |   m  |   54  |   2  |
    |   7   |   H  |   23  |  X   |   39  |   n  |   55  |   3  |
    |   8   |   I  |   24  |  Y   |   40  |   o  |   56  |   4  |
    |   9   |   J  |   25  |  Z   |   41  |   p  |   57  |   5  |
    |  10   |   K  |   26  |  a   |   42  |   q  |   58  |   6  |
    |  11   |   L  |   27  |  b   |   43  |   r  |   59  |   7  |
    |  12   |   M  |   28  |  c   |   44  |   s  |   60  |   8  |
    |  13   |   N  |   29  |  d   |   45  |   t  |   61  |   9  |
    |  14   |   O  |   30  |  e   |   46  |   u  |   62  |   +  |
    |  15   |   P  |   31  |  f   |   47  |   v  |   63  |   /  |
    •-------•------•-------•------•-------•------•-------•------•
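To make the table concrete, here is a minimal Python sketch of how one group of 3 bytes becomes 4 table characters. Python is only chosen for illustration, the names `ALPHABET` and `encode_triplet` are hypothetical, and this is not meant to describe how the MIME Tools plugin is implemented:

```python
import string

# The 64-character table shown above: A-Z, a-z, 0-9, then '+' and '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_triplet(chunk: bytes) -> str:
    """Encode exactly 3 bytes as 4 Base-64 characters (hypothetical helper)."""
    if len(chunk) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Join the 3 bytes into one 24-bit number...
    value = int.from_bytes(chunk, "big")
    # ...then cut it into four 6-bit indexes, most significant first.
    indexes = [(value >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indexes)


print(encode_triplet(b"Man"))  # TWFu (indexes 19, 22, 5, 46)
```

The 24 bits of "Man" split into the indexes 19, 22, 5 and 46, which the table maps to T, W, F and u.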

Now, if the length of your original text is not a multiple of 3 characters, it ends with either:

- 2 additional characters, which are Base-64 encoded as a 3-character string, followed by one = sign
- 1 additional character, which is Base-64 encoded as a 2-character string, followed by two = signs

So, in all cases , the encoded Base-64 text is a string whose length is a multiple of 4 !
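The padding rule can be checked quickly with Python's standard `base64` module (used here purely as an assumption for illustration, outside Notepad++; `encode_text` is a hypothetical helper name):

```python
import base64


def encode_text(data: bytes) -> str:
    """Return the standard Base-64 encoding of `data` as a string."""
    return base64.b64encode(data).decode("ascii")


for sample in (b"Man", b"Ma", b"M"):
    encoded = encode_text(sample)
    # The encoded length is always a multiple of 4, thanks to '=' padding.
    print(sample, "->", encoded, "length", len(encoded))
```

The three samples encode to TWFu, TWE= and TQ== respectively, each 4 characters long.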

Note that the Base-64 encoding process takes every character into account, including spaces and any kind of line-break!

So, if we consider the original ASCII quote, given in the Wikipedia article, you must delete the two line-breaks, in order to get the one line text, below :

Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure.

And after selecting all the text, without the last line-break, and using the Plugins > MIME Tools > Base64 Encode option, we do get the same encoded text as in the Wikipedia article:

TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2YgdGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGludWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRoZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=

If we had kept the line-breaks of the original text, the encoded Base-64 string would have been quite different!

Now, regarding the Base-64 decoding process, it is said in Wikipedia :

(newlines and whitespaces may be present anywhere but are to be ignored on decoding)

I personally verified that, even if a Base-64 encoded text is split, at any location, with line-break(s), it does not matter and the decoding process gives your original text back, as expected ;-))

For instance, these 4 Base-64 encoded examples, each 8 characters long:

TWFuIGlz
~~~~~~~~~~
TW
FuIGl
z
~~~~~~~~~~
T
WFu
IGlz
~~~~~~~~~~
TWFuIG
lz

all give, after running the Plugins > MIME Tools > Base64 Decode option, the text “Man is”, with one space between the words!
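The same four variants can be tried outside Notepad++. This is a small sketch that assumes Python's `base64.b64decode` as a stand-in decoder; in its default mode it discards characters outside the Base-64 alphabet, such as line-breaks, so it may not behave exactly like the plugin in other cases (`variants` and `decode_all` are hypothetical names):

```python
import base64

# The four examples above, with "\n" standing for each line-break.
variants = [
    "TWFuIGlz",
    "TW\nFuIGl\nz",
    "T\nWFu\nIGlz",
    "TWFuIG\nlz",
]


def decode_all(encoded_variants):
    """Decode each variant; line-breaks are ignored by the default decoder."""
    return [base64.b64decode(v) for v in encoded_variants]


for result in decode_all(variants):
    print(result)  # b'Man is' each time
```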

However, if you add some space characters to a legal Base-64 sequence, so that the new total length is again a multiple of 4, decoding does NOT give the original text back and, most of the time, makes Notepad++ crash :-((

Finally, Peter, I don’t think that an equals sign has to be added at the end of each block of 76 characters! Indeed, the one or two = signs are added only at the very end, during the Base-64 encoding process, in order to get a legal sequence whose length is a multiple of 4 ;-))

So, assuming the Base-64 encoded one-line string below :

TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2YgdGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGludWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRoZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=

You may split it into blocks of any length (N) with the regex S/R:

SEARCH (?-s).{N}

REPLACE $0\r\n ( or $0\n for Unix/Linux files )

So, with N = 76 , it gives :

TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz
IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg
dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu
dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo
ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=

With N = 37 , you would obtain :

TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvb
mx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aG
lzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGh
lciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qg
b2YgdGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZ
XZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY2
9udGludWVkIGFuZCBpbmRlZmF0aWdhYmxlIGd
lbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNl
ZWRzIHRoZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgY
W55IGNhcm5hbCBwbGVhc3VyZS4=

And with N = 160 , we’ll get the Base-64 encoded text :

TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2YgdGhlIG1p
bmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGludWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRoZSBzaG9ydCB2ZWhl
bWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=

Now, if you decode these 3 examples above, without selecting the last line-break, with the option Plugins > MIME Tools > Base64 Decode, they all end up with the same original text, below:

Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure.
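For anyone who prefers a script, the split-then-decode round trip can be sketched in Python. This is only an illustration under assumptions: `wrap_base64` is a hypothetical helper that mimics the regex S/R above, and Python's `re` and `base64` modules stand in for Notepad++ and the plugin:

```python
import base64
import re


def wrap_base64(encoded: str, width: int = 76, newline: str = "\r\n") -> str:
    """Insert `newline` after every full block of `width` characters."""
    # Same idea as SEARCH .{N} / REPLACE $0 + line-break. In Python, '.' does
    # not match line-breaks by default, so no (?-s) modifier is needed.
    return re.sub(".{%d}" % width, lambda m: m.group(0) + newline, encoded)


original = b"Man is distinguished, not only by his reason, " * 6
one_line = base64.b64encode(original).decode("ascii")

for n in (76, 37, 160):
    wrapped = wrap_base64(one_line, n)
    # Line-breaks are dropped on decoding, so every width round-trips.
    assert base64.b64decode(wrapped) == original
    print(n, "->", len(wrapped.splitlines()), "lines")
```

Like the regex, this sketch also appends a line-break after the last block when the encoded length is an exact multiple of N. Python's `base64.encodebytes` is another option if 76-character lines separated by `\n` are acceptable.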

Best Regards,

guy038