I need a function/plugin to extract only unnecessary text from lines
-
Notepad++ v8.7.1 (64-bit)
Build time : Oct 31 2024 - 00:48:56
Path : C:\Program Files\Notepad++\notepad++.exe
Command Line : “C:\Users\RagnarLodbrok\Desktop\chomik hity.txt”
Admin mode : ON
Local Conf mode : OFF
Cloud Config : OFF
Periodic Backup : ON
OS Name : Windows 10 Enterprise LTSC 2021 (64-bit)
OS Version : 21H2
OS Build : 19044.5608
Current ANSI codepage : 1250
Plugins :
mimeTools (3.1)
NppConverter (4.6)
NppExport (0.4)
I would like to be able to perform the following operation. I have a file with lines like these:
zzzzzzzzzzzz0:Azzzzzzzz00 | Azzzęzzz zzzzzzzz = 00 AA | Azzzzz = 0000
zzzzz000:Azzzz000 | Azzzęzzz zzzzzzzz = 000,0 AA | Azzzzz = 0000
zzzzzz:zzzzzzz0 | Azzzęzzz zzzzzzzz = 000,00 AA | Azzzzz = 0000
zzzzzzzz:zz0z0z0 | Azzzęzzz zzzzzzzz = 00 AA | Azzzzz = 0000
zzzzzzz0:zzzzzz000 | Azzzęzzz zzzzzzzz = 00 AA | Azzzzz = 0000
and I would like to end up with a file in which the extra words are removed. This is what it would look like:
zzzzzzzzzzzz0:Azzzzzzzz00
zzzzz000:Azzzz000
zzzzzz:zzzzzzz0
zzzzzzzz:zz0z0z0
Manually deleting such texts takes days, so my question is: is there a plugin, command, or other solution? Sorry for the inquiries, but I’m completely unfamiliar with such processes. Best regards.
—
moderator added code markdown around text; please don’t forget to use the </> button to mark example text as “code” so that characters don’t get changed by the forum
moderator edit 2: changed most of the lowercase to z, uppercase to A, and numbers to 0, to avoid publishing possibly secret usernames/passwords
-
My suggestion: Use the Search > Replace menu action (or Ctrl+H as the default shortcut):
- FIND WHAT = ^(\w+:\w+)?.*$(\R)?
- REPLACE WITH = ?1$1$2
- SEARCH MODE = Regular Expression
then use the REPLACE ALL button
The FIND WHAT puts any xyz123:abc987 at the beginning of a line into memory group #1, skips over the rest of the line, and puts the newline sequence in group #2.
The REPLACE WITH will replace that entire line with just the contents of group #1 followed by group #2 (so the xyz123:abc987 and newline) IF the xyz123:abc987 exists; if it DOESN’T exist, it will replace the entire line with nothing (hence deleting the line).
I believe this does what you want.
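For readers who want to try the same idea outside Notepad++, here is a minimal Python sketch. It is an approximation, not the Notepad++ behaviour itself: Python's `re` module has no `\R` and no `?1$1$2` conditional replacement, so the sketch walks the text line by line instead. The function name and the sample lines are made up for illustration.

```python
import re

# Same idea as the FIND WHAT above: group 1 is an optional "word:word"
# pair anchored at the start of the line; the rest of the line is ignored.
PAIR_AT_START = re.compile(r"^(\w+:\w+)?.*$")

def keep_leading_pairs(text: str) -> str:
    """Keep only the leading word:word pair of each line (hypothetical helper)."""
    kept = []
    for line in text.splitlines():
        match = PAIR_AT_START.match(line)
        if match and match.group(1):
            # Group 1 exists: keep just the pair, like "?1$1$2".
            kept.append(match.group(1))
        # Otherwise the whole line is dropped, like replacing it with nothing.
    return "\n".join(kept)

# Made-up sample data, not taken from the thread.
sample = "user1:Pass00 | Foo bar = 00 AA | Baz = 0000\nno pair on this line\nabc:def9 | x = 1"
print(keep_leading_pairs(sample))
```

Note that `\w` only covers letters, digits, and underscore, so a pair containing characters such as `@`, `.`, or `!` would not be recognized by this version.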
-
zzzzzz.zzzzzz@zz.zz:Az000000 | zzzzzzzzzzz.zz = Azzzzzzóz | 00 = Azzzz Azz/Azz Az zzzzzzzzzzzz0:Azzzzzzzz00 | Azzzęzzz zzzzzzzz = 00 AA | Azzzzz = 0000 zzzzz000:Azzzz000 | Azzzęzzz zzzzzzzz = 000,0 AA | Azzzzz = 0000 zzzzzz:zzzzzzz0 | Azzzęzzz zzzzzzzz = 000,00 AA | Azzzzz = 0000 zzzzzzzz:zz0z0z0 | Azzzęzzz zzzzzzzz = 00 AA | Azzzzz = 0000 zzzzzzz0:zzzzzz000 | Azzzęzzz zzzzzzzz = 00 AA | Azzzzz = 0000 zzzzzzzzz.0000:Azzzzzzzzz00 | Azzzęzzz zzzzzzzz = 00 AA | Azzzzz = 0000 00:00][AAAAAAA] zzzzzzzz0:zzzz000 - [Azzz Azzzzz = 0,00 AA | AAAAAA = 0000] - @AAAAAA 00:00][AAAAAAA] zzzzzzz_zzzzzz:zzzzzz00zzzzzz - [Azzz Azzzzz = 0,00 AA | AAAAAA = 0000] - @AAAAAA 00:00][AAAAAAA] zzzzzzz:zzzzzz0 - [Azzz Azzzzz = 00,00 AA | AAAAAA = 0000] - @AAAAAA 00:00][AAAAAAA] zzzzzzzzz.00:z0zzzzA00!zzzzzzzz - [Azzz Azzzzz = 00,00 AA | AAAAAA = 0000] - @AAAAAA zzzz0000:zzzz00 | zzzzzzzzzz = 00.00.0000 00:00:00 zzzzzzzzzzzz00:zzzzz0000 | zzzzzzzzzz = 00.00.0000 00:00:00 zzzzz00:zzzzz000 | zzzzzzzzzz = 00.00.0000 00:00:00 zzzzz00:zzzzzzz | zzzzzzzzzz = 00.00.0000 00:00:00 zzzzzzzzz:zzzzzzzzzz0 | zzzzzzzzzz = 00.00.0000 00:00:00 Azzzz:Azzzzzzzzz00 | zzzzzzzzzz = 00.00.0000 00:00:00 zzzzzzzzzzzzzz000:zzzzzz00 | zzzzzzzzzz = 00.00.0000 00:00:00 zzzz:zzzz0000 | zzzzzzzzzz = 00.00.0000 00:00:00 Azzzzz:000000 | zzzzzzzzzz = 00.00.0000 00:00:00 00zzzzzzzz:%AzzAAAz?AAz0zz | 000000000@zzzz:A0000A | 000000Az:0000000Azz!z | zzzzzzz:zzzzzzz0 | zzz = [zzzzzz zzz zzzzzz] Azzz00000000:Azzz000 | zzzz0000:0zzz0AAzzz0A_00z | AzzzAzzzz00:Azzzz0000 | Azzz_00:Azzzz000 | zzzzzz:Azzzzzz0 | [zzzzzz zzz zzzzzz] zzzzzz:AzAzAz_000 | zzzzzzzz:Azzzzzz0000 | [zzzzzz zzz zzzzzz] zzzzzzz00@zz.zz:Azzzz0000 | Azzzzzzzz zzzzzzzz = 0 | Azzzzz = 0 | Azzzzzzzzz = 0
But if I had text like this, it wouldn’t be possible.
—
moderator added code markdown around text; please don’t forget to use the </> button to mark example text as “code” so that characters don’t get changed by the forum
moderator edit 2: changed most of the lowercase to z, uppercase to A, and numbers to 0, to avoid publishing possibly secret usernames/passwords
-
Don’t forget to hit the </> on the toolbar, and paste your data where the forum post editor says code_text.
Second, these better not be real user passwords, or any private data, that you are sharing with the whole internet. This is a public forum, and anyone can read them.
I really don’t like helping people with password-file search/replace, because it’s too high of a risk that I’m helping someone who is harvesting email/password pairs.
Since I already started, I will answer this one last question… though I will use moderator powers to change every password field that you’ve shown above, just to make sure.
–
But if I had text like this, it wouldn’t be possible.
The regex I gave assumed that to the left and right of the : there would be “word characters”, which means letters, numbers, and underscore. Since your original example only included those, that’s all I thought I needed to match. I’ll be less restrictive this time, and assume the rules are as follows:
- junk at the beginning of a line, ending with a ] followed by one or more whitespace characters, should be removed, even if there’s a colon
- there isn’t an example of it, but I’ll also no longer assume that the colon-pair is the first “word” on a given line, and will throw away anything before it as junk, as well
- there will be a start-of-line or space before the colon-separated pair.
- I will assume the colon-separated pairs cannot contain whitespace (space, tab, newline), but any other character is fair game
FIND = ^(?:.*\x5D\s|[^:]+\s)?(\S+:\S+).*$(\R)?
REPLACE = ?1$1$2
SEARCH MODE = Regular Expression
That will turn
zzzzzzzzzzzz0:Azzzzzzzz00 | Azzzęzzz zzzzzzzz = 00 AA | Azzzzz = 0000 zzzzz000:Azzzz000 | Azzzęzzz zzzzzzzz = 000,0 AA | Azzzzz = 0000 zzzzzz:zzzzzzz0 | Azzzęzzz zzzzzzzz = 000,00 AA | Azzzzz = 0000 zzzzzzzz:zz0z0z0 | Azzzęzzz zzzzzzzz = 00 AA | Azzzzz = 0000 zzzzzzz0:zzzzzz000 | Azzzęzzz zzzzzzzz = 00 AA | Azzzzz = 0000 zzzzzz.zzzzzz@zz.zz:Az000000 | zzzzzzzzzzz.zz = Azzzzzzóz | 00 = Azzzz Azz/Azz Az zzzzzzzzzzzz0:Azzzzzzzz00 | Azzzęzzz zzzzzzzz = 00 AA | Azzzzz = 0000 zzzzz000:Azzzz000 | Azzzęzzz zzzzzzzz = 000,0 AA | Azzzzz = 0000 zzzzzz:zzzzzzz0 | Azzzęzzz zzzzzzzz = 000,00 AA | Azzzzz = 0000 zzzzzzzz:zz0z0z0 | Azzzęzzz zzzzzzzz = 00 AA | Azzzzz = 0000 zzzzzzz0:zzzzzz000 | Azzzęzzz zzzzzzzz = 00 AA | Azzzzz = 0000 zzzzzzzzz.0000:Azzzzzzzzz00 | Azzzęzzz zzzzzzzz = 00 AA | Azzzzz = 0000 00:00][AAAAAAA] zzzzzzzz0:zzzz000 - [Azzz Azzzzz = 0,00 AA | AAAAAA = 0000] - @AAAAAA 00:00][AAAAAAA] zzzzzzz_zzzzzz:zzzzzz00zzzzzz - [Azzz Azzzzz = 0,00 AA | AAAAAA = 0000] - @AAAAAA 00:00][AAAAAAA] zzzzzzz:zzzzzz0 - [Azzz Azzzzz = 00,00 AA | AAAAAA = 0000] - @AAAAAA 00:00][AAAAAAA] zzzzzzzzz.00:z0zzzzA00!zzzzzzzz - [Azzz Azzzzz = 00,00 AA | AAAAAA = 0000] - @AAAAAA zzzz0000:zzzz00 | zzzzzzzzzz = 00.00.0000 00:00:00 zzzzzzzzzzzz00:zzzzz0000 | zzzzzzzzzz = 00.00.0000 00:00:00 zzzzz00:zzzzz000 | zzzzzzzzzz = 00.00.0000 00:00:00 zzzzz00:zzzzzzz | zzzzzzzzzz = 00.00.0000 00:00:00 zzzzzzzzz:zzzzzzzzzz0 | zzzzzzzzzz = 00.00.0000 00:00:00 Azzzz:Azzzzzzzzz00 | zzzzzzzzzz = 00.00.0000 00:00:00 zzzzzzzzzzzzzz000:zzzzzz00 | zzzzzzzzzz = 00.00.0000 00:00:00 zzzz:zzzz0000 | zzzzzzzzzz = 00.00.0000 00:00:00 Azzzzz:000000 | zzzzzzzzzz = 00.00.0000 00:00:00 00zzzzzzzz:%AzzAAAz?AAz0zz | 000000000@zzzz:A0000A | 000000Az:0000000Azz!z zzzzzzz:zzzzzzz0 | zzz = [zzzzzz zzz zzzzzz] Azzz00000000:Azzz000 | zzzz0000:0zzz0AAzzz0A_00z | AzzzAzzzz00:Azzzz0000 | Azzz_00:Azzzz000 | zzzzzz:Azzzzzz0 | [zzzzzz zzz zzzzzz] zzzzzz:AzAzAz_000 | zzzzzzzz:Azzzzzz0000 | [zzzzzz zzz zzzzzz] zzzzzzz00@zz.zz:Azzzz0000 | 
Azzzzzzzz zzzzzzzz = 0 | Azzzzz = 0 | Azzzzzzzzz = 0 zz zzzzz zzzz zzzzz:zzz0z
into
zzzzzzzzzzzz0:Azzzzzzzz00 zzzzz000:Azzzz000 zzzzzz:zzzzzzz0 zzzzzzzz:zz0z0z0 zzzzzzz0:zzzzzz000 zzzzzz.zzzzzz@zz.zz:Az000000 zzzzzzzzzzzz0:Azzzzzzzz00 zzzzz000:Azzzz000 zzzzzz:zzzzzzz0 zzzzzzzz:zz0z0z0 zzzzzzz0:zzzzzz000 zzzzzzzzz.0000:Azzzzzzzzz00 zzzzzzzz0:zzzz000 zzzzzzz_zzzzzz:zzzzzz00zzzzzz zzzzzzz:zzzzzz0 zzzzzzzzz.00:z0zzzzA00!zzzzzzzz zzzz0000:zzzz00 zzzzzzzzzzzz00:zzzzz0000 zzzzz00:zzzzz000 zzzzz00:zzzzzzz zzzzzzzzz:zzzzzzzzzz0 Azzzz:Azzzzzzzzz00 zzzzzzzzzzzzzz000:zzzzzz00 zzzz:zzzz0000 Azzzzz:000000 00zzzzzzzz:%AzzAAAz?AAz0zz 000000000@zzzz:A0000A 000000Az:0000000Azz!z zzzzzzz:zzzzzzz0 Azzz00000000:Azzz000 zzzz0000:0zzz0AAzzz0A_00z AzzzAzzzz00:Azzzz0000 Azzz_00:Azzzz000 zzzzzz:Azzzzzz0 zzzzzz:AzAzAz_000 zzzzzzzz:Azzzzzz0000 zzzzzzz00@zz.zz:Azzzz0000 zzzzz:zzz0z
Which again is what I think you want. But this is the last help I will give in this quest. Each of the pieces used in the regular expressions I showed is described in the user manual in the Regular Expressions syntax section. If you need more changes, you will have to start trying to figure it out on your own.
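The rules listed above can also be sketched in Python. This is a per-line approximation rather than an exact port: `\x5D` is written as `\]`, and the conditional replacement is replaced by an explicit "keep group 1 if the line matched" step. The function name and the sample lines are assumptions made for illustration.

```python
import re

# Optional junk prefix (either "...] " or colon-free text ending in
# whitespace), then the whitespace-free colon pair captured as group 1.
LOOSE_PAIR = re.compile(r"^(?:.*\]\s|[^:]+\s)?(\S+:\S+)")

def extract_pairs(text: str) -> str:
    """Reduce each line to its colon pair; drop lines without one (hypothetical helper)."""
    pairs = []
    for line in text.splitlines():
        match = LOOSE_PAIR.match(line)
        if match:
            pairs.append(match.group(1))
    return "\n".join(pairs)

# Made-up sample lines covering the three cases in the rules.
sample = "\n".join([
    "00:00][TAG] name_x:pw!1 - [Foo = 0,00 AA | BAR = 0000] - @CHAN",
    "plain.user@ex.pl:Ab000000 | host = X",
    "aa bbb cccc user:pw0",
    "Foo bar = 0 | Baz = 0",
])
print(extract_pairs(sample))
```

In this sketch the prefix alternative consumes exactly one whitespace character after the `]`, so a line with several spaces there would need `\s+` instead.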
-
Instead of matching from the start of the line, it might be easier to match from the end of the line and remove the whole match.
FIND = \h*\|.*$
REPLACE = EMPTY
SEARCH MODE = Regular Expression
The $ anchors to the end of the line. The search finds the first | on the line, the greedy .* then runs from there to the end of the line, and the \h* also takes any horizontal whitespace before that |, leaving just the first segment of wanted characters.
Ensure the “. matches newline” checkbox is unchecked.
-
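As a rough illustration of the end-of-line approach with \h*\|.*$, here is a small Python sketch. Python's `re` has no `\h`, so `[ \t]` stands in for horizontal whitespace (an assumption that ignores rarer space characters), and `re.MULTILINE` makes `$` match at each line end. The function name and sample text are hypothetical.

```python
import re

# Everything from the optional whitespace before the first "|" to the
# end of the line; "." does not cross newlines by default in Python.
FROM_FIRST_PIPE = re.compile(r"[ \t]*\|.*$", re.MULTILINE)

def strip_after_first_pipe(text: str) -> str:
    """Delete the first pipe and everything after it on each line (hypothetical helper)."""
    return FROM_FIRST_PIPE.sub("", text)

# Made-up sample data.
sample = "user1:Pass00 | Foo = 00 AA | Baz = 0000\nabc:def9 | x = 1"
print(strip_after_first_pipe(sample))
```

Lines that contain no pipe are left unchanged, and any junk before the pair stays in place, since only the tail of the line is removed.
-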
@mpheath ,
Instead of matching from the start of the line, it might be easier to match from the end of the line and remove the whole match.
Probably a good idea (and simpler than mine) for most of the lines, but it wouldn’t work for some of the new data:
00:00][AAAAAAA] zzzzzzzz0:zzzz000 - [Azzz Azzzzz = 0,00 AA | AAAAAA = 0000] - @AAAAAA
00:00][AAAAAAA] zzzzzzz_zzzzzz:zzzzzz00zzzzzz - [Azzz Azzzzz = 0,00 AA | AAAAAA = 0000] - @AAAAAA
00:00][AAAAAAA] zzzzzzz:zzzzzz0 - [Azzz Azzzzz = 00,00 AA | AAAAAA = 0000] - @AAAAAA
00:00][AAAAAAA] zzzzzzzzz.00:z0zzzzA00!zzzzzzzz - [Azzz Azzzzz = 00,00 AA | AAAAAA = 0000] - @AAAAAA
-
That makes it quite a bit more complex.
FIND = (?(?=^\d\d:\d\d).*\R|\h*\|.*$)
REPLACE = EMPTY
SEARCH MODE = Regular Expression
In comparison to \h*\|.*$, this pattern removes the 4 lines mentioned by using a conditional, (?(condition)yes|no): the yes branch matches the whole line if it starts with 00:00-like digits; otherwise the no branch uses \h*\|.*$.
-
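Python's `re` module does not support lookahead conditions of the form (?(?=...)yes|no), so the conditional pattern above cannot be pasted in as-is. A plain alternation gives a close approximation for this data: the first branch removes a whole line that starts with two digits, a colon, and two digits, and the second branch is the earlier pipe pattern. `\r?\n` stands in for `\R` and `[ \t]` for `\h`; both are assumptions, as are the function name and the sample text.

```python
import re

# Branch 1: a line starting with "dd:dd" is removed together with its newline.
# Branch 2: otherwise, cut from the first pipe to the end of the line.
TIMESTAMPED_LINE_OR_PIPE_TAIL = re.compile(
    r"^\d\d:\d\d.*\r?\n|[ \t]*\|.*$", re.MULTILINE
)

def drop_timestamped_and_trim(text: str) -> str:
    """Approximate the conditional search/replace in Python (hypothetical helper)."""
    return TIMESTAMPED_LINE_OR_PIPE_TAIL.sub("", text)

# Made-up sample data: one timestamped line, one ordinary line.
sample = (
    "00:00][TAG] name:pw - [Foo = 0,00 AA | BAR = 0000] - @CHAN\n"
    "user1:Pass00 | Foo = 00 AA\n"
)
print(drop_timestamped_and_trim(sample))
```

As with the original pattern, a timestamped line is only removed when a line break follows it.
-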
@mpheath said in I need a function/plugin to extract only unnecessary text from lines:
removes the 4 lines mentioned
Why remove? The OP said (translated): “But if I had text like this, it wouldn’t be possible.” – I interpreted that to mean that all the data in the example should be stripped down to the xyz:xyz.
So instead of deleting lines like that, my solution edits them down to
zzzzzzzz0:zzzz000
zzzzzzz_zzzzzz:zzzzzz00zzzzzz
zzzzzzz:zzzzzz0
zzzzzzzzz.00:z0zzzzA00!zzzzzzzz
that is, it strips the stuff before and after the pairs, but keeps the pairs.
-
@PeterJones You’re correct. It seems the colon is what matters, not the pipe. With the possible variations, I am not sure what may pass or fail to achieve the desired result. The 1st post shows a result with 1 less line than its input, so I have doubts about what is needed.
-
Hello, @ragnar-lodbrok, @peterjones, @mpheath and All
Using the INPUT text of @peterjones, I also searched for a single regex, without success. I’ve just found two successive search/replacements which produce the same OUTPUT as @peterjones’s! These two regexes simply delete everything which is not wanted.
So, starting with :
zzzzzzzzzzzz0:Azzzzzzzz00 | Azzzęzzz zzzzzzzz = 00 AA | Azzzzz = 0000 zzzzz000:Azzzz000 | Azzzęzzz zzzzzzzz = 000,0 AA | Azzzzz = 0000 zzzzzz:zzzzzzz0 | Azzzęzzz zzzzzzzz = 000,00 AA | Azzzzz = 0000 zzzzzzzz:zz0z0z0 | Azzzęzzz zzzzzzzz = 00 AA | Azzzzz = 0000 zzzzzzz0:zzzzzz000 | Azzzęzzz zzzzzzzz = 00 AA | Azzzzz = 0000 zzzzzz.zzzzzz@zz.zz:Az000000 | zzzzzzzzzzz.zz = Azzzzzzóz | 00 = Azzzz Azz/Azz Az zzzzzzzzzzzz0:Azzzzzzzz00 | Azzzęzzz zzzzzzzz = 00 AA | Azzzzz = 0000 zzzzz000:Azzzz000 | Azzzęzzz zzzzzzzz = 000,0 AA | Azzzzz = 0000 zzzzzz:zzzzzzz0 | Azzzęzzz zzzzzzzz = 000,00 AA | Azzzzz = 0000 zzzzzzzz:zz0z0z0 | Azzzęzzz zzzzzzzz = 00 AA | Azzzzz = 0000 zzzzzzz0:zzzzzz000 | Azzzęzzz zzzzzzzz = 00 AA | Azzzzz = 0000 zzzzzzzzz.0000:Azzzzzzzzz00 | Azzzęzzz zzzzzzzz = 00 AA | Azzzzz = 0000 00:00][AAAAAAA] zzzzzzzz0:zzzz000 - [Azzz Azzzzz = 0,00 AA | AAAAAA = 0000] - @AAAAAA 00:00][AAAAAAA] zzzzzzz_zzzzzz:zzzzzz00zzzzzz - [Azzz Azzzzz = 0,00 AA | AAAAAA = 0000] - @AAAAAA 00:00][AAAAAAA] zzzzzzz:zzzzzz0 - [Azzz Azzzzz = 00,00 AA | AAAAAA = 0000] - @AAAAAA 00:00][AAAAAAA] zzzzzzzzz.00:z0zzzzA00!zzzzzzzz - [Azzz Azzzzz = 00,00 AA | AAAAAA = 0000] - @AAAAAA zzzz0000:zzzz00 | zzzzzzzzzz = 00.00.0000 00:00:00 zzzzzzzzzzzz00:zzzzz0000 | zzzzzzzzzz = 00.00.0000 00:00:00 zzzzz00:zzzzz000 | zzzzzzzzzz = 00.00.0000 00:00:00 zzzzz00:zzzzzzz | zzzzzzzzzz = 00.00.0000 00:00:00 zzzzzzzzz:zzzzzzzzzz0 | zzzzzzzzzz = 00.00.0000 00:00:00 Azzzz:Azzzzzzzzz00 | zzzzzzzzzz = 00.00.0000 00:00:00 zzzzzzzzzzzzzz000:zzzzzz00 | zzzzzzzzzz = 00.00.0000 00:00:00 zzzz:zzzz0000 | zzzzzzzzzz = 00.00.0000 00:00:00 Azzzzz:000000 | zzzzzzzzzz = 00.00.0000 00:00:00 00zzzzzzzz:%AzzAAAz?AAz0zz | 000000000@zzzz:A0000A | 000000Az:0000000Azz!z zzzzzzz:zzzzzzz0 | zzz = [zzzzzz zzz zzzzzz] Azzz00000000:Azzz000 | zzzz0000:0zzz0AAzzz0A_00z | AzzzAzzzz00:Azzzz0000 | Azzz_00:Azzzz000 | zzzzzz:Azzzzzz0 | [zzzzzz zzz zzzzzz] zzzzzz:AzAzAz_000 | zzzzzzzz:Azzzzzz0000 | [zzzzzz zzz zzzzzz] zzzzzzz00@zz.zz:Azzzz0000 | 
Azzzzzzzz zzzzzzzz = 0 | Azzzzz = 0 | Azzzzzzzzz = 0 zz zzzzz zzzz zzzzz:zzz0z
First search/replacement :
- FIND (?-s)^(\S+:\S+\x20|[^:\r\n]+\x20)?\S+:\S+(*SKIP)(*F)|.+
- REPLACE Leave EMPTY
Second search/replacement :
- FIND (?-s)^.+?\x20(?=\S+:)
- REPLACE Leave EMPTY
Which gives the following OUTPUT result :
zzzzzzzzzzzz0:Azzzzzzzz00 zzzzz000:Azzzz000 zzzzzz:zzzzzzz0 zzzzzzzz:zz0z0z0 zzzzzzz0:zzzzzz000 zzzzzz.zzzzzz@zz.zz:Az000000 zzzzzzzzzzzz0:Azzzzzzzz00 zzzzz000:Azzzz000 zzzzzz:zzzzzzz0 zzzzzzzz:zz0z0z0 zzzzzzz0:zzzzzz000 zzzzzzzzz.0000:Azzzzzzzzz00 zzzzzzzz0:zzzz000 zzzzzzz_zzzzzz:zzzzzz00zzzzzz zzzzzzz:zzzzzz0 zzzzzzzzz.00:z0zzzzA00!zzzzzzzz zzzz0000:zzzz00 zzzzzzzzzzzz00:zzzzz0000 zzzzz00:zzzzz000 zzzzz00:zzzzzzz zzzzzzzzz:zzzzzzzzzz0 Azzzz:Azzzzzzzzz00 zzzzzzzzzzzzzz000:zzzzzz00 zzzz:zzzz0000 Azzzzz:000000 00zzzzzzzz:%AzzAAAz?AAz0zz 000000000@zzzz:A0000A 000000Az:0000000Azz!z zzzzzzz:zzzzzzz0 Azzz00000000:Azzz000 zzzz0000:0zzz0AAzzz0A_00z AzzzAzzzz00:Azzzz0000 Azzz_00:Azzzz000 zzzzzz:Azzzzzz0 zzzzzz:AzAzAz_000 zzzzzzzz:Azzzzzz0000 zzzzzzz00@zz.zz:Azzzz0000 zzzzz:zzz0z
Best Regards,
guy038
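The two-pass approach can be sketched in Python as well, with one substitution: the backtracking-control verbs (*SKIP)(*F) are not available in Python's `re`, so the first pass captures the part to keep in group 1 and writes it back, deleting everything else. This is a commonly used emulation, not the original technique. The function name and sample lines are made up.

```python
import re

# Pass 1: group 1 is the part to keep (optional prefix + colon pair);
# any other run of characters on the line is matched by ".+" and removed.
KEEP_OR_DELETE = re.compile(
    r"^((?:\S+:\S+ |[^:\r\n]+ )?\S+:\S+)|.+", re.MULTILINE
)
# Pass 2: remove what is still in front of the final colon pair.
LEADING_JUNK = re.compile(r"^.+? (?=\S+:)", re.MULTILINE)

def two_pass_extract(text: str) -> str:
    """Apply both passes in order (hypothetical helper)."""
    first = KEEP_OR_DELETE.sub(lambda m: m.group(1) or "", text)
    return LEADING_JUNK.sub("", first)

# Made-up sample data.
sample = "\n".join([
    "00:00][TAG] name_x:pw!1 - [Foo = 0,00 AA | BAR = 0000] - @CHAN",
    "user1:Pass00 | Foo = 00 AA",
    "aa bbb user:pw0",
])
print(two_pass_extract(sample))
```

A line without any colon pair is emptied but its line break stays, so blank lines may remain and can be removed in a separate step.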
-