How to mark lines with under "x" characters after : in a line.

Alan Kilborn

If you want the match to stay on a single line, lead off the search expression with (?-s). This tells the search engine not to let any later . match line-ending characters, so the .* part won’t spill over onto the following lines.
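The effect of that flag can be illustrated outside Notepad++. The following is only a rough sketch in Python, whose `re` module is a different engine from the one Notepad++ uses: Python's default behaviour (dot does not match a newline) plays the role of (?-s), and `re.DOTALL` plays the role of (?s). The sample text is made up for the illustration.

```python
import re

# Hypothetical two-line sample; only the first line should be matched.
text = "ADDRESS: first line\nsecond line\n"

# Default in Python: "." stops at "\n", comparable to (?-s) in Notepad++.
same_line = re.search(r"ADDRESS.*", text).group(0)

# With DOTALL, "." also matches "\n", comparable to (?s),
# so ".*" runs on to the end of the text.
spill_over = re.search(r"ADDRESS.*", text, flags=re.DOTALL).group(0)

print(repr(same_line))
print(repr(spill_over))
```

The first result stays on the ADDRESS line; the second swallows the following line as well.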

Francisco

@Alan Kilborn, thanks, and good morning everyone. I was successful using (?-s) with ADDRESS, which marks only the lines that start with ADDRESS. To remove them, I mark all matches and then delete the marked lines.
Is it possible to perform this operation on all open files? I can only find in all of them; I cannot mark them.

Alan Kilborn

@Francisco

You are correct; you cannot bookmark more than one file per marking operation.

It isn’t clear to me what your real goal is exactly, but it appears to be a deletion operation. I think it is likely that this can be done entirely with a regular expression replacement rather than a combination of regex marking followed by manipulating the bookmarked lines.

Francisco

@Alan-Kilborn thanks…
What I need:
I have 100 text files, with the same format, each with several lines.
6 of these lines, are present in all files and start like this:
ADDRESS:
ADDRESS-CITY: Christmas
ADDRESS-STATE-PROVINCE: RN
ADDRESS-POSTALCODE: 59054550
ADDRESS-COUNTRY: BRAZIL
EMAIL: mjnhx@globo.com
I need to easily delete the 6 lines above.

PeterJones

@Francisco said:

I have 100 text files, with the same format, each with several lines.
6 of these lines, are present in all files and start like this:

The problem is, you’ve already rejected our solutions (or, at least, you keep on asking, so we have to assume your problem isn’t solved), but have shown nothing that indicates why what we’ve given doesn’t work for you. One reason for this is explained in my boilerplate below (after the dashed line).

That said, maybe you’re just unsure how to combine @Alan-Kilborn’s fix with my regex, and then have it actually do the deletion, rather than just highlighting. If that’s the case, then it’s simple. I’ll also tweak my portion, because you have now indicated that it should also delete EMAIL, which wasn’t anywhere in your original problem statement.

Find What: (?-s)^(?:ADDRESS(-.*?)*|EMAIL):.*?(?:\R|\Z)
- (?-s): don’t have . match newline
- ^: match starts at beginning of line
- (?:...): make a group, but don’t give it a number
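- ADDRESS(-.*?)*: match the word “ADDRESS”, optionally followed by one or more groups that each start with a hyphen and continue with as few other characters as needed (so ADDRESS, ADDRESS-CITY, and ADDRESS-STATE-PROVINCE all match)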
- |: the OR operator – will match what is before or what is after
- EMAIL: the word EMAIL
- :: that group of ADDRESS or EMAIL must be immediately followed by a colon to match
- .*?: match the remaining characters on the line
- (?:\R|\Z): another unnumbered group, this time containing a NEWLINE sequence (\R = CR, LF, or CRLF) or end-of-file (\Z).
Replace With: empty
- this will delete the whole line matched above, including the newline
Mode = regular expression

I recommend getting the expression working with one file; once that works, then you can move on to using the Find in Files for all your files.

With those settings, this block of text:

ADDRESS:
ADDRESS-CITY: Christmas
ADDRESS-STATE-PROVINCE: RN
ADDRESS-POSTALCODE: 59054550
ADDRESS-COUNTRY: BRAZIL
EMAIL: mjnhx@globo.com
You tell us nothing about the remainder of the file, so I don't know whether
the following lines match your pattern, or whether they don't:
SOMETHING-ELSE: value
MORE-COLONED-LINES: here
For now, I'll assume you want to keep everything except lines that 
start with "ADDRESS...:" or "EMAIL:"

would be edited to:

You tell us nothing about the remainder of the file, so I don't know whether
the following lines match your pattern, or whether they don't:
SOMETHING-ELSE: value
MORE-COLONED-LINES: here
For now, I'll assume you want to keep everything except lines that 
start with "ADDRESS...:" or "EMAIL:"
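For anyone who wants to try the same deletion outside Notepad++, here is a minimal sketch in Python. It is a translation, not the Notepad++ expression itself: Python's `re` module has no \R, so the line break is written as `\r?\n` (assuming LF or CRLF line endings), and the (?-s) prefix is dropped because Python's dot does not match a newline by default. The names `ADDRESS_OR_EMAIL_LINE` and `delete_address_lines` are made up for this example.

```python
import re

# Translation of (?-s)^(?:ADDRESS(-.*?)*|EMAIL):.*?(?:\R|\Z) for Python's re.
# re.MULTILINE makes ^ match at the start of every line.
ADDRESS_OR_EMAIL_LINE = re.compile(
    r"^(?:ADDRESS(?:-.*?)*|EMAIL):.*?(?:\r?\n|\Z)",
    re.MULTILINE,
)


def delete_address_lines(text: str) -> str:
    """Remove every line that starts with ADDRESS...: or EMAIL:."""
    return ADDRESS_OR_EMAIL_LINE.sub("", text)


sample = (
    "ADDRESS:\n"
    "ADDRESS-CITY: Christmas\n"
    "ADDRESS-STATE-PROVINCE: RN\n"
    "ADDRESS-POSTALCODE: 59054550\n"
    "ADDRESS-COUNTRY: BRAZIL\n"
    "EMAIL: mjnhx@globo.com\n"
    "SOMETHING-ELSE: value\n"
    "MORE-COLONED-LINES: here\n"
)

cleaned = delete_address_lines(sample)
print(cleaned)
```

Only the SOMETHING-ELSE and MORE-COLONED-LINES lines remain, which mirrors the before/after shown above.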

Of course, this is still making lots of assumptions. Other possible interpretations are that you want the first six lines of any file to be deleted, whatever the text. And it might be that the “SOMETHING-ELSE:” I indicated in the example text might also be “ADDRESS:”, in which case we’d have to tweak my regex to limit those matches to the first lines of a file, because mine assumes that any lines starting with “ADDRESS…:” or “EMAIL:” will be deleted.

It would be easier to help you if you’d give all the information we need at once, rather than doling it out piecemeal. As explained below, a good example would have examples of lines to match and lines not to match, and would show us both the before and after. A good example will also be properly formatted using Markdown (like my example was) – links to Markdown help and regex help are in the boilerplate below.

-----
FYI: I often add this to my response in regex threads, unless I am sure the original poster has seen it before. Here is some helpful information for finding out more about regular expressions, and for formatting posts in this forum (especially quoting data) so that we can fully understand what you’re trying to ask:

This forum is formatted using Markdown, with a help link buried on the little grey ? in the COMPOSE window/pane when writing your post. For more about how to use Markdown in this forum, please see @Scott-Sumner’s post in the “how to markdown code on this forum” topic, and my updates near the end. It is very important that you use these formatting tips – using single backtick marks around small snippets, and using code-quoting for pasting multiple lines from your example data files – because otherwise, the forum will change normal quotes ("") to curly “smart” quotes (“”), will change hyphens to dashes, will sometimes hide asterisks (or if your text is c:\folder\*.txt, it will show up as c:\folder*.txt, missing the backslash). If you want to clearly communicate your text data to us, you need to properly format it.

If you have further search-and-replace (“matching”, “marking”, “bookmarking”, regular expression, “regex”) needs, study this FAQ and the documentation it points to. Before asking a new regex question, understand that for future requests, many of us will expect you to show what data you have (exactly), what data you want (exactly), what regex you already tried (to show that you’re showing effort), why you thought that regex would work (to prove it wasn’t just something randomly typed), and what data you’re getting with an explanation of why that result is wrong. When you show that effort, you’ll see us bend over backward to get things working for you. If you need help formatting, see the paragraph above.

Please note that for all regex and related queries, it is best if you are explicit about what needs to match, and what shouldn’t match, and have multiple examples of both in your example dataset. Often, what shouldn’t match helps define the regular expression as much or more than what should match.

Francisco

@PeterJones said:

(?-s)^(?:ADDRESS(-.*?)*|EMAIL):.*?(?:\R|\Z)

This command worked perfectly on all files in a given folder. All lines started by ADDRESS and EMAIL were automatically deleted as desired.
I am very pleased and grateful for this important help.
Only three files did not have their email deleted, because the email line does not have the word EMAIL at the beginning of the line.
P.S. Would it be possible to extend this command so that it also matches any line that contains an @?

Nicholas Wetzel

@PeterJones said:

@Nicholas-Wetzel: Welcome to the Notepad++ Community.

Example of lines I want to keep:
Example of lines I want to delete:

Thank you for clearly specifying both. That helps us help you.

Using the regex ^.*:.{1,7}(\R+|\z) to find, with replace being empty, should delete those lines

Mind checking my new thread here please?

https://notepad-plus-plus.org/community/topic/18149/sorting-login-information
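The quoted regex ^.*:.{1,7}(\R+|\z) is the answer to this thread's title question (lines with under "x" characters after the colon). As a rough sketch only, the same idea can be tried in Python with the limit as a parameter. This is a translation for Python's `re` module, not the Notepad++ expression: \R+ becomes `(?:\r?\n)+` (assuming LF or CRLF line endings) and \z becomes `\Z`. The function name `delete_short_after_colon` and the sample data are invented for the illustration.

```python
import re


def delete_short_after_colon(text: str, max_len: int) -> str:
    """Delete lines whose part after the last colon is 1..max_len characters long."""
    pattern = re.compile(
        # The doubled braces produce a literal {1,N} quantifier in the f-string.
        rf"^.*:.{{1,{max_len}}}(?:(?:\r?\n)+|\Z)",
        re.MULTILINE,
    )
    return pattern.sub("", text)


# Hypothetical login list, not taken from the thread.
sample = (
    "alice:abc\n"
    "bob:longerpassword\n"
    "carol:1234567\n"
    "dave:12345678\n"
)

kept = delete_short_after_colon(sample, 7)
print(kept)
```

With a limit of 7, the lines for alice (3 characters) and carol (7 characters) are removed, while bob and dave are kept.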

Hoang Ngoc

@PeterJones
Hello sir,
I need help with Notepad++; it would be really appreciated.
List:
kkkkk:123456
kkkkk:aaaaaa
kkkkk:a123456
kkkk:123456a
Examples of lines I want to delete:
kkkkk:123456
kkkkk:aaaaaa
Delete every line where the part after “:” contains only digits or only letters.

PeterJones

@Hoang-Ngoc

With data:

kkkkk:123456
kkkkk:aaaaaa
kkkkk:a123456
kkkk:123456a
kkkkk:zzzzz

FIND = (?-s)^.*:([[:alpha:]]+|[[:digit:]]+)(\R|\z)
REPLACE = empty
SEARCH MODE = regular expression
yields

kkkkk:a123456
kkkk:123456a

The logic I used: you wanted to delete the whole line, so I had to start with “from the start of the line, any character”; you said it came after a colon, so “followed by a colon”; then “followed by either a group of all letters or a group of all numbers”, then “followed by the end of the line (or end of the file)”. I then translated those into regex tokens.
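That step-by-step translation can be checked with a small sketch in Python. It is an approximation rather than the Notepad++ expression: Python's `re` module has no [[:alpha:]] or [[:digit:]] classes, so this version assumes plain ASCII letters and digits, and it writes the line break as `\r?\n` (LF or CRLF). The name `LETTERS_OR_DIGITS_ONLY` is made up for the example.

```python
import re

# start of line, anything, a colon, then all letters OR all digits,
# then the end of the line (or the end of the text).
LETTERS_OR_DIGITS_ONLY = re.compile(
    r"^.*:(?:[A-Za-z]+|[0-9]+)(?:\r?\n|\Z)",
    re.MULTILINE,
)

data = (
    "kkkkk:123456\n"
    "kkkkk:aaaaaa\n"
    "kkkkk:a123456\n"
    "kkkk:123456a\n"
    "kkkkk:zzzzz\n"
)

result = LETTERS_OR_DIGITS_ONLY.sub("", data)
print(result)
```

The two lines that mix letters and digits after the colon are the only ones left.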

-----

Please note: This Community Forum is not a data transformation service; you should not expect to be able to always say “I have data like X and want it to look like Y” and have us do all the work for you. If you are new to the Forum, and new to regular expressions, we will often give help on the first one or two data-transformation questions, especially if they are well-asked and you show a willingness to learn; and we will point you to the documentation where you can learn how to do the data transformations for yourself in the future. But if you repeatedly ask us to do your work for you, you will find that the patience of usually-helpful Community members wears thin. The best way to learn regular expressions is by experimenting with them yourself, and getting a feel for how they work; having us spoon-feed you the answers without you putting in the effort doesn’t help you in the long term and is uninteresting and annoying for us.

-----

Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as literal text using the </> toolbar button or manual Markdown syntax. To make regexes appear in red (and so they keep their special characters like *), use backticks, like `^.*?blah.*?\z`. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure your examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get. Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.

Hoang Ngoc

This post is deleted!

Hoang Ngoc

@PeterJones

What about before “:”, my website request “Username can only contain the allowed characters: uppercase letters, lowercase letters, numbers (a-z, A-Z, 0-9), underscores, dashes and periods. Username must begin or end with a letter or number and must contain at least one letter.” and “Account name must have 6-15 characters”
I wanna delete line not follow the rule

PeterJones

@Hoang-Ngoc said in How to mark lines with under "x" characters after : in a line.:

What about before “:”, my website request “Username can only contain the allowed characters: uppercase letters, lowercase letters, numbers (a-z, A-Z, 0-9), underscores, dashes and periods. Username must begin or end with a letter or number and must contain at least one letter.” and “Account name must have 6-15 characters”

and then you deleted that and wrote

I wanna delete line not follow the rule

Well, that changes things. Thanks for wasting my time while I was writing up deleting everything that didn’t follow that rule. I’ll edit what I was in the middle of…

-----

The least you can do is ask complete questions and at least attempt to make your posts make sense (for example, the preview window should have shown you that it was rendering your new text as if it were part of my quoted message, before you deleted it).

As I said earlier, this forum is not a data transformation service. So you’ll get one more freebie from me. But you’ve got to try to put more effort in if you’re going to be asking people for help. If you want to do many search-and-replace operations, you’re going to have to read the official Notepad++ regular expression docs, which I already linked for you before, and have now linked again.

To allow uppercase, lowercase, numbers, underscores, dashes, and periods, you can use the character class [a-zA-Z0-9_.-]. To indicate a specific quantity, you can use {N,M}, where N and M are the minimum and maximum counts you want to allow. For the more restrictive letter-or-number only for the first and last character, use [a-zA-Z0-9] without the other characters. Put that all together: since you want one restrictive character, followed by N-M less restrictive ones, followed by another restrictive character, the N-M will need to be a range that is two less than the actually-allowed range of 6-15, so 4-13. Thus, [a-zA-Z0-9][a-zA-Z0-9_.-]{4,13}[a-zA-Z0-9]. And, as before, you need a start-of-line anchor, and want to have the colon after. ~~But this is what’s allowed, and you want to delete what’s not allowed.~~ Since you now want to delete any that match the rules, that’s slightly easier.

FIND = (?-s)^[a-zA-Z0-9][a-zA-Z0-9_.-]{4,13}[a-zA-Z0-9]:.*(\R|\z)

Actually, that almost did it.

short:blah123
thisIs2good:blah123
toooverlylongouidiot:blah123
bad'character:blah123
ok-char:blah123
1234-6789:blah123
-badStart:blah123
badEnd_:blah123
1ok_again2:blah123

becomes

short:blah123
toooverlylongouidiot:blah123
bad'character:blah123
-badStart:blah123
badEnd_:blah123

You’ll notice that the username=1234-6789 line was deleted, even though it doesn’t contain at least one letter. That’s because enforcing “at least one letter” in the same expression is hard, so I want to handle that separately.

Before doing the regex shown above, do a FIND = ^[0-9_.-]{6,15}:.*$ and REPLACE=!KEEPME!$0, which will give an intermediate:

short:blah123
thisIs2good:blah123
toooverlylongouidiot:blah123
bad'character:blah123
ok-char:blah123
!KEEPME!1234-6789:blah123
-badStart:blah123
badEnd_:blah123
1ok_again2:blah123

Now do the one I showed earlier: (?-s)^[a-zA-Z0-9][a-zA-Z0-9_.-]{4,13}[a-zA-Z0-9]:.*(\R|\z) =>

short:blah123
toooverlylongouidiot:blah123
bad'character:blah123
!KEEPME!1234-6789:blah123
-badStart:blah123
badEnd_:blah123

Now do FIND = ^!KEEPME! and REPLACE = empty to get rid of that indicator.

short:blah123
toooverlylongouidiot:blah123
bad'character:blah123
1234-6789:blah123
-badStart:blah123
badEnd_:blah123

Now you only show the usernames that violate your rules.
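The three passes can also be replayed in a short Python sketch, for anyone who wants to check the sequence on their own data first. It is a translation under assumptions, not the Notepad++ expressions themselves: \R is written as `\r?\n` (LF or CRLF), \z as `\Z`, and the replacement $0 becomes `\g<0>` in Python. The variable names are made up for the example.

```python
import re

data = (
    "short:blah123\n"
    "thisIs2good:blah123\n"
    "toooverlylongouidiot:blah123\n"
    "bad'character:blah123\n"
    "ok-char:blah123\n"
    "1234-6789:blah123\n"
    "-badStart:blah123\n"
    "badEnd_:blah123\n"
    "1ok_again2:blah123\n"
)

# Pass 1: tag usernames of 6-15 characters that contain no letter at all.
tagged = re.sub(r"^[0-9_.-]{6,15}:.*$", r"!KEEPME!\g<0>", data, flags=re.MULTILINE)

# Pass 2: delete lines whose username fits the character and length rules.
# Tagged lines start with "!", so they can no longer match.
deleted = re.sub(
    r"^[a-zA-Z0-9][a-zA-Z0-9_.-]{4,13}[a-zA-Z0-9]:.*(?:\r?\n|\Z)",
    "",
    tagged,
    flags=re.MULTILINE,
)

# Pass 3: strip the temporary tag again.
violations = re.sub(r"^!KEEPME!", "", deleted, flags=re.MULTILINE)
print(violations)
```

The printed text should match the final list above, with 1234-6789 kept among the rule violations.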

Hoang Ngoc

@PeterJones

Thank you so much. I really appreciate what you are doing for this community; keep it up.

guy038

Hello, @hoang-ngoc, @peterjones and All,

The following single search regex could be used and, with an empty replace field, would delete any line with a valid user-name :

SEARCH (?i-s)(?=^[a-z0-9])(?=.*[a-z0-9]:)(?=.*[a-z].*:)^[a-z0-9_.-]{6,15}:.*\R?

REPLACE Leave EMPTY

Notes :

The (?i-s) makes the search case-insensitive and makes the regex dot . stand for a single standard character (never a line-break character)
Then the main part is ^[a-z0-9_.-]{6,15}:.*\R? which searches for 6 to 15 characters before a colon, each of which can be a standard letter or digit, an underscore, a period or a dash, followed by the remainder of the current line and a possible line-break
This part will be valid ONLY IF, in addition, these three lookaheads are TRUE, at beginning of current line :
- A letter or digit begins the user-name ( part (?=^[a-z0-9]) )
- A letter or digit ends the user-name ( part (?=.*[a-z0-9]:) )
- The user-name contains, at least, ONE letter ( part (?=.*[a-z].*:) )

So, given this INPUT text :

short:••••••••                      #  < 6 chars
ThisIs2good:••••••••                #  OK
Looong_user-name:••••••••           #  > 15 chars
us@er'NA=ME:••••••••                #  NON-VALID chars
ok-chr:••••••••                     #  OK  (  6 chars and ALL chars ALLOWED )
ABCD-FGHI_12.34:••••••••            #  OK  ( 15 chars and ALL chars ALLOWED )
1234-6789:••••••••                  #  NO letter
.User-Name:••••••••                 #  NON-VALID char at START
USER.NAME_:••••••••                 #  NON-VALID char at END
1ok_again2:••••••••                 #  OK

After the replacement, the following OUTPUT text would remain :

short:••••••••                      #  < 6 chars
Looong_user-name:••••••••           #  > 15 chars
us@er'NA=ME:••••••••                #  NON-VALID chars
1234-6789:••••••••                  #  NO letter
.User-Name:••••••••                 #  NON-VALID char at START
USER.NAME_:••••••••                 #  NON-VALID char at END
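As a cross-check, the lookahead approach can be sketched in Python. This is a port under assumptions, not the Notepad++ expression: the -s part of (?i-s) is dropped because Python's dot does not match a newline by default, \R? is written as `\n?` (assuming LF line endings), and `re.MULTILINE` is needed so that ^ matches at every line start. The name `VALID_USER_LINE` is made up, and the sample omits the trailing # comments.

```python
import re

# Three lookaheads, each evaluated at the start of the line, then the main match.
VALID_USER_LINE = re.compile(
    r"(?i)"
    r"(?=^[a-z0-9])"        # user-name begins with a letter or digit
    r"(?=.*[a-z0-9]:)"      # a letter or digit sits right before a colon
    r"(?=.*[a-z].*:)"       # at least one letter occurs before a colon
    r"^[a-z0-9_.-]{6,15}:.*\n?",
    re.MULTILINE,
)

usernames = [
    "short", "ThisIs2good", "Looong_user-name", "us@er'NA=ME", "ok-chr",
    "ABCD-FGHI_12.34", "1234-6789", ".User-Name", "USER.NAME_", "1ok_again2",
]
data = "".join(f"{name}:••••••••\n" for name in usernames)

remaining = VALID_USER_LINE.sub("", data)
remaining_names = [line.split(":")[0] for line in remaining.splitlines()]
print(remaining_names)
```

One caveat with this port: the lookaheads scan the whole line with .*, so a colon inside the part after the user-name could satisfy them. For example, the line USER.NAME_:pa1:x would be deleted even though its user-name ends with an underscore. If the data can contain several colons per line, the lookaheads would need to be limited to the text before the first colon.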

Best regards

guy038