    How to mark lines with under "x" characters after : in a line.

    • FranciscoF
      Francisco
      last edited by Francisco

      @Alan Kilborn, thanks, and good morning everyone. I was successful using ADDRESS (?-s), which selects only the lines that start with ADDRESS. To remove them, I mark all matches and then delete the marked lines.
      Is it possible to perform this operation on all open files? Across files I can only find the lines; I cannot mark them.

      • Alan KilbornA
        Alan Kilborn @Francisco
        last edited by

        @Francisco

        You are correct; you cannot bookmark more than one file per marking operation.

        It isn’t clear to me what your real goal is exactly, but it appears to be a deletion operation. I think it is likely that this can be done entirely with a regular expression replacement, rather than a combination of regex marking followed by bookmarked-lines manipulation.

        • FranciscoF
          Francisco @Alan Kilborn
          last edited by

          @Alan-Kilborn thanks…
          What I need:
          I have 100 text files, with the same format, each with several lines.
          6 of these lines, are present in all files and start like this:
          ADDRESS:
          ADDRESS-CITY: Christmas
          ADDRESS-STATE-PROVINCE: RN
          ADDRESS-POSTALCODE: 59054550
          ADDRESS-COUNTRY: BRAZIL
          EMAIL: mjnhx@globo.com
          I need to easily delete the 6 lines above.

          • PeterJonesP
            PeterJones
            last edited by PeterJones

            @Francisco said:

            I have 100 text files, with the same format, each with several lines.
            6 of these lines, are present in all files and start like this:

            The problem is, you’ve already rejected our solutions (or, at least, you keep on asking, so we have to assume your problem isn’t solved), but have shown nothing that indicates why what we’ve given doesn’t work for you. One reason for this is explained in my boilerplate below (after the dashed line).

            That said, maybe you’re just unsure how to combine @Alan-Kilborn’s fix to my regex, and then have it actually do the deletion, rather than just highlighting. If that’s the case, then it’s simple. I’ll also tweak my portion, because you have now indicated that it should also delete EMAIL, which wasn’t anywhere in your original problem statement.

            • Find What: (?-s)^(?:ADDRESS(-.*?)*|EMAIL):.*?(?:\R|\Z)
              • (?-s): don’t have . match newline
              • ^: match starts at beginning of line
              • (?:...): make a group, but don’t give it a number
              • ADDRESS(-.*?)*: match the word “ADDRESS”, optionally followed by one or more groups that each start with a hyphen and continue with other characters
              • |: the OR operator – will match what is before or what is after
              • EMAIL: the word EMAIL
              • :: that group of ADDRESS or EMAIL must be immediately followed by a colon to match
              • .*?: match the remaining characters on the line
              • (?:\R|\Z): another unnumbered group, this time containing a NEWLINE sequence (\R = CR, LF, or CRLF) or end-of-file (\Z).
            • Replace With: empty
              • this will delete the whole line matched above, including the newline
            • Mode = regular expression

            I recommend getting the expression working with one file; once that works, then you can move on to using the Find in Files for all your files.

            With those settings, this block of text:

            ADDRESS:
            ADDRESS-CITY: Christmas
            ADDRESS-STATE-PROVINCE: RN
            ADDRESS-POSTALCODE: 59054550
            ADDRESS-COUNTRY: BRAZIL
            EMAIL: mjnhx@globo.com
            You tell us nothing about the remainder of the file, so I don't know whether
            the following lines match your pattern, or whether they don't:
            SOMETHING-ELSE: value
            MORE-COLONED-LINES: here
            For now, I'll assume you want to keep everything except lines that 
            start with "ADDRESS...:" or "EMAIL:"
            

            would be edited to:

            You tell us nothing about the remainder of the file, so I don't know whether
            the following lines match your pattern, or whether they don't:
            SOMETHING-ELSE: value
            MORE-COLONED-LINES: here
            For now, I'll assume you want to keep everything except lines that 
            start with "ADDRESS...:" or "EMAIL:"
            

            Of course, this is still making lots of assumptions. Other possible interpretations are that you want the first six lines of any file to be deleted, whatever the text. And it might be that the “SOMETHING-ELSE:” I indicated in the example text might also be “ADDRESS:”, in which case we’d have to tweak my regex to limit those matches to the first lines of a file, because mine assumes that any lines starting with “ADDRESS…:” or “EMAIL:” will be deleted.
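
For anyone who wants to try the same deletion outside Notepad++, here is a minimal Python sketch of the Find What / Replace With settings described above. It is only an approximation: Python's `re` module has no `\R`, so the line ending is spelled out, and Python's `\Z` (end of string) stands in for end-of-file. The function names, the `*.txt` glob, and the UTF-8 encoding are assumptions for illustration, not anything from the thread.

```python
import re
from pathlib import Path

# Approximation of (?-s)^(?:ADDRESS(-.*?)*|EMAIL):.*?(?:\R|\Z)
# Python's re has no \R, so CRLF / LF / CR are listed explicitly.
LINE_PATTERN = re.compile(
    r"^(?:ADDRESS(?:-.*?)*|EMAIL):.*?(?:\r\n|\n|\r|\Z)",
    re.MULTILINE,
)


def strip_address_lines(text: str) -> str:
    """Delete every line that starts with ADDRESS...: or EMAIL:."""
    return LINE_PATTERN.sub("", text)


def strip_folder(folder: str, glob: str = "*.txt") -> None:
    """Hypothetical helper: apply the deletion to each matching file in place.

    Assumes UTF-8 files; newline="" keeps the original line endings untouched.
    """
    for path in Path(folder).glob(glob):
        with path.open("r", encoding="utf-8", newline="") as handle:
            original = handle.read()
        with path.open("w", encoding="utf-8", newline="") as handle:
            handle.write(strip_address_lines(original))
```

As with the advice above about getting the expression working on one file first, it makes sense to call `strip_address_lines` on a single sample string and inspect the result before running anything like `strip_folder` on real data (and to keep a backup, since it rewrites files in place).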

            It would be easier to help you if you’d give all the information we need at once, rather than doling it out piecemeal. As explained below, a good example would have examples of lines to match and lines not to match, and would show us both the before and after. A good example will also be properly formatted using Markdown (like my example was) – links to Markdown help and regex help are in the boilerplate below.

            -----
            FYI: I often add this to my response in regex threads, unless I am sure the original poster has seen it before. Here is some helpful information for finding out more about regular expressions, and for formatting posts in this forum (especially quoting data) so that we can fully understand what you’re trying to ask:

            This forum is formatted using Markdown, with a help link buried on the little grey ? in the COMPOSE window/pane when writing your post. For more about how to use Markdown in this forum, please see @Scott-Sumner’s post in the “how to markdown code on this forum” topic, and my updates near the end. It is very important that you use these formatting tips – using single backtick marks around small snippets, and using code-quoting for pasting multiple lines from your example data files – because otherwise, the forum will change normal quotes ("") to curly “smart” quotes (“”), will change hyphens to dashes, will sometimes hide asterisks (or if your text is c:\folder\*.txt, it will show up as c:\folder*.txt, missing the backslash). If you want to clearly communicate your text data to us, you need to properly format it.

            If you have further search-and-replace (“matching”, “marking”, “bookmarking”, regular expression, “regex”) needs, study this FAQ and the documentation it points to. Before asking a new regex question, understand that for future requests, many of us will expect you to show what data you have (exactly), what data you want (exactly), what regex you already tried (to show that you’re showing effort), why you thought that regex would work (to prove it wasn’t just something randomly typed), and what data you’re getting with an explanation of why that result is wrong. When you show that effort, you’ll see us bend over backward to get things working for you. If you need help formatting, see the paragraph above.

            Please note that for all regex and related queries, it is best if you are explicit about what needs to match, and what shouldn’t match, and have multiple examples of both in your example dataset. Often, what shouldn’t match helps define the regular expression as much or more than what should match.

            • FranciscoF
              Francisco @PeterJones
              last edited by Francisco

              @PeterJones said:

              (?-s)^(?:ADDRESS(-.*?)*|EMAIL):.*?(?:\R|\Z)

              This command worked perfectly on all files in a given folder. All lines starting with ADDRESS and EMAIL were automatically deleted as desired.
              I am very pleased and grateful for this important help.
              Only three files did not have their email deleted, because the email line in those files does not have the word EMAIL at the beginning of the line.
              P.S. I do not know whether it would be possible to extend this command to also match any line that contains an @.

              • Nicholas WetzelN
                Nicholas Wetzel @PeterJones
                last edited by

                @PeterJones said:

                @Nicholas-Wetzel: Welcome to the Notepad++ Community.

                Example of lines I want to keep:
                Example of lines I want to delete:

                Thank you for clearly specifying both. That helps us help you.

                Using the regex ^.*:.{1,7}(\R+|\z) to find, with replace being empty, should delete those lines

                Mind checking my new thread here please?

                https://notepad-plus-plus.org/community/topic/18149/sorting-login-information

                • Hoang NgocH
                  Hoang Ngoc @PeterJones
                  last edited by

                  @PeterJones
                  Hello sir
                  I need help in notepad++, really appreciated
                  List:
                  kkkkk:123456
                  kkkkk:aaaaaa
                  kkkkk:a123456
                  kkkk:123456a
                  Examples of lines I want to delete:
                  kkkkk:123456
                  kkkkk:aaaaaa
                  Delete all lines where the part after “:” has only numbers or only letters

                  • PeterJonesP
                    PeterJones @Hoang Ngoc
                    last edited by

                    @Hoang-Ngoc

                    With data:

                    kkkkk:123456
                    kkkkk:aaaaaa
                    kkkkk:a123456
                    kkkk:123456a
                    kkkkk:zzzzz
                    

                    FIND = (?-s)^.*:([[:alpha:]]+|[[:digit:]]+)(\R|\z)
                    REPLACE = empty
                    SEARCH MODE = regular expression
                    yields

                    kkkkk:a123456
                    kkkk:123456a
                    

                    The logic I used: you wanted to delete the whole line, so I had to start with “from the start of the line, any character”; you said it came after a colon, so “followed by a colon”; then “followed by either a group of all letters or a group of all numbers”, then “followed by the end of the line (or end of the file)”. I then translated those into regex tokens.
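
The same logic can be sketched in Python for anyone who wants to test it outside Notepad++. This is an approximation under stated assumptions: Python's `re` has no POSIX classes, `\R`, or `\z`, so `[A-Za-z]` and `[0-9]` stand in for `[[:alpha:]]` and `[[:digit:]]` (ASCII only), and `\r?\n` or `\Z` stands in for the line or file end. The function name is made up for the example.

```python
import re

# "start of line, anything, a colon, then all letters OR all digits, then end of line"
ONLY_LETTERS_OR_DIGITS = re.compile(
    r"^.*:(?:[A-Za-z]+|[0-9]+)(?:\r?\n|\Z)",
    re.MULTILINE,
)


def drop_simple_lines(text: str) -> str:
    """Delete lines whose part after the colon is only letters or only digits."""
    return ONLY_LETTERS_OR_DIGITS.sub("", text)


data = (
    "kkkkk:123456\n"
    "kkkkk:aaaaaa\n"
    "kkkkk:a123456\n"
    "kkkk:123456a\n"
    "kkkkk:zzzzz\n"
)
print(drop_simple_lines(data), end="")
```

Run on the data from this post, the sketch should leave only the two mixed letter-and-digit lines, matching the result shown above.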

                    ----

                    Please note: This Community Forum is not a data transformation service; you should not expect to be able to always say “I have data like X and want it to look like Y” and have us do all the work for you. If you are new to the Forum, and new to regular expressions, we will often give help on the first one or two data-transformation questions, especially if they are well-asked and you show a willingness to learn; and we will point you to the documentation where you can learn how to do the data transformations for yourself in the future. But if you repeatedly ask us to do your work for you, you will find that the patience of usually-helpful Community members wears thin. The best way to learn regular expressions is by experimenting with them yourself, and getting a feel for how they work; having us spoon-feed you the answers without you putting in the effort doesn’t help you in the long term and is uninteresting and annoying for us.

                    ----

                    Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as literal text using the </> toolbar button or manual Markdown syntax. To make regex in red (and so they keep their special characters like *), use backticks, like `^.*?blah.*?\z`. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure your examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get. Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.

                      • Hoang NgocH
                        Hoang Ngoc @PeterJones
                        last edited by

                        @PeterJones

                        What about before “:”, my website request “Username can only contain the allowed characters: uppercase letters, lowercase letters, numbers (a-z, A-Z, 0-9), underscores, dashes and periods. Username must begin or end with a letter or number and must contain at least one letter.” and “Account name must have 6-15 characters”
                        I wanna delete line not follow the rule

                        • PeterJonesP
                          PeterJones @Hoang Ngoc
                          last edited by

                          @Hoang-Ngoc said in How to mark lines with under "x" characters after : in a line.:

                          What about before “:”, my website request “Username can only contain the allowed characters: uppercase letters, lowercase letters, numbers (a-z, A-Z, 0-9), underscores, dashes and periods. Username must begin or end with a letter or number and must contain at least one letter.” and “Account name must have 6-15 characters”

                          and then you deleted that and wrote

                          I wanna delete line not follow the rule

                          Well, that changes things. Thanks for wasting my time while I was writing up deleting everything that didn’t follow that rule. I’ll edit what I was in the middle of…

                          -----

                          The least you can do is ask complete questions and at least attempt to make your posts make sense (for example, the preview window should have shown you that it was rendering your new text as if it were part of my quoted message, before you deleted it).

                          As I said earlier, this forum is not a data transformation service. So you’ll get one more freebie from me. But you’ve got to try to put more effort in if you’re going to be asking people for help. If you want to do many search-and-replace, you’re going to have to read the official Notepad++ regular expression docs, which I already linked for you before, and have now linked again.

                          To allow uppercase, lowercase, numbers, underscores, dashes, and periods, you can use the character class [a-zA-Z0-9_.-]. To indicate a specific quantity, you can use {N,M}, where N and M are the minimum and maximum counts you want to allow. For the more restrictive letter-or-number-only rule for the first and last character, use [a-zA-Z0-9] without the other characters. Put that all together: since you want one restrictive character, followed by N-M less restrictive ones, followed by another restrictive one, the N-M range needs to be two less than the actually-allowed range, so 4-13. Thus, [a-zA-Z0-9][a-zA-Z0-9_.-]{4,13}[a-zA-Z0-9]. And, as before, you need a start-of-line anchor, and want to have the colon after. But this is what’s allowed, and you want to delete what’s not allowed. Since you now want to delete any that match the rules, that’s slightly easier.

                          FIND = (?-s)^[a-zA-Z0-9][a-zA-Z0-9_.-]{4,13}[a-zA-Z0-9]:.*(\R|\z)

                          Actually, that almost did it.

                          short:blah123
                          thisIs2good:blah123
                          toooverlylongouidiot:blah123
                          bad'character:blah123
                          ok-char:blah123
                          1234-6789:blah123
                          -badStart:blah123
                          badEnd_:blah123
                          1ok_again2:blah123
                          

                          becomes

                          short:blah123
                          toooverlylongouidiot:blah123
                          bad'character:blah123
                          -badStart:blah123
                          badEnd_:blah123
                          

                          You’ll notice that the username=1234-6789 line was deleted, even though it didn’t contain at least one letter. That’s because enforcing “at least one letter” in the same expression is hard. So I want to handle that separately.

                          Before doing the regex shown above, do a FIND = ^[0-9_.-]{6,15}:.*$ and REPLACE=!KEEPME!$0, which will give an intermediate:

                          short:blah123
                          thisIs2good:blah123
                          toooverlylongouidiot:blah123
                          bad'character:blah123
                          ok-char:blah123
                          !KEEPME!1234-6789:blah123
                          -badStart:blah123
                          badEnd_:blah123
                          1ok_again2:blah123
                          

                          Now do the one I showed earlier: (?-s)^[a-zA-Z0-9][a-zA-Z0-9_.-]{4,13}[a-zA-Z0-9]:.*(\R|\z) =>

                          short:blah123
                          toooverlylongouidiot:blah123
                          bad'character:blah123
                          !KEEPME!1234-6789:blah123
                          -badStart:blah123
                          badEnd_:blah123
                          

                          Now do FIND = ^!KEEPME! and REPLACE = empty to get rid of that indicator.

                          short:blah123
                          toooverlylongouidiot:blah123
                          bad'character:blah123
                          1234-6789:blah123
                          -badStart:blah123
                          badEnd_:blah123
                          

                          Now you only show the usernames that violate your rules.
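
The three passes above (mark, delete, unmark) can be sketched in Python to check the intermediate results. This is only an illustration: the function name is made up, `\r?\n` or `\Z` replaces `\R|\z` because Python's `re` lacks those escapes, and the `!KEEPME!` marker is assumed not to occur in the real data.

```python
import re

MARKER = "!KEEPME!"

# Pass 1 target: usernames of 6-15 allowed characters that contain no letter.
NO_LETTER = re.compile(r"^[0-9_.-]{6,15}:.*$", re.MULTILINE)

# Pass 2 target: usernames with a valid shape (alphanumeric first and last
# character, 6-15 allowed characters in total), including the line ending.
VALID_SHAPE = re.compile(
    r"^[a-zA-Z0-9][a-zA-Z0-9_.-]{4,13}[a-zA-Z0-9]:.*(?:\r?\n|\Z)",
    re.MULTILINE,
)

# Pass 3 target: the marker at the start of a line.
MARKED = re.compile(r"^" + re.escape(MARKER), re.MULTILINE)


def keep_only_invalid(text: str) -> str:
    """Return the text with all lines that have a valid username removed."""
    # Pass 1: protect letterless usernames so pass 2 cannot delete them.
    marked = NO_LETTER.sub(lambda m: MARKER + m.group(0), text)
    # Pass 2: delete every line whose username has a valid shape.
    pruned = VALID_SHAPE.sub("", marked)
    # Pass 3: strip the marker again.
    return MARKED.sub("", pruned)
```

Calling `keep_only_invalid` on the nine-line sample from this post should reproduce the final six-line listing above, with `1234-6789` preserved because it has no letter.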

                          • Hoang NgocH
                            Hoang Ngoc @PeterJones
                            last edited by

                            @PeterJones

                            Thank you so much, i really appreciate what you are doing for this community, keep it up

                            • guy038G
                              guy038
                              last edited by

                              Hello, @hoang-ngoc, @peterjones and All,

                              The following single search regex could be used and, with an empty replace field, would delete any line with a valid user-name :

                              SEARCH (?i-s)(?=^[a-z0-9])(?=.*[a-z0-9]:)(?=.*[a-z].*:)^[a-z0-9_.-]{6,15}:.*\R?

                              REPLACE Leave EMPTY


                              Notes :

                              • The (?i-s) forces a case-insensitive search and makes the regex dot . match a single standard character only, never a line-break character

                              • Then the main part is ^[a-z0-9_.-]{6,15}:.*\R? which searches, from the beginning of a line, for 6 to 15 chars before a colon, each of which can be a standard letter or digit, an underscore, a period or a dash, followed by the remainder of the current line and an optional line-break

                              • This part will be valid ONLY IF, in addition, these three lookaheads are TRUE, at beginning of current line :

                                • A letter or digit begins the user-name ( part (?=^[a-z0-9]) )

                                • A letter or digit ends the user-name ( part (?=.*[a-z0-9]:) )

                                • The user-name contains, at least, ONE letter ( part (?=.*[a-z].*:) )


                              So, given this INPUT text :

                              short:••••••••                      #  < 6 chars
                              ThisIs2good:••••••••                #  OK
                              Looong_user-name:••••••••           #  > 15 chars
                              us@er'NA=ME:••••••••                #  NON-VALID chars
                              ok-chr:••••••••                     #  OK  (  6 chars and ALL chars ALLOWED )
                              ABCD-FGHI_12.34:••••••••            #  OK  ( 15 chars and ALL chars ALLOWED )
                              1234-6789:••••••••                  #  NO letter
                              .User-Name:••••••••                 #  NON-VALID char at START
                              USER.NAME_:••••••••                 #  NON-VALID char at END
                              1ok_again2:••••••••                 #  OK
                              

                              After the replacement, the following OUTPUT text would remain :

                              short:••••••••                      #  < 6 chars
                              Looong_user-name:••••••••           #  > 15 chars
                              us@er'NA=ME:••••••••                #  NON-VALID chars
                              1234-6789:••••••••                  #  NO letter
                              .User-Name:••••••••                 #  NON-VALID char at START
                              USER.NAME_:••••••••                 #  NON-VALID char at END
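
For readers who want to check the single-regex approach outside Notepad++, here is a rough Python equivalent. It is a sketch under assumptions: the `(?i-s)` prefix is replaced by `re.IGNORECASE` (the dot already excludes `\n` by default in Python), `\R?` is replaced by an optional `\r?\n`, and the function name is invented for the example.

```python
import re

VALID_USER_LINE = re.compile(
    r"(?=^[a-z0-9])"       # lookahead 1: a letter or digit begins the user-name
    r"(?=.*[a-z0-9]:)"     # lookahead 2: a letter or digit sits right before a colon
    r"(?=.*[a-z].*:)"      # lookahead 3: at least one letter appears before a colon
    r"^[a-z0-9_.-]{6,15}:.*(?:\r?\n)?",  # main part: 6-15 allowed chars, colon, rest of line
    re.IGNORECASE | re.MULTILINE,
)


def delete_valid_user_lines(text: str) -> str:
    """Remove every line whose user-name satisfies all the stated rules."""
    return VALID_USER_LINE.sub("", text)
```

Because the three lookaheads are all tested at the same starting position, the three conditions are checked independently and the main part only has to enforce the allowed characters and the 6 to 15 length, which is what lets this version do in one pass what the marker approach did in three.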
                              

                              Best regards

                              guy038

                              The Community of users of the Notepad++ text editor.