    Find multiple lines in files

    General Discussion
    find files
    • guy038

      Hello, @joel-lindenberg, @alan-kilborn, and All,

      Let’s suppose that the two strings delimiting the area to match are AAAAA and BBBBB.

      So, Joel, if you’re sure that the string AAAAA always comes before BBBBB, even when it occurs several times, choose Alan’s regex:

      (?s)AAAAA.+?BBBBB

      which matches as few characters as possible, even across several lines, between the strings AAAAA and BBBBB.

      • The (?s) modifier means that the dot matches any single character, including end-of-line characters

      • The +? syntax is the lazy quantifier: it matches as few characters as possible, from 1 up to any number

      • The + syntax, the greedy quantifier, would have matched as many characters as possible, backing off from the maximum down to 1
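For readers who want to try the lazy/greedy difference outside Notepad++, here is a minimal Python sketch. It assumes Python's standard re module, whose (?s), +? and + behave like the Notepad++ syntax for simple cases like this one; the sample text is made up for illustration.

```python
import re

# Made-up text: AAAAA appears once, BBBBB twice, with a line break in between.
text = "AAAAA-one-BBBBB\n-two-BBBBB"

# Lazy +? stops at the FIRST BBBBB after AAAAA.
lazy = re.search(r"(?s)AAAAA.+?BBBBB", text).group(0)

# Greedy + runs as far as it can, then backs off to the LAST BBBBB.
greedy = re.search(r"(?s)AAAAA.+BBBBB", text).group(0)

# Without (?s), the dot cannot cross the line break.
no_dotall = re.search(r"AAAAA.+BBBBB", text).group(0)

print(repr(lazy))       # 'AAAAA-one-BBBBB'
print(repr(greedy))     # 'AAAAA-one-BBBBB\n-two-BBBBB'
print(repr(no_dotall))  # 'AAAAA-one-BBBBB'
```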

      Just a remark, Alan: the exact regex should be (?s)Material: Copper.*?Diameter: 2\.80 ;-))


      Now, if, on the contrary, BBBBB always comes before AAAAA, choose, of course, the regex:

      (?s)BBBBB.+?AAAAA

      If you’re not sure, a third possibility is the following regex, which looks for both cases simultaneously:

      (?s)(AAAAA).+?(BBBBB)|(?2).+?(?1)

      • The string AAAAA is stored as Group 1

      • The string BBBBB is stored as Group 2

      • (?1) and (?2) are subroutine calls to the patterns of groups 1 and 2, which, in our case, are simply literal strings

      • The | character is the alternation symbol
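The either-order idea can be sketched in Python as follows. Python's built-in re module does not support (?1)/(?2) subroutine calls, so this sketch spells the two literal boundaries out in each alternative instead; that is equivalent here only because the groups are plain literals. The sample text is invented.

```python
import re

# re.escape() keeps the boundaries literal even if they contain . or other metacharacters.
S, E = re.escape("AAAAA"), re.escape("BBBBB")

# Same shape as (?s)(AAAAA).+?(BBBBB)|(?2).+?(?1), with the literals written out.
either_order = re.compile(rf"(?s){S}.+?{E}|{E}.+?{S}")

text = "x AAAAA\nfirst\nBBBBB y BBBBB\nsecond\nAAAAA z"
for m in either_order.finditer(text):
    print(repr(m.group(0)))
# 'AAAAA\nfirst\nBBBBB'
# 'BBBBB\nsecond\nAAAAA'
```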


      Now, the drawback of that last regex is that, if you start the search in the current file from the present cursor location, it may wrongly match areas of text such as:

      .....AAAAA......AAAAA......BBBBB...... OR .....BBBBB......BBBBB......AAAAA......

      To prevent that case, use this fourth regex:

      (?s)(AAAAA)(?:(?!(?1)).)+?(BBBBB)|(?2)(?:(?!(?2)).)+?(?1)

      However, beware that the negative look-ahead structures (?!(?1)) and (?!(?2)) consume system resources and can sometimes cause a catastrophic breakdown of the N++ regex engine, especially when you’re dealing with huge files and/or when a great amount of text lies between the two boundaries AAAAA and BBBBB :-((

      Note, also, that the (?:.......) syntax introduces a non-capturing group, generally followed by a quantifier.
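To see the difference between the third and fourth regexes on the two problem strings above, here is a small Python sketch. As before, Python's re has no subroutine calls, so (?1) and (?2) are replaced by the literal boundaries; the (?:(?!X).)+? construct itself works the same way in Python.

```python
import re

S, E = re.escape("AAAAA"), re.escape("BBBBB")

# Third regex: the lazy dot may step over a second copy of the opening boundary.
naive = re.compile(rf"(?s){S}.+?{E}|{E}.+?{S}")

# Fourth regex: (?:(?!X).)+? refuses to step over another X, so a match
# cannot contain a second copy of its opening boundary.
tempered = re.compile(rf"(?s){S}(?:(?!{S}).)+?{E}|{E}(?:(?!{E}).)+?{S}")

samples = [
    ".....AAAAA......AAAAA......BBBBB......",
    ".....BBBBB......BBBBB......AAAAA......",
]
for s in samples:
    print(naive.search(s).group(0), "->", tempered.search(s).group(0))
# AAAAA......AAAAA......BBBBB -> AAAAA......BBBBB
# BBBBB......BBBBB......AAAAA -> BBBBB......AAAAA
```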

      Best Regards,

      guy038

      • Joel Lindenberg

        Entered as such:
        (?s)MATERIAL: ALLOY 625.*?DIAMETER: .840
        And found nothing.
        FIRST LINE I am looking for is MATERIAL: ALLOY 625 (Do spaces matter? Does the number of spaces matter?)
        SECOND LINE I am looking for is DIAMETER: .840 (Does the number of spaces between DIAMETER and .840 matter? Is a certain syntax required if the number is 2.50 or .840?)
        The sequence of entries is always fixed.

        • PeterJones

          Yes, the number of spaces must exactly match the number in the text, or the regex will not match.
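A quick Python sketch of why spacing matters; the line with three spaces after the colon is invented for illustration. Python's re does not understand \h, so [ \t] is used here as a stand-in for the horizontal-space class that Notepad++ accepts.

```python
import re

# Invented line: three spaces after the colon.
line = "MATERIAL:   ALLOY 625"

# A literal space in the pattern matches exactly one space, so this fails.
exact = re.search(r"MATERIAL: ALLOY 625", line)

# [ \t]+ accepts one or more spaces or tabs between the words.
flexible = re.search(r"MATERIAL:[ \t]+ALLOY[ \t]+625", line)

print(exact)              # None
print(flexible.group(0))  # MATERIAL:   ALLOY 625
```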

          The forum may have eaten your \ in your regex: the example you were copying from made sure to escape the decimal point, but I don’t see that backslash before the period in yours.

          I am pasting some boilerplate below: it will direct you to help on search/replace (regular expression, aka regex) documentation, and it points to a post that describes how to format text so that regular expressions or example text show up as you pasted them, rather than the forum interpreting them as special Markdown characters.

          FYI: if you have further search-and-replace (regex) needs, study this FAQ and the documentation it points to. Before asking a new regex question, understand that for future requests, many of us will expect you to show what data you have (exactly), what data you want (exactly), what regex you already tried (to show that you’re showing effort), why you thought that regex would work (to prove it wasn’t just something randomly typed), and what data you’re getting with an explanation of why that result is wrong. When you show that effort, you’ll see us bend over backward to get things working for you. If you need help formatting the data so that the forum doesn’t mangle it (so that it shows “exactly”, as I said earlier), see this help-with-markdown post, where @Scott-Sumner gives a great summary of how to use Markdown for this forum’s needs.
          Please note that for all “regex” queries – or queries where you want help “matching” or “marking” or “bookmarking” a certain pattern, which amounts to the same thing – it is best if you are explicit about what needs to match, and what shouldn’t match, and have multiple examples of both in your example dataset. Often, what shouldn’t match helps define the regular expression as much or more than what should match.

          Assuming you have the exact regular expression (?s)MATERIAL: ALLOY 625.*?DIAMETER: .840 that I can see in your post, with a source file of

          MATERIAL: ALLOY 625
          you implied more lines can go here
          DIAMETER: .840
          

          then the regular expression matches that entire first block of text.

          If you also had

          MATERIAL: ALLOY 625
          you implied more lines can go here
          DIAMETER: .840
          more separators
          MATERIAL: ALLOY 625
          you implied more lines can go here
          DIAMETER: x840
          

          then the regex as quoted would show two matches rather than one, because the x matches the special regex character . (dot). To limit it to matching just the first block, the regex would need to be (?s)MATERIAL: ALLOY 625.*?DIAMETER: \.840. Writing \. in a regular expression tells the engine to treat the . as a literal character instead of as a special regex character.
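The escaping point can be checked with a short Python sketch using the two-block example file above. Python's re.escape() is shown as one way to build a literal-safe pattern automatically; it is a Python helper, not something Notepad++ provides.

```python
import re

text = (
    "MATERIAL: ALLOY 625\n"
    "you implied more lines can go here\n"
    "DIAMETER: .840\n"
    "more separators\n"
    "MATERIAL: ALLOY 625\n"
    "you implied more lines can go here\n"
    "DIAMETER: x840\n"
)

# Unescaped: the . before 840 matches any character, including x.
unescaped = re.findall(r"(?s)MATERIAL: ALLOY 625.*?DIAMETER: .840", text)

# Escaped: \. only matches a literal period.
escaped = re.findall(r"(?s)MATERIAL: ALLOY 625.*?DIAMETER: \.840", text)

# re.escape() escapes the metacharacters in each literal piece for you.
built = re.findall(
    "(?s)" + re.escape("MATERIAL: ALLOY 625") + ".*?" + re.escape("DIAMETER: .840"),
    text,
)

print(len(unescaped), len(escaped), len(built))  # 2 1 1
```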

          This file:

          MATERIAL: ALLOY 625
          you implied more lines can go here
          DIAMETER: 2.50
          

          will not match, because we’re looking for \.840 in this variant that you showed. Changing the regex to (?s)MATERIAL: ALLOY 625.*?DIAMETER: 2\.50 will match, however.

          Since you said “find in files”, which implies multiple files, are you looking for files that all have exactly the same ALLOY number followed by exactly the same DIAMETER number? If so, why did you change the numbers on us in your second example vs. your first? It’s highly confusing when you change expectations without explaining why.
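In Notepad++ this is done with Find in Files. For anyone who wants to script the same check, a rough Python equivalent might look like this. files_matching is a hypothetical helper name, reading the files as UTF-8 is an assumption about the data, and Python's re is used rather than Notepad++'s own engine, so results may differ for more exotic patterns.

```python
import re
from pathlib import Path

# The escaped pattern from the discussion above.
PATTERN = re.compile(r"(?s)MATERIAL: ALLOY 625.*?DIAMETER: \.840")

def files_matching(folder, pattern=PATTERN, glob="*"):
    """Hypothetical helper: return files under `folder` (recursively) whose text matches `pattern`."""
    hits = []
    for path in sorted(Path(folder).rglob(glob)):
        if path.is_file():
            # errors="replace" keeps one badly encoded file from stopping the scan.
            text = path.read_text(encoding="utf-8", errors="replace")
            if pattern.search(text):
                hits.append(path)
    return hits
```

Calling files_matching on a folder returns a sorted list of matching paths; a glob such as "*.txt" narrows the search the way the Filters box does in Find in Files.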

          It might help if you gave us more short example files (3-4 lines, like I used), a couple that you think should match and a couple that don’t, as well as the regular expression you think should match.

          For example, I might say, "I want this regular expression (?s)MATERIAL: ALLOY 625.*?DIAMETER: .840 to match

          MATERIAL: ALLOY 625
          this file should match
          DIAMETER: .840
          

          but not match

          MATERIAL: ALLOY 625
          this file should not match
          DIAMETER: 1840
          

          … but the regular expression matches both files. Why?"

          Then someone’s answer would be “because the period in the regular expression matches any character, not just the literal period character”.

          -----
          Oh, I see that Scott’s post doesn’t include the alternate file-quoting method, which I used above. In a few minutes, I will add an example at the end of that thread: https://notepad-plus-plus.org/community/topic/14262/how-to-markdown-code-on-this-forum/7

          • Joel Lindenberg

            I can have many entries. I entered what I could remember at home the other day. Reality is what I tried.
            Yes, several lines could be in between. In fact, I may also need to add 2 more search strings. And yes, I am searching an entire folder of files. Hope that helps. Putting the explanations of the answers here really helps. I am not a coder, just trying to debug some failures from a program. Thanks, ALL

            • PeterJones

              Unfortunately, we cannot help you generalize any more until you give us more examples of mini-files that should or should not match, similar to what I did above. It doesn’t have to be real data, just make it up, like I did. But right now, we don’t know if you want it to match any files that have at least one “MATERIAL: anything” followed some time later by “DIAMETER: specific-number”, or followed by “DIAMETER: any-floating-point-number”.

              For example, which of these should match?

              MATERIAL: xyz
              blah
              DIAMETER: .840
              

              or

              MATERIAL: xyz
              blah
              DIAMETER: 0.840
              

              or

              MATERIAL: pdq
              blah
              DIAMETER: 1.840
              

              or

              MATERIAL: ALLOY 625
              blah
              DIAMETER: 2.50
              

              or

              MATERIAL: ALLOY UNNUMBERED
              blah
              DIAMETER: -.50
              
              MATERIAL: ALLOY UNNUMBERED
              blah
              DIAMETER: +.50
              

              or …

              What are the restrictions on the MATERIAL? What are the restrictions on the DIAMETER? Or are you just looking for any file that has a pair of MATERIAL: and DIAMETER:, and you don’t actually care what comes next on those lines?

              We cannot guess. We should not have to come up with these examples to ask you. If you want help, you have to provide the information to help.

              I will not be able to provide any more help or suggestions until you’ve given us more rules (a detailed description, not a handwavy one), and preferably example files that should and should not match. Please format any future responses using the Markdown from the thread I linked earlier, especially one of the two example methods (indenting or the ```z notation), to make sure your files come through exactly as pasted. Until you have shown that effort, there is no more I can do, sorry.

              • Joel Lindenberg

                @PeterJones I will try to enter more REAL examples, but I fear that might not be realistic. For example, MATERIAL might be:

                • MATERIAL: Copper
                • MATERIAL: NICKEL COPPER
                • MATERIAL: AL
                What I don’t know for sure is how many lines come in between, if that matters, or whether there are spaces before. But I am going to try to give a real-life example tomorrow morning. Thank you for all of your help.
                • PeterJones

                  @Joel-Lindenberg,

                  Hopefully, after the example data, we’ll be able to get you a working solution.

                  • Joel Lindenberg

                    Thanks for your diligence and patience.
                    Here are three snippets.
                    =SAMPLE 1====================================
                    BLAH:
                    BLAH:
                    BLAH:
                    DATE: FEBRUARY 30, 2002
                    DWG REV: -NA-
                    CRO REV: -NA-
                    DAT REV: -NA-
                    SHT REV: -NA-
                    DETAIL: BLAH
                    DYPN NUMBER: BLAH
                    MATERIAL: ALLOY 625
                    PIPE LENGTH: BLAH
                    DIAMETER: 2.375
                    THICKNESS: .109
                    BEND RAD: 6.000
                    BLAH:
                    BLAH:
                    =SAMPLE 2====================================
                    BLAH:
                    BLAH:
                    BLAH:
                    DATE: FEBRUARY 31, 2002
                    DWG REV: -NA-
                    CRO REV: -NA-
                    DAT REV: -NA-
                    SHT REV: -NA-
                    DETAIL: BLAH
                    DYPN NUMBER: BLAH
                    MATERIAL: NICKEL-COPPER
                    PIPE LENGTH: BLAH
                    DIAMETER: .840
                    THICKNESS: .109
                    BEND RAD: 2.500
                    BLAH:
                    BLAH:
                    =SAMPLE 3====================================
                    BLAH:
                    BLAH:
                    BLAH:
                    DATE: FEBRUARY 32, 2002
                    DWG REV: -NA-
                    CRO REV: -NA-
                    DAT REV: -NA-
                    SHT REV: -NA-
                    DETAIL: BLAH
                    DYPN NUMBER: BLAH
                    MATERIAL: CUNI 70:30
                    PIPE LENGTH: BLAH
                    DIAMETER: .840
                    THICKNESS: .120
                    BEND RAD: 2.500
                    BLAH:
                    BLAH:

                    • PeterJones

                      @Joel-Lindenberg ,

                      Thanks for the data.

                      Using that data, the regex (?s)MATERIAL:\h*\w+.*?DIAMETER:\h*[\.\d]+ finds three matches (I assume you intended for all three samples to match).

                      If you want details on that regex, see https://regexr.com/468dg, which I saved with that data and regex.
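For those who prefer to check the generalized regex locally, here is a Python sketch run against excerpts of the three samples posted above (only the lines around MATERIAL and DIAMETER are kept). Two assumptions: Python's re has no \h, so [ \t] stands in for it, and a capturing group has been added around the number purely so the diameters can be printed.

```python
import re

# [ \t]* replaces \h*; the (...) around the number is an addition for printing.
GENERAL = re.compile(r"(?s)MATERIAL:[ \t]*\w+.*?DIAMETER:[ \t]*([.\d]+)")

# Excerpts of the three posted samples.
samples = """\
MATERIAL: ALLOY 625
PIPE LENGTH: BLAH
DIAMETER: 2.375
=SAMPLE 2====================================
MATERIAL: NICKEL-COPPER
PIPE LENGTH: BLAH
DIAMETER: .840
=SAMPLE 3====================================
MATERIAL: CUNI 70:30
PIPE LENGTH: BLAH
DIAMETER: .840
"""

diameters = [m.group(1) for m in GENERAL.finditer(samples)]
print(diameters)  # ['2.375', '.840', '.840']
```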

                      As a reminder, this FAQ will direct you to plenty of regex documentation for future search/replace/mark needs.

                      • V S Rawat

                        Take that file to Excel.
                        Split each line at “:” so that the variable name and the value end up in two columns.
                        Then sort in Excel so that all identical variable names come together.

                        Then do your thing.
                        Combine the two columns to get a single line again (the reverse of what you did above).
                        Bring the file back to N++.

                        If you want the original order, after taking the file to Excel, add a 3rd column and fill it with serial numbers 1, 2, 3 and so on for your entire data.
                        After the first sort, the data will be in variable-name order,
                        so after finishing your data manipulation, sort again on this serial-number column and you get your data back in its original order.

                        thanks.
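The same split, sort, and restore idea can be sketched in Python for anyone who would rather not round-trip through Excel. This is only an illustration: the input lines are borrowed from the samples above and reordered, and splitting at the first colon is a choice made here so that values such as CUNI 70:30 stay intact.

```python
# Input lines borrowed from the samples, deliberately interleaved.
lines = ["DIAMETER: 2.375", "MATERIAL: ALLOY 625", "DIAMETER: .840", "MATERIAL: CUNI 70:30"]

# Step 1: split each line at the first ":" and keep a serial number
# (the "3rd column" of the Excel approach).
rows = []
for serial, line in enumerate(lines, start=1):
    name, _, value = line.partition(":")
    rows.append([serial, name, value])

# Step 2: sort by variable name so identical names are grouped.
# Python's sort is stable, so rows with the same name keep their original order.
rows.sort(key=lambda r: r[1])
grouped = [f"{name}:{value}" for _, name, value in rows]
print(grouped)
# ['DIAMETER: 2.375', 'DIAMETER: .840', 'MATERIAL: ALLOY 625', 'MATERIAL: CUNI 70:30']

# ... edit the grouped values here ...

# Step 3: sort back on the serial number and rejoin the two columns.
rows.sort(key=lambda r: r[0])
restored = [f"{name}:{value}" for _, name, value in rows]
print(restored == lines)  # True
```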

                        Copyright © 2014 NodeBB Forums | Contributors