Find multiple lines in files



  • I have one line that states Material: Copper. Another line says Diameter: 2.80. I need to find all files that have BOTH entries. How can I use Find In Files to perform this multiple FIND in MULTIPLE files?
    Thanks



  • @Joel-Lindenberg

    If they are always going to be in the stated order, then finding on (?s)Material: Copper.*?Diameter: 2.80 seems like it would work.
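    For anyone who would rather script the check than use the Find in Files dialog, the same idea can be sketched in Python. This is only an illustration under stated assumptions: the two lines always appear in that order, the helper name `files_with_both` and the `*.txt` filter are made up, and the decimal point is escaped so that `2.80` only matches a literal period.

```python
import re
from pathlib import Path

# Same idea as the Notepad++ search: DOTALL plays the role of (?s),
# so ".*?" may run across line breaks between the two entries.
PATTERN = re.compile(r"Material: Copper.*?Diameter: 2\.80", re.DOTALL)

def files_with_both(folder, glob="*.txt"):
    """Hypothetical helper: names of files in `folder` that contain
    both entries, in the stated order."""
    hits = []
    for path in sorted(Path(folder).glob(glob)):
        text = path.read_text(encoding="utf-8", errors="replace")
        if PATTERN.search(text):
            hits.append(path.name)
    return hits
```

    Calling `files_with_both("some_folder")` returns a sorted list of matching file names; the folder name is a placeholder.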



  • Of course, this would be a regular expression search.



  • Hello, @joel-lindenberg, @alan-kilborn, and All,

    Let’s suppose that the two strings, which determine the area to match, are AAAAA and BBBBB

    So, Joel, if you’re sure that the string AAAAA always comes before BBBBB, even when the pair occurs several times, choose Alan’s regex:

    (?s)AAAAA.+?BBBBB

    which looks for the smallest range of any characters, even on several lines, between the strings AAAAA and BBBBB

    • The modifier (?s) just means that the dot character matches any single char ( Standard or EOL )

    • The +? syntax represents the lazy quantifier, meaning as few chars as possible, from 1 char up to any number of chars

    • The + syntax, which represents the greedy quantifier, would instead have matched as many chars as possible, backing off only as far as needed to reach a BBBBB
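    The three points above can be checked outside Notepad++ with a small Python sketch. Python's `re` module is a different engine from the one in N++, but (?s), +? and + behave the same way for this case; the sample text is invented.

```python
import re

# Invented sample: one AAAAA followed by two BBBBB, spread over three lines.
text = "AAAAA\none BBBBB two\nBBBBB"

# Lazy: stops at the first BBBBB it can reach.
lazy = re.search(r"(?s)AAAAA.+?BBBBB", text).group()

# Greedy: runs on to the last BBBBB.
greedy = re.search(r"(?s)AAAAA.+BBBBB", text).group()

# Without (?s) the dot cannot cross the line break, so nothing matches.
no_dotall = re.search(r"AAAAA.+?BBBBB", text)

print(repr(lazy))
print(repr(greedy))
print(no_dotall)
```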

    Just a remark, Alan, the exact regex should be : (?s)Material: Copper.*?Diameter: 2\.80 ;-))


    Now, if, on the contrary, the string BBBBB comes always before AAAAA, choose, of course, the regex :

    (?s)BBBBB.+?AAAAA

    If you’re not sure, a third possibility could use the following regex, which looks for the two cases, simultaneously :

    (?s)(AAAAA).+?(BBBBB)|(?2).+?(?1)

    • The string AAAAA is stored as Group 1

    • The string BBBBB is stored as Group 2

    • The (?1) and (?2) are subroutine calls to the regexes of groups 1 and 2 which, in our case, are, simply, literal strings

    • The | character is the alternation symbol
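    As a rough cross-check of the either-order idea, here is a Python sketch. Python's built-in `re` module does not support the (?1) and (?2) subroutine calls, so the sketch spells both alternatives out in full; the variable names are only for illustration.

```python
import re

A, B = "AAAAA", "BBBBB"

# Either order, written out because Python's re has no (?1)/(?2) calls:
#   A ... B   or   B ... A
either_order = re.compile(
    rf"{re.escape(A)}.+?{re.escape(B)}|{re.escape(B)}.+?{re.escape(A)}",
    re.DOTALL,
)

for sample in ("x AAAAA y BBBBB z", "x BBBBB\ny AAAAA z", "only AAAAA here"):
    m = either_order.search(sample)
    print(repr(m.group()) if m else None)
```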


    Now, the drawback of that last regex is that, if you start the search on the current file from the present cursor location, it may wrongly catch areas of text such as :

    .....AAAAA......AAAAA......BBBBB...... OR .....BBBBB......BBBBB......AAAAA......

    To prevent that case, use this fourth regex :

    (?s)(AAAAA)(?:(?!(?1)).)+?(BBBBB)|(?2)(?:(?!(?2)).)+?(?1)

    However, beware that the negative look-ahead structures (?!(?1)) and (?!(?2)) consume system resources and sometimes end in a catastrophic break-down of the N++ regex engine ! Especially when you’re dealing with huge files and/or when a great amount of text lies between the two boundaries AAAAA and BBBBB :-((

    Note, also, that the (?:.......) syntax introduces a non-capturing group, generally followed by a quantifier symbol
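    To see what the look-ahead buys, here is a Python sketch of the AAAAA-first half of that fourth regex, with the literal written out in place of (?1) because Python's `re` module has no subroutine calls. The sample text is invented.

```python
import re

# Invented sample with a repeated AAAAA before the BBBBB.
text = "..AAAAA..1..AAAAA..2..BBBBB.."

# Plain lazy version: starts at the FIRST AAAAA and swallows the second one.
plain = re.search(r"(?s)AAAAA.+?BBBBB", text).group()

# Look-ahead version: each char between the boundaries must not be
# the start of another AAAAA, so the match begins at the LAST AAAAA.
tempered = re.search(r"(?s)AAAAA(?:(?!AAAAA).)+?BBBBB", text).group()

print(plain)
print(tempered)
```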

    Best Regards,

    guy038



  • Entered as such:
    (?s)MATERIAL: ALLOY 625.*?DIAMETER: .840
    And found nothing.
    FIRST LINE I am looking for is MATERIAL: ALLOY 625 (Do spaces matter? Does the number of spaces matter?)
    SECOND LINE I am looking for is DIAMETER: .840 (Does the number of spaces between DIAMETER and .840 matter? Is there certain syntax required if the number is 2.50 or .840?)
    The sequence of entries is always fixed.



  • Yes, the number of spaces must match exactly the number in the text, or it will not match.

    The forum may have eaten your \ in your regex: the example you were copying from made sure to escape the decimal point, but I don’t see that backslash before the period in yours.

    I am pasting a boilerplate below: it will direct you to help on search/replace (regular expression aka regex) documentation, as well as pointing to a post which describes how to format text such that regular expressions or example text show up as you pasted them, rather than with the forum trying to interpret them as special markdown characters.

    FYI: if you have further search-and-replace (regex) needs, study this FAQ and the documentation it points to. Before asking a new regex question, understand that for future requests, many of us will expect you to show what data you have (exactly), what data you want (exactly), what regex you already tried (to show that you’re showing effort), why you thought that regex would work (to prove it wasn’t just something randomly typed), and what data you’re getting with an explanation of why that result is wrong. When you show that effort, you’ll see us bend over backward to get things working for you. If you need help formatting the data so that the forum doesn’t mangle it (so that it shows “exactly”, as I said earlier), see this help-with-markdown post, where @Scott-Sumner gives a great summary of how to use Markdown for this forum’s needs.
    Please note that for all “regex” queries – or queries where you want help “matching” or “marking” or “bookmarking” a certain pattern, which amounts to the same thing – it is best if you are explicit about what needs to match, and what shouldn’t match, and have multiple examples of both in your example dataset. Often, what shouldn’t match helps define the regular expression as much or more than what should match.

    Assuming you have the exact regular expression (?s)MATERIAL: ALLOY 625.*?DIAMETER: .840 that I can see in your post, with a source file of

    MATERIAL: ALLOY 625
    you implied more lines can go here
    DIAMETER: .840
    

    then the regular expression matches that entire first block of text.

    If you also had

    MATERIAL: ALLOY 625
    you implied more lines can go here
    DIAMETER: .840
    more separators
    MATERIAL: ALLOY 625
    you implied more lines can go here
    DIAMETER: x840
    

    then the regex as quoted would show two matches, rather than one, because x matches the regex special character . (the dot). To limit it to just the first group, the regex would need to be (?s)MATERIAL: ALLOY 625.*?DIAMETER: \.840, because \. in a regular expression tells the engine to treat the . as a literal character instead of a special regex character.
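    The difference between the escaped and unescaped dot can be reproduced with a short Python sketch on the same two-block sample. Python's `re` module stands in for the Notepad++ engine here; the behavior of . and \. is the same in both.

```python
import re

# The two-block sample from above: the second block has "x840".
sample = (
    "MATERIAL: ALLOY 625\nyou implied more lines can go here\nDIAMETER: .840\n"
    "more separators\n"
    "MATERIAL: ALLOY 625\nyou implied more lines can go here\nDIAMETER: x840\n"
)

# Unescaped dot: "." matches any character, including the "x".
unescaped = re.findall(r"(?s)MATERIAL: ALLOY 625.*?DIAMETER: .840", sample)

# Escaped dot: "\." matches only a literal period.
escaped = re.findall(r"(?s)MATERIAL: ALLOY 625.*?DIAMETER: \.840", sample)

print(len(unescaped), len(escaped))
```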

    This file:

    MATERIAL: ALLOY 625
    you implied more lines can go here
    DIAMETER: 2.50
    

    will not match, because we’re looking for \.840 in this variant that you showed. Changing the regex to (?s)MATERIAL: ALLOY 625.*?DIAMETER: 2\.50 will match, however.

    Since you said “find in files”, which implies multiple files, are you looking for files that all have exactly the same ALLOY number followed by exactly the same DIAMETER number? If so, why did you change the numbers on us in your second example vs your first? It’s highly confusing when you change expectations without explaining why.

    It might help if you gave us more short example files (3-4lines like I used) – a couple that you think should match, and a couple that don’t – as well as the regular expression you think should match.

    For example, I might say, "I want this regular expression (?s)MATERIAL: ALLOY 625.*?DIAMETER: .840 to match

    MATERIAL: ALLOY 625
    this file should match
    DIAMETER: .840
    

    but not match

    MATERIAL: ALLOY 625
    this file should not match
    DIAMETER: 1840
    

    … but the regular expression matches both files. Why?"

    Then someone’s answer would be “because the period in the regular expression matches any character, not just the literal period character”

    -----
    Oh, I see that Scott’s post doesn’t include the alternate file-quoting method, which I used above. In a few minutes, I will add an example at the end of that thread (https://notepad-plus-plus.org/community/topic/14262/how-to-markdown-code-on-this-forum/7)



  • I can have many entries. I entered what I could remember at home the other day. Reality is what I tried.
    Yes, several lines could be between. In fact, I may also need to add 2 more finds. And yes, I am looking in an entire folder of files. Hope that helps. And putting the explanations of the answers here really helps. I am not a coder. Just trying to debug some failures from some program. Thanks ALL



  • Unfortunately, we cannot help you generalize any more until you give us more examples of mini-files that should or should not match, similar to what I did above. It doesn’t have to be real data, just make it up, like I did. But right now, we don’t know if you want it to match any files that have at least one “MATERIAL: anything” followed some time later by “DIAMETER: specific-number”, or followed by “DIAMETER: any-floating-point-number”.

    For example, which of these should match?

    MATERIAL: xyz
    blah
    DIAMETER: .840
    

    or

    MATERIAL: xyz
    blah
    DIAMETER: 0.840
    

    or

    MATERIAL: pdq
    blah
    DIAMETER: 1.840
    

    or

    MATERIAL: ALLOY 625
    blah
    DIAMETER: 2.50
    

    or

    MATERIAL: ALLOY UNNUMBERED
    blah
    DIAMETER: -.50
    
    MATERIAL: ALLOY UNNUMBERED
    blah
    DIAMETER: +.50
    

    or …

    What are the restrictions on the MATERIAL? What are the restrictions on the DIAMETER? Or are you just looking for any file that has a pair of MATERIAL: and DIAMETER:, and you don’t actually care what comes next on those lines?

    We cannot guess. We should not have to come up with these examples to ask you. If you want help, you have to provide the information to help.

    I will not be able to provide any more help or suggestions until you’ve given us more rules (detailed description, not handwavy), preferably with example files that should and should not match. Please format any future responses using the markup learned from the thread I linked earlier, especially using one of the two example methods (indenting or the ```z notation) for making sure your files come through exactly as pasted. Until you have shown that effort, there is no more I can do, sorry.



  • @PeterJones I will try to enter more REAL examples, but I fear that might not be realistic. e.g. Material might be

    • MATERIAL: Copper
    • MATERIAL: NICKEL COPPER
    • MATERIAL: AL
      And what I don’t know for sure are lines between, if that matters, or spaces before. But I am going to try to give a real life example tomorrow morning. Thank you for all of your help


  • @Joel-Lindenberg,

    Hopefully, after the example data, we’ll be able to get you a working solution.



  • Thanks for your diligence and patience.
    Here are three snippets.
    =SAMPLE 1====================================
    BLAH:
    BLAH:
    BLAH:
    DATE: FEBRUARY 30, 2002
    DWG REV: -NA-
    CRO REV: -NA-
    DAT REV: -NA-
    SHT REV: -NA-
    DETAIL: BLAH
    DYPN NUMBER: BLAH
    MATERIAL: ALLOY 625
    PIPE LENGTH: BLAH
    DIAMETER: 2.375
    THICKNESS: .109
    BEND RAD: 6.000
    BLAH:
    BLAH:
    =SAMPLE 2====================================
    BLAH:
    BLAH:
    BLAH:
    DATE: FEBRUARY 31, 2002
    DWG REV: -NA-
    CRO REV: -NA-
    DAT REV: -NA-
    SHT REV: -NA-
    DETAIL: BLAH
    DYPN NUMBER: BLAH
    MATERIAL: NICKEL-COPPER
    PIPE LENGTH: BLAH
    DIAMETER: .840
    THICKNESS: .109
    BEND RAD: 2.500
    BLAH:
    BLAH:
    =SAMPLE 3====================================
    BLAH:
    BLAH:
    BLAH:
    DATE: FEBRUARY 32, 2002
    DWG REV: -NA-
    CRO REV: -NA-
    DAT REV: -NA-
    SHT REV: -NA-
    DETAIL: BLAH
    DYPN NUMBER: BLAH
    MATERIAL: CUNI 70:30
    PIPE LENGTH: BLAH
    DIAMETER: .840
    THICKNESS: .120
    BEND RAD: 2.500
    BLAH:
    BLAH:



  • @Joel-Lindenberg ,

    Thanks for the data.

    Using that data, the regex (?s)MATERIAL:\h*\w+.*?DIAMETER:\h*[\.\d]+ finds three matches (I assume you intended for all three samples to match).
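    For a quick check outside Notepad++ and regexr, here is a Python sketch of roughly the same regex run against a trimmed copy of the three samples. Two things are my own assumptions: Python's `re` module has no \h, so [ \t] stands in for horizontal whitespace, and the capture groups are added only so the values can be printed. As in the original regex, \w+ takes just the first word of the material.

```python
import re

# Trimmed copy of the three samples (only the lines around the two fields).
data = """\
=SAMPLE 1=
MATERIAL: ALLOY 625
PIPE LENGTH: BLAH
DIAMETER: 2.375
=SAMPLE 2=
MATERIAL: NICKEL-COPPER
PIPE LENGTH: BLAH
DIAMETER: .840
=SAMPLE 3=
MATERIAL: CUNI 70:30
PIPE LENGTH: BLAH
DIAMETER: .840
"""

# [ \t]* replaces \h*; the groups capture the first material word and the diameter.
pattern = re.compile(r"MATERIAL:[ \t]*(\w+).*?DIAMETER:[ \t]*([.\d]+)", re.DOTALL)

pairs = pattern.findall(data)
for material, diameter in pairs:
    print(material, diameter)
```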

    If you want details on that regex, see https://regexr.com/468dg, which I saved with that data and regex.

    As a reminder, this FAQ will direct you to plenty of regex documentation for future search/replace/mark needs.



  • Take that file to Excel.
    Split each line at the “:” so that the variable name and the value land in two columns,
    then sort in Excel so that all rows with the same variable name come together.

    Then do your thing.
    Combine the two columns back into a single line (the reverse of the split above)
    and bring the file back to NPP.

    If you want the original order, add a third column right after taking the file to Excel and fill it with serial numbers 1, 2, 3 and so on for your entire data.
    After the first sort, the rows will be in variable-name order,
    so once you have finished your data manipulation, sort again on the serial-number column and you get the data back in its original order.
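    The same split, sort and restore steps can be sketched in Python instead of Excel. This is only an illustration on a few invented lines; it splits at the first ":" only, on the assumption that values such as CUNI 70:30 should stay in one piece.

```python
# A few invented lines in the same NAME: value shape as the samples.
lines = [
    "MATERIAL: CUNI 70:30",
    "DIAMETER: .840",
    "THICKNESS: .120",
    "MATERIAL: ALLOY 625",
    "DIAMETER: 2.375",
]

# Split at the first ":" and attach a serial number (the "third column").
rows = []
for serial, line in enumerate(lines, start=1):
    name, _, value = line.partition(":")
    rows.append((serial, name.strip(), value.strip()))

# Sort by variable name so that equal names come together.
by_name = sorted(rows, key=lambda row: row[1])

# ... data manipulation would go here ...

# Sort on the serial number again and rejoin to restore the original order.
restored = [f"{name}: {value}" for _, name, value in sorted(by_name)]
print(restored == lines)
```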

    thanks.

