Find multiple lines in files
-
Hello, @joel-linderberg, @alan-kilborn, and All,
Let’s suppose that the two strings, which determine the area to match, are AAAAA and BBBBB.
So, Joel, if you’re sure that the string AAAAA always comes before BBBBB, even several times, choose Alan’s regex:
(?s)AAAAA.+?BBBBB
which looks for the smallest range of any characters, even across several lines, between the strings AAAAA and BBBBB.
- The modifier (?s) just means that the dot character matches any single char (standard and EOL ones)
- The +? syntax represents the lazy quantifier, meaning as few chars as possible, from 1 char to any number of chars
- The + syntax, which represents the greedy quantifier, would have matched as much as possible, from any number of chars down to 1 char
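As a rough illustration of the three points above, here is a minimal sketch using Python's re module. That choice is an assumption made for convenience: it is not the engine Notepad++ uses, and the sample text is made up, but (?s), + and +? should behave the same way in this simple case.

```python
import re

# Made-up sample text: two AAAAA...BBBBB ranges, the first one spanning two lines
text = "x AAAAA one\ntwo BBBBB three\nAAAAA four BBBBB y"

# Lazy quantifier: each match stops at the nearest BBBBB
lazy = re.findall(r"(?s)AAAAA.+?BBBBB", text)

# Greedy quantifier: one match running up to the last BBBBB
greedy = re.findall(r"(?s)AAAAA.+BBBBB", text)

# Without (?s) the dot cannot cross a line break, so the first range is missed
no_dotall = re.findall(r"AAAAA.+?BBBBB", text)

print(lazy)       # ['AAAAA one\ntwo BBBBB', 'AAAAA four BBBBB']
print(greedy)     # ['AAAAA one\ntwo BBBBB three\nAAAAA four BBBBB']
print(no_dotall)  # ['AAAAA four BBBBB']
```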
Just a remark, Alan, the exact regex should be :
(?s)Material: Copper.*?Diameter: 2\.80
;-))
Now, if, on the contrary, the string BBBBB always comes before AAAAA, choose, of course, the regex:
(?s)BBBBB.+?AAAAA
If you’re not sure, a third possibility could use the following regex, which looks for the two cases, simultaneously :
(?s)(AAAAA).+?(BBBBB)|(?2).+?(?1)
- The string AAAAA is stored as Group 1
- The string BBBBB is stored as Group 2
- The (?1) and (?2) are subroutine calls to these regexes which, in our case, are simply literal strings
- The | character is the alternation symbol
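The either-order idea can be sketched in Python as well, with one caveat: Python's re module has no (?1) / (?2) subroutine calls, so in this approximation the literal strings are simply repeated. The sample texts are invented for the demonstration.

```python
import re

# Approximation of (?s)(AAAAA).+?(BBBBB)|(?2).+?(?1):
# Python's re lacks subroutine calls, so the literals are written out twice.
EITHER_ORDER = re.compile(r"(?s)AAAAA.+?BBBBB|BBBBB.+?AAAAA")

# Made-up samples, one per ordering
a_first = "start AAAAA\nmiddle\nBBBBB end"
b_first = "start BBBBB\nmiddle\nAAAAA end"

match_a_first = EITHER_ORDER.search(a_first).group(0)
match_b_first = EITHER_ORDER.search(b_first).group(0)

print(repr(match_a_first))  # 'AAAAA\nmiddle\nBBBBB'
print(repr(match_b_first))  # 'BBBBB\nmiddle\nAAAAA'
```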
Now, the drawback of that last regex is that, if you decide to start that search, on the current file, from the present cursor location, it may wrongly catch areas of text, as below:
.....AAAAA......AAAAA......BBBBB......
OR
.....BBBBB......BBBBB......AAAAA......
To prevent that case, use this fourth regex :
(?s)(AAAAA)(?:(?!(?1)).)+?(BBBBB)|(?2)(?:(?!(?2)).)+?(?1)
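To see what the negative look-ahead changes, here is a small Python sketch comparing the third and fourth regexes on a shortened version of the problem text. It assumes Python's re module rather than the N++ engine, and since re has no (?1) / (?2) calls, the literals are spelled out.

```python
import re

# Shortened version of the problem case: AAAAA appears twice before BBBBB
text = "..AAAAA..AAAAA..BBBBB.."

# Third regex, literals repeated in place of (?1) / (?2)
loose = re.compile(r"(?s)AAAAA.+?BBBBB|BBBBB.+?AAAAA")

# Fourth regex: each consumed char must not start another opening boundary
strict = re.compile(
    r"(?s)AAAAA(?:(?!AAAAA).)+?BBBBB|BBBBB(?:(?!BBBBB).)+?AAAAA"
)

loose_match = loose.search(text).group(0)
strict_match = strict.search(text).group(0)

print(loose_match)   # AAAAA..AAAAA..BBBBB  (starts at the first AAAAA)
print(strict_match)  # AAAAA..BBBBB         (starts at the last AAAAA)
```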
However, beware that the negative look-ahead structures (?!(?1)) and (?!(?2)) consume system resources and sometimes end in a catastrophic breakdown of the N++ regex engine! Especially when you’re dealing with huge files and/or if a great amount of text lies between the two boundaries AAAAA and BBBBB :-((
Note, also, that the (?:.......) syntax introduces a non-capturing group, generally followed by a quantifier symbol.
Best Regards,
guy038
-
-
Entered as such:
(?s)MATERIAL: ALLOY 625.*?DIAMETER: .840
And found nothing.
FIRST LINE I am looking for is MATERIAL: ALLOY 625 (Do spaces matter? Does the number of spaces matter?)
SECOND LINE I am looking for is DIAMETER: .840 (Does the number of spaces between DIAMETER and .840 matter? Is there certain syntax required if the number is 2.50 or .840?)
The sequence of entries is always fixed.
-
Yes, the number of spaces must match exactly the number in the text, or it will not match.
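A quick way to convince yourself of this, sketched with Python's re module (an assumption: not the Notepad++ engine, though literal spaces behave the same way). The sample line, with three spaces, is made up.

```python
import re

# Made-up line with three spaces after the colon
line = "DIAMETER:   .840"

# Exactly one literal space: does not match three spaces
one_space = re.search(r"DIAMETER: \.840", line)

# " +" means one or more spaces, so any run of spaces is accepted
any_spaces = re.search(r"DIAMETER: +\.840", line)

print(one_space)               # None
print(any_spaces is not None)  # True
```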
The forum may have eaten the \ in your regex: the example you were copying from made sure to escape the decimal point, but I don’t see that backslash before the period in yours.
I am pasting a boilerplate below: it will direct you to help on search/replace (regular expression aka regex) documentation, as well as pointing to a post which describes how to format text such that regular expressions or example text show up as you pasted them, rather than with the forum trying to interpret them as special markdown characters.
FYI: if you have further search-and-replace (regex) needs, study this FAQ and the documentation it points to. Before asking a new regex question, understand that for future requests, many of us will expect you to show what data you have (exactly), what data you want (exactly), what regex you already tried (to show that you’re showing effort), why you thought that regex would work (to prove it wasn’t just something randomly typed), and what data you’re getting with an explanation of why that result is wrong. When you show that effort, you’ll see us bend over backward to get things working for you. If you need help formatting the data so that the forum doesn’t mangle it (so that it shows “exactly”, as I said earlier), see this help-with-markdown post, where @Scott-Sumner gives a great summary of how to use Markdown for this forum’s needs.
Please note that for all “regex” queries – or queries where you want help “matching” or “marking” or “bookmarking” a certain pattern, which amounts to the same thing – it is best if you are explicit about what needs to match, and what shouldn’t match, and have multiple examples of both in your example dataset. Often, what shouldn’t match helps define the regular expression as much or more than what should match.
Assuming you have the exact regular expression
(?s)MATERIAL: ALLOY 625.*?DIAMETER: .840
that I can see in your post, with a source file of
MATERIAL: ALLOY 625
you implied more lines can go here
DIAMETER: .840
then the regular expression matches that entire first block of text.
If you also had
MATERIAL: ALLOY 625
you implied more lines can go here
DIAMETER: .840
more separators
MATERIAL: ALLOY 625
you implied more lines can go here
DIAMETER: x840
then the regex as quoted would show two matches, rather than one, because x matches the regex special character . (the dot). To limit it to just match the first group, the regex would need to be
(?s)MATERIAL: ALLOY 625.*?DIAMETER: \.840
where \. in a regular expression tells the engine to treat the . as a literal character, instead of a special regex character.
This file:
MATERIAL: ALLOY 625
you implied more lines can go here
DIAMETER: 2.50
will not match, because we’re looking for \.840 in this variant that you showed. Changing the regex to
(?s)MATERIAL: ALLOY 625.*?DIAMETER: 2\.50
will match, however.
Since you said “find in files”, which implies multiple files, are you looking for files that all have exactly the same ALLOY number followed by exactly the same DIAMETER number? If so, why did you change the numbers on us in your second example vs your first? It’s highly confusing when you change expectations without explaining why.
It might help if you gave us more short example files (3-4 lines like I used) – a couple that you think should match, and a couple that don’t – as well as the regular expression you think should match.
For example, I might say, “I want this regular expression
(?s)MATERIAL: ALLOY 625.*?DIAMETER: .840
to match
MATERIAL: ALLOY 625
this file should match
DIAMETER: .840
but not match
MATERIAL: ALLOY 625
this file should not match
DIAMETER: 1840
… but the regular expression matches both files. Why?"
Then someone’s answer would be “because the period in the regular expression matches any character, not just the literal period character”
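The question and answer above can be checked with a short Python sketch. Python's re module is an assumption here (it is not the Notepad++ engine), but the meaning of . versus \. is the same. The two mini-files are the ones quoted in the example.

```python
import re

# The two mini-files from the example above
should_match = "MATERIAL: ALLOY 625\nthis file should match\nDIAMETER: .840"
should_not_match = "MATERIAL: ALLOY 625\nthis file should not match\nDIAMETER: 1840"

# Unescaped dot: matches ANY character, so "1840" is accepted too
unescaped = re.compile(r"(?s)MATERIAL: ALLOY 625.*?DIAMETER: .840")

# Escaped dot: only a literal period is accepted
escaped = re.compile(r"(?s)MATERIAL: ALLOY 625.*?DIAMETER: \.840")

unescaped_hits = [bool(unescaped.search(t)) for t in (should_match, should_not_match)]
escaped_hits = [bool(escaped.search(t)) for t in (should_match, should_not_match)]

print(unescaped_hits)  # [True, True]
print(escaped_hits)    # [True, False]
```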
-----
Oh, I see that Scott’s post doesn’t include the alternate file-quoting method, which I used above. In a few minutes, I will add an example at the end of that [thread](https://notepad-plus-plus.org/community/topic/14262/how-to-markdown-code-on-this-forum/7)
-
I can have many entries. I entered what I could remember at home the other day. Reality is what I tried.
Yes, several lines could be between. In fact, I may also need to add 2 more finds too. And yes, I am looking in an entire folder of files. Hope that helps. And putting the explanations of the answers here really helps. I am not a coder. Just trying to debug some failures from some program. Thanks ALL
-
Unfortunately, we cannot help you generalize any more until you give us more examples of mini-files that should or should not match, similar to what I did above. It doesn’t have to be real data, just make it up, like I did. But right now, we don’t know if you want it to match any files that have at least one “MATERIAL: anything” followed some time later by “DIAMETER: specific-number”, or followed by “DIAMETER: any-floating-point-number”.
For example, which of these should match?
MATERIAL: xyz
blah
DIAMETER: .840
or
MATERIAL: xyz
blah
DIAMETER: 0.840
or
MATERIAL: pdq
blah
DIAMETER: 1.840
or
MATERIAL: ALLOY 625
blah
DIAMETER: 2.50
or
MATERIAL: ALLOY UNNUMBERED
blah
DIAMETER: -.50
MATERIAL: ALLOY UNNUMBERED
blah
DIAMETER: +.50
or …
What are the restrictions on the MATERIAL? What are the restrictions on the DIAMETER? Or are you just looking for any file that has a pair of MATERIAL: and DIAMETER:, and you don’t actually care what comes next on those lines?
We cannot guess. We should not have to come up with these examples to ask you. If you want help, you have to provide the information to help.
I will not be able to provide any more help or suggestions until you’ve given us more rules (detailed description, not handwavy), preferably with example files that should and should not match. Please format any future responses using the markup learned from the thread I linked earlier, especially using one of the two example methods (indenting or the ```z notation) for making sure your files come through exactly as pasted. Until you have shown that effort, there is no more I can do, sorry.
-
@PeterJones I will try to enter more REAL examples, but I fear that might not be realistic. e.g. Material might be
- MATERIAL: Copper
- MATERIAL: NICKEL COPPER
- MATERIAL: AL
And what I don’t know for sure are lines between, if that matters, or spaces before. But I am going to try to give a real life example tomorrow morning. Thank you for all of your help
-
Hopefully, after the example data, we’ll be able to get you a working solution.
-
Thanks for your diligence and Patience.
Here are three snippets.
=SAMPLE 1====================================
BLAH:
BLAH:
BLAH:
DATE: FEBRUARY 30, 2002
DWG REV: -NA-
CRO REV: -NA-
DAT REV: -NA-
SHT REV: -NA-
DETAIL: BLAH
DYPN NUMBER: BLAH
MATERIAL: ALLOY 625
PIPE LENGTH: BLAH
DIAMETER: 2.375
THICKNESS: .109
BEND RAD: 6.000
BLAH:
BLAH:
=SAMPLE 2====================================
BLAH:
BLAH:
BLAH:
DATE: FEBRUARY 31, 2002
DWG REV: -NA-
CRO REV: -NA-
DAT REV: -NA-
SHT REV: -NA-
DETAIL: BLAH
DYPN NUMBER: BLAH
MATERIAL: NICKEL-COPPER
PIPE LENGTH: BLAH
DIAMETER: .840
THICKNESS: .109
BEND RAD: 2.500
BLAH:
BLAH:
=SAMPLE 3====================================
BLAH:
BLAH:
BLAH:
DATE: FEBRUARY 32, 2002
DWG REV: -NA-
CRO REV: -NA-
DAT REV: -NA-
SHT REV: -NA-
DETAIL: BLAH
DYPN NUMBER: BLAH
MATERIAL: CUNI 70:30
PIPE LENGTH: BLAH
DIAMETER: .840
THICKNESS: .120
BEND RAD: 2.500
BLAH:
BLAH:
-
Thanks for the data.
Using that data, the regex
(?s)MATERIAL:\h*\w+.*?DIAMETER:\h*[\.\d]+
finds three matches (I assume you intended for all three samples to match).
If you want details on that regex, see https://regexr.com/468dg, which I saved with that data and regex.
As a reminder, this FAQ will direct you to plenty of regex documentation for future search/replace/mark needs.
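For anyone who wants to try that regex outside Notepad++, here is a hedged Python sketch. Two assumptions: Python's re module has no \h, so [ \t]* stands in for "horizontal whitespace" here, and the sample data is a shortened copy of the three snippets above, not the full files.

```python
import re

# Shortened copy of the three samples posted above
samples = """\
=SAMPLE 1====
MATERIAL: ALLOY 625
PIPE LENGTH: BLAH
DIAMETER: 2.375
THICKNESS: .109
=SAMPLE 2====
MATERIAL: NICKEL-COPPER
PIPE LENGTH: BLAH
DIAMETER: .840
THICKNESS: .109
=SAMPLE 3====
MATERIAL: CUNI 70:30
PIPE LENGTH: BLAH
DIAMETER: .840
THICKNESS: .120
"""

# Same regex as above, with \h* replaced by [ \t]* (re has no \h)
pattern = re.compile(r"(?s)MATERIAL:[ \t]*\w+.*?DIAMETER:[ \t]*[.\d]+")

matches = pattern.findall(samples)

print(len(matches))  # 3
for m in matches:
    # Show only the first and last line of each matched block
    block = m.splitlines()
    print(block[0], "...", block[-1])
```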
-
take that file to excel
separate line at “:” so that variable name and value come in two columns
then sort that in excel so that all same variable name come together.
then do your thing.
combine the two columns to get a single line (reverse of what you did above)
bring file back to npp.
If you want original order, after taking the file to excel you can put a 3rd column and fill it with 1,2,3 serial no. for your entire data.
after first sorting, it will come to variable name order
so after finishing your data manipulation, sort again on this serial no. column and you get data in your original order.
thanks.
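The same split / sort / restore procedure can be sketched in a few lines of Python instead of Excel. This is only an illustration of the steps described above, with made-up input lines; note that it splits at the first “:” only, so a value such as CUNI 70:30 stays in one piece.

```python
# Made-up input lines in their original order
lines = [
    "MATERIAL: ALLOY 625",
    "DIAMETER: 2.375",
    "MATERIAL: CUNI 70:30",
    "DIAMETER: .840",
]

# Split each line at the first ":" and add a serial-number column
rows = []
for serial, line in enumerate(lines, start=1):
    name, _, value = line.partition(":")
    rows.append((serial, name.strip(), value.strip()))

# Sort on the variable name so identical names sit together
by_name = sorted(rows, key=lambda row: row[1])

# ... do your data manipulation on by_name here ...

# Sort again on the serial number to recover the original order
restored = sorted(by_name, key=lambda row: row[0])

# Recombine the two columns into single lines
rejoined = [f"{name}: {value}" for _, name, value in restored]

print([row[0] for row in by_name])  # [2, 4, 1, 3]
print(rejoined == lines)            # True
```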