Erase what is not

Raffaele Pretolani

Hi,
I have a .txt file where I want to delete everything that is not:

\"Funzione:.*?"

How can I delete everything that does not match this?

Thanks!

PeterJones

@Raffaele-Pretolani said in Erase what is not:

\"Funzione:.*?"

Hard to tell which of those you want literal, and which of those you want as regex wildcards, so I’m not even going to guess (turning over a new leaf for me).

As a general idiom, to delete any lines that don’t contain the sub-sequence XXXXX, and to delete anything except the match from a line that includes that sub-sequence, not leaving blank lines behind,

FIND = (?-s)(^.*(XXXXX).*?(\R|\Z))|(^.*(\R|\Z))
REPLACE = $2$3
MODE = regular expression

original

not here
XXXXX
nothing
before XXXXX
nope
XXXXX after
exclude this as well
before XXXXX after
none
before XXXXX after last line no EOL

will become

XXXXX
XXXXX
XXXXX
XXXXX
XXXXX
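To check the idiom outside Notepad++, here is a rough sketch that uses Python's re module as a stand-in for Notepad++'s Boost engine. It is an approximation, not the same engine. Python has no \R, so the line-ending alternatives are spelled out. The (?m) flag is added on the assumption that ^ should match at every line start, as it does by default in Notepad++. The sample text reproduces the "original" data above, with \n line endings assumed.

```python
import re

# Sample data from the "original" example above (\n line endings assumed).
text = (
    "not here\n"
    "XXXXX\n"
    "nothing\n"
    "before XXXXX\n"
    "nope\n"
    "XXXXX after\n"
    "exclude this as well\n"
    "before XXXXX after\n"
    "none\n"
    "before XXXXX after last line no EOL"
)

# Python's re has no \R, so the line endings are listed explicitly.
# (?m) lets ^ match at every line start; Python's . already excludes \n,
# which corresponds to the (?-s) in the Notepad++ version.
EOL = r"(\r\n|\n|\r|\Z)"
pattern = re.compile(r"(?m)(^.*(XXXXX).*?" + EOL + r")|(^.*" + EOL + r")")

# $2$3 becomes \2\3. Since Python 3.5, groups that did not take part
# in the match expand to an empty string, so unwanted lines vanish.
result = pattern.sub(r"\2\3", text)
print(result)
```

When run, it should print the five XXXXX lines shown above, with no trailing line break after the last one.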

In general, you would then replace XXXXX in your regex with the actual literal text or regex sub-expression that matches what you want. However, some debugging is sometimes needed because of interactions between your sub-expression and the wrapper regex.

Of course, I doubt my answer will really answer your question sufficiently, because “delete everything that is not” is actually more ambiguous than you seem to think, and there are multiple ways to interpret it.

If you’re going to reply again to try to refine the regex, I highly recommend reading the advice in my boilerplate which I’m about to quote, and show a willingness to help us help you, and show evidence in your reply that you have taken the advice to heart. Good luck.

-----

Please Read And Understand This

FYI: I often add this to my response in regex threads, unless I am sure the original poster has seen it before. Here is some helpful information for finding out more about regular expressions, and for formatting posts in this forum (especially quoting data) so that we can fully understand what you’re trying to ask:

This forum is formatted using Markdown. Fortunately, it has a formatting toolbar above the edit window, and a preview window to the right; make use of those. The </> button formats text as “code”, so that the text you format with that button will come through literally; use that formatting for example text that you want to make sure comes through literally, no matter what characters you use in the text (otherwise, the forum might interpret your example text as Markdown, with unexpected-for-you results, giving us a bad indication of what your data really is). Images can be pasted directly into your post, or you can hit the image button. (For more about how to manually use Markdown in this forum, please see @Scott-Sumner’s post in the “how to markdown code on this forum” topic, and my updates near the end.) Please use the preview window on the right to confirm that your text looks right before hitting SUBMIT. If you want to clearly communicate your text data to us, you need to properly format it.

If you have further search-and-replace (“matching”, “marking”, “bookmarking”, regular expression, “regex”) needs, study the official Notepad++ searching using regular-expressions docs, as well as this forum’s FAQ and the documentation it points to. Before asking a new regex question, understand that for future requests, many of us will expect you to show what data you have (exactly), what data you want (exactly), what regex you already tried (to show that you’re showing effort), why you thought that regex would work (to prove it wasn’t just something randomly typed), and what data you’re getting with an explanation of why that result is wrong. When you show that effort, you’ll see us bend over backward to get things working for you. If you need help formatting, see the paragraph above.

Please note that for all regex and related queries, it is best if you are explicit about what needs to match, and what shouldn’t match, and have multiple examples of both in your example dataset. Often, what shouldn’t match helps define the regular expression as much or more than what should match.

Here is the way I usually break down trying to figure out a regex (whether it’s for myself or for helping someone in the forum):

1. Compare which portions of each line I want to match are identical in every line (“constants”), and which parts I want to allow to differ from line to line (“variables”) while still being part of the match.

2. Look at both the variables and constants, and see which portions of each I’ll want to keep or move around, vs which parts get thrown away completely. Each sub-component that I want to keep will be put in a regex group. Anything that gets completely thrown away doesn’t need to be in a group, though sometimes I put it in a numbered (___) or unnumbered (?:___) group anyway, if I have a good reason for it. Anything that needs to be split apart, I break into multiple groups, instead of having it as one group.

3. For each group, I do a mental “how would I describe to my son how to correctly match these characters?”, which should hopefully give me a simple, foolproof algorithm of characters that must match or must not match; then I ask, “how would I translate those instructions into regex sequences?” If I don’t know the answer to the second, I read documentation, or ask a specific question.

4. Try it, debug, iterate.

guy038

Hi, @raffaele-pretolani, @peterjones and All,

Peter, a possible variant of your general regex would be:

SEARCH (?-s)^(?=.*(XXXXX)).*|^.*\R?

REPLACE ?1\1

Notes:

We don’t even need to capture \R, as it’s not used in the replacement.
If the string XXXXX is not part of the current line, the whole line (even an empty one), along with its line-break when present, is simply deleted (2nd alternative of the search regex).
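A comparable Python sketch of this variant follows, again assuming Python's re behaves closely enough to Boost for this pattern. Python's re.sub has no Boost-style conditional replacement such as ?1\1, so a small function (keep_group_one, a name invented here) stands in for it: it returns group 1 when the first alternative matched and an empty string otherwise. The short sample text is hypothetical.

```python
import re

# Hypothetical sample: lines with and without XXXXX, last line without EOL.
text = "not here\nXXXXX\nnothing\nbefore XXXXX\nnope\nXXXXX after"

# (?m): ^ matches at every line start; Python's . already excludes \n.
# \R? is spelled out as (?:\r\n|\n|\r)? because Python's re has no \R.
pattern = re.compile(r"(?m)^(?=.*(XXXXX)).*|^.*(?:\r\n|\n|\r)?")

def keep_group_one(m: re.Match) -> str:
    # Emulates the Boost replacement "?1\1": keep group 1 if it matched,
    # otherwise replace the whole match (line plus line-break) with nothing.
    return m.group(1) if m.group(1) is not None else ""

result = pattern.sub(keep_group_one, text)
print(result)
```

Because the first alternative stops before the line-break, kept lines retain their original line endings, while the second alternative consumes the line-break of each deleted line.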

Best Regards,

guy038

Raffaele Pretolani

It doesn’t work, unfortunately.
Let me try to explain it better.
I have a single line which contains more interested parties.
Here is a small part as an example:

"Funzione: GESAUS" href="?url=%2Ftrans%2Fx3%2Ferp%2FFOCACCIA%2F%24sessions%3Ff%3DGESAUS%252F2%252F%252FM%252F">Utenti</a></div><a href="#" title="Aggiungi a prefer." class="s-nav-menu-bookmark s-fonticon-btn s-btn-bookmark_off" style=""></a></div><div class="s-nav-items-item s-nav-menu-item"><div class="s-nav-items-item-value"><a class="s-nav-menu-link" title="Funzione: GESAPN" href="?url=%2Ftrans%2Fx3%2Ferp%2FFOCACCIA%2F%24sessions%3Ff%3DGESAPN%252F2%252F%252FM%252F">Profilo menu</a></div><a href="#" title="Aggiungi a prefer." class="s-nav-menu-bookmark s-fonticon-btn s-btn-bookmark_off" style=""></a></div><div class="s-nav-items-item s-nav-menu-item"><div class="s-nav-items-item-value" title=""><a class="s-nav-menu-link" title="Funzione: GESACS" href="?url=%2Ftrans%2Fx3%2Ferp%2FFOCACCIA%2F%24sessions%3Ff%3DGESACS%252F2%252F%252FM%252F">Codici d'accesso</a></div><a href="#" title="Aggiungi a prefer." class="s-nav-menu-bookmark s-fonticon-btn s-btn-bookmark_off" style=""></a></div></div><div class="s-nav-items-col"><div class="s-nav-items-item s-nav-menu-item"><div class="s-nav-items-item-value"><a class="s-nav-menu-link" title="Funzione: GESAFT"

From this text I must erase everything that is not
"Funzione: XXXXXX" (where XXXXXX changes every time).

How?

To find them I use the command:

\"funzione:.*?"

Am I more clear this way?

Thanks again
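To see what that search matches on its own, here is a small Python sketch run against an abbreviated, hypothetical excerpt modeled on the HTML fragment above (the href values are simplified to "#"). Using re.IGNORECASE is an assumption standing in for Notepad++'s "Match case" option being off, since the pattern is typed as "funzione" while the data says "Funzione".

```python
import re

# Abbreviated, hypothetical excerpt modeled on the fragment above.
snippet = (
    '<a class="s-nav-menu-link" title="Funzione: GESAUS" href="#">Utenti</a>'
    '<a class="s-nav-menu-link" title="Funzione: GESAPN" href="#">Profilo menu</a>'
)

# The search as typed in the post. \" is just a literal quote in Python's re.
# .*? is lazy, so each match stops at the first closing quote.
matches = re.findall(r'\"funzione:.*?"', snippet, flags=re.IGNORECASE)
print(matches)
```

The search finds each quoted "Funzione: ..." value, but finding is not the same as deleting everything around the matches, which is what the rest of the thread works out.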

Alan Kilborn

@Raffaele-Pretolani said in Erase what is not:

It doesn’t work, unfortunately.

What did you try, exactly?

single line which contains more interested parties

I eagerly anticipate finding out what this really means!

Am I more clear this way?

Not really.

Did you try?:

SEARCH (?-s)^(?=.*("Funzione: .{6}")).*|^.*\R?

REPLACE ?1\1

Raffaele Pretolani

@Alan-Kilborn said in Erase what is not:

What did you try, exactly?

The proposed solutions

I eagerly anticipate finding out what this really means!

That the text is not divided into multiple lines but is a single one.

Did you try?:

SEARCH (?-s)^(?=.*("Funzione: .{6}")).*|^.*\R?

REPLACE ?1\1

With this formula, everything gets selected, even the parts I don’t want to eliminate.
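Here is a sketch of why this happens, using Python's re as an approximate stand-in for Notepad++. When all the data sits on one line, the first alternative's trailing .* matches the entire line, which is why the search appears to select everything. The greedy .* inside the lookahead then captures only the last Funzione occurrence, so the replacement collapses the whole line to that single value. The one-line sample is hypothetical.

```python
import re

# Hypothetical one-line sample: all data sits on a single line.
line = 'x "Funzione: GESAUS" y "Funzione: GESAPN" z'

# The suggested regex, with \R? spelled out for Python's re.
pattern = re.compile(r'(?m)^(?=.*("Funzione: .{6}")).*|^.*(?:\r\n|\n|\r)?')

m = pattern.search(line)
# The first alternative's trailing .* consumes the entire line...
print(m.group(0) == line)
# ...and the greedy .* in the lookahead captures only the LAST occurrence.
print(m.group(1))

# Emulating the "?1\1" replacement: the line collapses to one occurrence.
result = pattern.sub(lambda mm: mm.group(1) or "", line)
print(result)
```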

Alan Kilborn

@Raffaele-Pretolani said in Erase what is not:

(I tried) The proposed solutions

Sorry, friend, that type of response isn’t going to elicit more help for you. You need to explain what you tried, and how it didn’t work for you, etc.

I think it is becoming clearer, but helpers here aren’t inclined to “pull your problem out of you” before then solving it.

That the text is not divided into multiple lines but is a single one.

You might have revealed that fact as soon as @PeterJones started talking about “lines” in his response. Otherwise, readers and responders just assume that the line-based interpretation fits your situation.

Regardless, cheers and good luck.

Raffaele Pretolani

This is my file from which I have to delete everything except the highlighted parts.
As you can see, all the text is in a single line (line 1).
To select the elements, I used the "funzione:.*?" formula.

I can’t find a way to delete everything except the highlighted parts.
With the formulas suggested above by other users, the search always matches everything, instead of everything except "funzione:.*?".

PeterJones

Thanks for giving us more to go on…

I can come “pretty close” to what you describe.

Assuming you have data like

blah "Funzione: ABCD" blah blah "Funzione: 1234" blah blah "Funzione: r2d2" blah blah "Funzione: !123!" blah

Then the regex

FIND = .*?("Funzione:.*?")
REPLACE = $1\r\n
mode = regex

will result in

"Funzione: ABCD"
"Funzione: 1234"
"Funzione: r2d2"
"Funzione: !123!"
 blah

If there was anything after the last "Funzione:...", it will show up on the last line of the replacement, but there will be only one of those, so you can just manually delete that last line a single time. (It would be possible, but more complicated, to do it all in one regex. I often prefer doing regex operations in 2-3 steps, rather than spending 2x-3x as long trying to debug a single regex that will do it all.)
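The two-step approach can be sketched in Python, assuming its re module is close enough to Notepad++'s engine for this pattern. Step 1 is the FIND/REPLACE above, with $1 written as \1. Step 2 stands in for deleting the leftover last line by hand. It uses str.rsplit, which also drops the final line break. The data string is the sample line from above.

```python
import re

# The sample line from above.
data = ('blah "Funzione: ABCD" blah blah "Funzione: 1234" blah blah '
        '"Funzione: r2d2" blah blah "Funzione: !123!" blah')

# Step 1: FIND = .*?("Funzione:.*?")  REPLACE = $1\r\n
# Python spells $1 as \1; the \r\n escapes in the template become CR LF.
step1 = re.sub(r'.*?("Funzione:.*?")', r"\1\r\n", data)

# Step 2: drop whatever trails the last match (the " blah" line),
# the equivalent of deleting that last line manually.
step2 = step1.rsplit("\r\n", 1)[0]
print(step2)
```

Splitting the work this way keeps each regex simple, which matches the preference stated above for a few easy steps over one hard-to-debug regex.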

If you want spaces rather than newlines separating the matches, just use a space (or \x20) instead of the \r\n in the REPLACE. If you want nothing between (so the quotes will jam up next to each other), then just REPLACE = $1.

Raffaele Pretolani

@PeterJones

PERFECT!
That’s exactly what I was looking for.
Thank you very much, and sorry if I wasn’t clear.

THANKS!