How can I find a specific string (in my case, a link) that ends with a quotation mark?
I have a wall of text like this.
I want to mark all the Twitch links with channel names and copy them; that's the basic idea.
The only solution I have is to search for this
After that (this is where I'm stuck) it should mark everything up to the quotation mark.
I am sure there is a way to achieve this using regex. Please help me, guys.
The following regex will capture (highlight) the remainder of the text you seek. I’m not sure how it will help you though as you suggest you need to further manipulate the text which is highlighted.
So on the Mark function use:
As I was not able to determine the exact quote character you use, this may need adjusting. If my regex does NOT get the right text, copy the closing quote character from your file and replace mine (in the regex). Quotes can be problematic as there isn’t just one kind.
Let me know how you get on and especially if you require further help.
Just so you know how the regex works: it looks for the first bit (which you already had), followed by ANY character as long as it isn’t a quote, for as many characters as can be found. Therefore it will stop just short of the quote character.
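For anyone who wants to try the same idea outside Notepad++, here is a minimal Python sketch of "literal prefix, then any run of non-quote characters". The sample text, the `https://www.twitch.tv/` prefix and the straight double quote are assumptions for illustration; the real file may use a different quote character, in which case the character class needs adjusting.

```python
import re

# Assumed sample text; the real file is not reproduced in the thread.
text = (
    'x <a href="https://www.twitch.tv/some_channel" class="a"> y '
    '"https://www.twitch.tv/other_channel" z'
)

# Literal prefix, followed by one or more characters that are NOT a double quote.
# The match therefore stops just short of the closing quote.
pattern = re.compile(r'https://www\.twitch\.tv/[^"]+')

links = pattern.findall(text)
print(links)
```

The negated class `[^"]` is what makes the match stop before the quote; swapping in another quote character only means changing that one class.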
Works perfectly, thank you.
Now I need to copy all this text (370 matches).
Is there a way to do this? I think that shouldn’t be too difficult.
That was what I was referring to. Marking just shows you the occurrences. It doesn’t help with additional processing.
What I think you really want is to work on a copy of the file, selectively destroying some of the text in it and leaving only the references you want.
Here’s what I’d do. Make a copy of the file (into another tab of Notepad++).
Use the following regex to remove all unwanted text. This is only a rough job. You may still need to remove lines where the text does NOT occur.
So ANY line with the text you want will ONLY have that text remaining on the line. All other lines without it will be unchanged. I’d then use a line sort function (Edit, Line Operations, Sort Lines Lexicographically Ascending). This will put all the lines you want to keep together. Remove all others. I could spend more time on a regex that would do ALL this but this is quick and easy to do.
See how you go with this.
I probably should ask the question: does any occurrence of the text you want cross over lines? I see one of the highlighted instances was right at the end of a line. Currently I haven’t catered for that situation, so check (manually if need be) for any other twitch.tv occurrences that did NOT get marked.
In my file I have only 10 lines, like that
Maybe because of that I can’t achieve what I want.
After Find and Replace
it deletes almost everything I need and leaves me with 8 matches of twitch.
Here is the txt file that I am using; maybe this way it will be easier to figure out.
Your file was a great help. I definitely needed to see that, as it showed me that the lines were very long, with multiple occurrences of the text you want on each line.
So here is my revised set of steps.
- Make a copy in another tab of Notepad++
- Use the Replace Function to remove all line endings (carriage return line feeds).
Replace with: empty field <-- nothing in this field
Now everything is on 1 line (it may not look that way if you have word wrap turned on)
- Use Replace function to remove ALL unwanted text.
This will remove all unwanted text except for what follows the last occurrence.
- Now to put all occurrences of the text we want on different lines.
Once this step is completed, go to the last line and remove the extra text behind the portion you want to keep.
Again, this is a quick process; I haven’t spent much time on making it do everything. Sometimes quick and easy steps are better than trying to cover ALL bases with a long-winded approach.
Have a go and let us know.
I have had a slightly longer look at the file you provided. I note that you mentioned quotes, however your initial regex did NOT include those. In the file there appear to be some instances of twitch.tv without quotes. I’m not sure you actually intend to capture those as well.
I’ve made a revised regex which doesn’t need so many steps, however it will still require the final file to be edited a bit. Once you try it you will see what I mean. Some of the lines stick out very easily as not being correct.
So there is no need to remove carriage returns, but you will need to remove those lines that DON’T start with “https”. This can be done with Mark, with the bookmark option ticked, after which the bookmarked lines can be removed.
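The clean-up step (dropping every line that does not start with "https") can also be sketched in a few lines of Python, for anyone scripting it instead of using Mark and bookmarks. The sample lines below are made up for illustration, not taken from the real file.

```python
# Hypothetical result of the replace step: some lines are links, some are leftovers.
lines = [
    "https://www.twitch.tv/channel_one",
    "leftover fragment that is not a link",
    "https://www.twitch.tv/channel_two",
    "",
]

# Keep only the lines that start with "https"; everything else is dropped.
kept = [line for line in lines if line.startswith("https")]

print("\n".join(kept))
```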
This is amazing, thank you very much, works beautifully.
guy038
Hello @fujosej-fujo, @terry-r and All,
Thanks, @fujosej-fujo, for your new 6.txt text file. It’s always better to work on “real” data ;-))
I think, @terry-r, that all the work can be reduced to a single regex S/R ;-))
So, @fujosej-fujo, basically, you’re searching for any area of text :
Ending at the first next quote char
This regex, which searches for such an area, is :
The (?s-i) modifiers, at the beginning, mean that :
- Any meta-character dot (.) represents, absolutely, any single character, even EOL ones ((?s))
- The regex engine will search in a case-sensitive way ((?-i))
Then, it searches for the literal string
Finally, the part .+?" finds the shortest area of any characters, up to the next quote char
Now that we have built this first regex to match the zones to extract, we create a second regex which contains this first regex, using the syntax where your regex is surrounded with parentheses, in order to store its value as group 1, for future replacement :
Thus, this leads to the correct regex S/R, below :
=> From your file, 366 replacements occur and you get a neat list of links
After the modifiers, the part .+? matches the shortest range from either the beginning of the file or the end of the previous match, up to the expression
In the replacement, we rewrite only the expected group 1, which must be extracted, followed by a line break
Near the end of the file, when no more "https://www.twitch.tv can be found, the regex engine uses the second alternative .+, after the alternation symbol |, which will grab all text up to the very end of the file, as the (?s) modifier is always active !
This time, as group 1 is not defined, the replacement simply deletes this last unwanted part
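As a rough way to check the logic of this single-pass S/R outside Notepad++, here is a small Python sketch. The pattern is reconstructed from the description above and is an assumption, not a copy of the posted regex; the sample text is made up. Python cannot switch flags with a leading (?s-i), so `re.DOTALL` stands in for (?s), and matching is case-sensitive by default, as with (?-i).

```python
import re

# Hypothetical sample; the thread's real file is not reproduced here.
text = (
    'junk <a href="https://www.twitch.tv/channel_one" x>\n'
    'more junk "https://www.twitch.tv/channel_two" tail\n'
    'trailing text without any link'
)

# First alternative: shortest run of text, then the quoted link stored as group 1.
# Second alternative: whatever is left once no further link can be found.
pattern = re.compile(r'.+?("https://www\.twitch\.tv.+?")|.+', re.DOTALL)


def keep_group_1(match):
    # Group 1 defined: rewrite only the link, followed by a line break.
    if match.group(1) is not None:
        return match.group(1) + "\n"
    # Group 1 undefined (second alternative): delete the leftover text.
    return ""


result = pattern.sub(keep_group_1, text)
print(result)
```

The callback makes the "group 1 is not defined" case explicit, so the trailing text is dropped without an extra line break.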
Refer also to this more complete post, on that topic ( How to extract all the results matched ) :
You’re mostly correct, except that I believe the OP didn’t want the quote characters included, they were just to delimit the text he DID want. Thus your regex should be
The Replace With field is as stated.
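To illustrate the point about the quotes, here is a short Python sketch in which the parentheses sit inside the quote characters, so group 1 holds only the link itself. The exact pattern is an assumption built from the discussion above, and the sample text is invented.

```python
import re

# Hypothetical one-line sample with two quoted links.
text = 'junk "https://www.twitch.tv/channel_one" more "https://www.twitch.tv/channel_two" end'

# The quotes are matched but stay OUTSIDE the capturing group,
# so they delimit the link without being kept.
pattern = re.compile(r'.+?"(https://www\.twitch\.tv.+?)"|.+', re.DOTALL)

# Keep group 1 plus a line break when it is defined, otherwise delete the match.
bare_links = pattern.sub(
    lambda m: m.group(1) + "\n" if m.group(1) else "",
    text,
)
print(bare_links)
```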
Once again, thank you guys! Works like a charm.
I have one more question, sorry)
So let’s say I have this list
I want to add ‘s’ to http and also a ‘www.’ before twitch.
Is there a way to do this with regex?
This is actually very easy. You only need to search for http:// and have the replacement as
You could even do this with the Replace function set to “normal” mode as there aren’t any special characters as used previously (.+?) etc.
As I said, this can be done in either normal mode or regular expression mode; it won’t matter.
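Since no special characters are involved, the same change is a plain string replacement in a script as well. A minimal Python sketch, assuming the replacement text is `https://www.` (which follows from the request to add the ‘s’ and the ‘www.’) and using made-up list entries:

```python
# Hypothetical list in the "before" form: http without www.
links = [
    "http://twitch.tv/channel_one",
    "http://twitch.tv/channel_two",
]

# Plain literal replacement, the equivalent of "normal" mode: no regex needed.
fixed = [link.replace("http://", "https://www.") for link in links]

print("\n".join(fixed))
```

Entries that already start with https:// would be left untouched by this replacement.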
Thank you very much. That was easier than I thought.