Search and remove items within tags.

John Thompson 0

Every week I publish a page that is growing longer and longer as popularity grows.

I take a webpage, put it into notepad, and clean it up. Mostly I dump some columns within a table, then tighten it up from there to make a cleaned out short version.

The question is this: right now I do a search for [img], which is BB code indicating an image. Of course the whole line is

[img]path-to-photo.jpg[/img]

Right now I just search the doc for all [img] tags, then go one by one deleting the image calls through the whole page. Normally there are 75-125 hits per task.

It gets to be very tiresome doing this all manually. What I’m hoping to find is a way to have Notepad++ delete everything from [img] to [/img]. But of course a regular search doesn’t work because each path is going to be different.

is there a plugin or a way I could search dynamically and have notepad just delete anything that has the [img] tag?

Sorry to ask such a simple question but it sure would save me some time!

Thanks~

Terry R

@John-Thompson-0 said in Search and remove items within tags.:

is there a plugin or a way I could search dynamically and have notepad just delete anything that has the [img] tag?

Sure is. It’s called a regular expression or regex. It uses codes to identify characters in several ways.

So you haven’t stated if ALL the [img]...[/img] tags will ALWAYS be on 1 line but a possible idea would be to look for a range of characters following the [img] up until (and including) the following [/img] tag.

So using the “Replace” function we have:
Find What: (?-s)\[img\].+?\[/img\]
Replace With: (leave this field empty)
This is a regex, so the search mode must be set to regular expression. Either have the wrap around option ticked, or make sure the cursor is in the very first position of the file before starting.

With this you can use the “Find” button initially and it will locate (and highlight) the first occurrence. At this point, if you want to delete it, click the “Replace” button. As the replacement field is empty, this effectively means delete the highlighted text. The next occurrence of the tag will then be highlighted, ready for you to press either Find or Replace. You would use the Find button if NOT wanting to delete the highlighted occurrence.

So as a suggestion, use it with the Find/Replace buttons for a one time process. If you are happy that it correctly highlights every occurrence the next time you process a file you could use the “Replace All” button. This finds and Replaces (so deletes) all occurrences with the one click.
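
For anyone who wants to try the same "check first, then delete everything" routine outside Notepad++, here is a minimal sketch in Python. It assumes Python's `re` module is a close enough stand-in for Notepad++'s regex engine for this simple pattern; the sample text and variable names are made up for illustration.

```python
import re

# Python's re module is only a stand-in here; Notepad++ uses its own regex engine.
IMG_TAG = re.compile(r"\[img\].+?\[/img\]")

# Hypothetical sample text, not taken from the thread.
text = "Intro [img]a/one.jpg[/img] middle [img]b/two.png[/img] end"

# Step 1: list every match first, like pressing "Find" repeatedly.
matches = [m.group(0) for m in IMG_TAG.finditer(text)]
for found in matches:
    print("would delete:", found)

# Step 2: once the matches look right, remove them all, like "Replace All".
cleaned = IMG_TAG.sub("", text)
print(repr(cleaned))
```

Note that only the tags are removed, so the spaces that surrounded them stay in the text.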

As stated above this will ONLY find those occurrences within 1 line. If you have occurrences that occur over 2 (or more) lines then a change will be required. It’s as simple as changing the (?-s) to (?s).
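
The single-line versus multi-line difference can be sketched in Python as well. This is an illustration under an assumption: Python does not use the (?-s) switch, so the default behaviour (dot does not match a newline) plays the role of (?-s), and the `re.DOTALL` flag plays the role of (?s). The sample text is hypothetical.

```python
import re

# Hypothetical sample: the second tag is split across two lines.
text = "a [img]one.jpg[/img] b\nc [img]two\n.jpg[/img] d"

# Default: "." does not match a newline, comparable to (?-s).
single_line = re.sub(r"\[img\].+?\[/img\]", "", text)

# re.DOTALL: "." also matches a newline, comparable to (?s).
multi_line = re.sub(r"\[img\].+?\[/img\]", "", text, flags=re.DOTALL)

print(repr(single_line))  # the split tag is still there
print(repr(multi_line))   # both tags are gone
```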

Come back with your results. We may need to alter it if you find some are missed, or even some other text is highlighted when it shouldn’t be.

We can also give you a bit of background into the codes used in this regex.

Terry

PS had to edit as forgot the markdown engine driving these posts ate some of my \ characters.

John Thompson 0

Beautiful. this is going to save a huge bit of time each week!

Sometimes the img tag goes on for two lines so I will need to use (?s) for sure.

I will be running this shortly and will report my findings. Thank you SO much!

John Thompson 0

@Terry-R

I can tell you for sure that on my first trial it worked great. I put 6 different img tags in the doc and ran the search/replace and it worked perfectly. I think this is going to do it.

You have no idea how helpful this is going to be for me.

Thank you!

Alan Kilborn

@John-Thompson-0

So what you’ve been given to solve your problem is one of the most basic and core substitutions of this type that you can do.
You can see the power of it now, I’m sure.
Do yourself a favor and acquaint yourself with other similar techniques buy having a read HERE.

Alan Kilborn

@Alan-Kilborn said:

…buy having a read HERE…

Contrary to what that says, nothing to “buy”, it’s all FREE. :-)

(Correction: … by having a read…)

Terry R

@John-Thompson-0 said in Search and remove items within tags.:

I take a webpage, put it into notepad, and clean it up. Mostly I dump some columns within a table, then tighten it up from there to make a cleaned out short version.

As @Alan-Kilborn said, look at the regex documentation and try and start the learning process. Since your above statement suggests you have other editing to do, maybe your “new found” knowledge could be put to good use in doing more of the manual edits you perform.

Regex is awesome in performing lots of editing, so long as the edit can be explained logically. As examples:
“I have this 3 character code and I need to delete it and the following text until I reach an ‘end of line’.”
“I have this number, it can be 7-10 characters long and I need it formatted with the first 3, followed by a ‘-’ and then the rest of the number.”
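
Those two descriptions could translate into patterns roughly as follows. This is only a sketch in Python, not something posted in the thread: the code "XYZ", the sample inputs, and the assumption that the number consists of digits only are all made up for illustration.

```python
import re

# Hypothetical inputs; "XYZ" stands in for the 3 character code.
line = "keep this XYZ drop everything after the code"
number = "5551234567"

# Example 1: delete the code and everything after it, up to the end of the line.
trimmed = re.sub(r"XYZ.*$", "", line, flags=re.MULTILINE)

# Example 2: a 7-10 digit number becomes the first 3 digits, "-", then the rest.
formatted = re.sub(r"\b(\d{3})(\d{4,7})\b", r"\1-\2", number)

print(repr(trimmed))
print(formatted)
```

The parentheses in the second pattern capture two groups, and `\1` and `\2` in the replacement put them back with the dash in between.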

It could possibly also do your column editing for you (very likely, I’d say). You do need to be prepared to spend a bit of time learning. Attempt to see if you can get a working regex for the column editing. We are here to assist but do like to see some ideas that you have tried. If needing to present examples, do so in the same manner as your first post (inside the black box), as that prevents the posting engine from potentially mangling the data.

And as a background to my supplied regex the description is as follows:
(?-s) - as you found out, this refers to a single line. Actually it means the . (dot) character will not match the end of line (EOL) markers (carriage return and line feed). The (?s) means the . character will match EOL markers.
\[img\] - note here that I included the \ character as the [ and ] are special. The \ tells the regex engine that it’s the actual character I want, not the special meaning.
.+? - this is where the real fun begins. The . means a single character position. The + means “one or more”, and on its own it is “greedy”, so it takes as many characters as allowed. The ? turns the greedy into “lazy”, so it takes as few as possible.
\[/img\] - this is again looking for the actual text [/img].

So for the regex to succeed it must complete the entire “formula”. This forces the lazy portion .+? to continue adding characters one by one until it finds the following portion, the [/img].
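
The difference between greedy and lazy shows up as soon as two tags sit on the same line. A small sketch in Python, with a made-up sample line (Python's `re` is assumed to behave like Notepad++'s engine for these quantifiers):

```python
import re

# Hypothetical sample line with two image tags and text between them.
text = "x [img]one.jpg[/img] keep me [img]two.jpg[/img] y"

# Greedy .+ runs to the LAST [/img] on the line, swallowing "keep me" too.
greedy = re.sub(r"\[img\].+\[/img\]", "", text)

# Lazy .+? stops at the FIRST [/img] it can reach, so each tag is matched separately.
lazy = re.sub(r"\[img\].+?\[/img\]", "", text)

print(repr(greedy))
print(repr(lazy))
```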

Good luck
Terry

PeterJones

@Terry-R ,

Correcting a mistake caused by the forum:

The “img” tag regex pieces should really be as shown earlier, with the backslashes: \[img\] and \[/img\]

(“wonderful” square-bracket escape “feature” in forum.)

Terry R

@PeterJones said in Search and remove items within tags.:

Correcting a mistake caused by the forum:

Thanks @PeterJones . I was nearly caught out once (and edited that post). Stupid me forgot a second time, what made it worse was my descriptor with the example mentioned the \ so that should have alerted me.

That nasty markdown engine. Why can’t it leave well alone!

Cheers
Terry

John Thompson 0

OK, thanks everyone. I’m working right now so I don’t have time to reply to the new info yet, but I am going to read up.

I’ll respond to the others this evening but wanted to respond to @Terry-R first. I used this today on a full length list and it worked like a charm out of the box.

I’ll be running it again Saturday morning so I’ll follow up there once I’ve had a chance to read the others’ responses.

John Thompson 0

Just wanted to let you know, I haven’t forgotten this post and it’s really become helpful. I know it stays in my cache of searches but I saved it in a snippets file I use anyway.

(?-s)\[img\].+?\[/img\]

As for explaining it, if I look long enough I get it. I always get confused by having to use escape characters for things, and that’s mostly what gets confusing here. But the beginning is just calling Perl to look for a pattern, correct? And what’s inside the parentheses (is that what they’re called?) defines what to look for, using the escape characters so it allows me to use characters that might otherwise be part of the language itself?

I know I said most of that wrong, but I do get it. It’s just that I can figure out what some of that stuff means, but writing it is a whole nother story.
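
On the escape characters specifically, a short experiment can make the difference visible. The sketch below uses Python rather than Notepad++, on the assumption that square brackets mean the same thing in both engines; the sample text is made up.

```python
import re

# Hypothetical sample text.
text = "[img]x[/img]"

# Unescaped, [img] is a character class: ONE character that is i, m, or g.
unescaped = re.findall(r"[img]", text)

# Escaped, \[img\] matches the literal five characters "[img]".
escaped = re.findall(r"\[img\]", text)

# re.escape adds the backslashes for you when you only want literal text.
auto = re.escape("[img]")

print(unescaped)
print(escaped)
print(auto)
```

So the backslash does not add anything new to look for; it just switches off the special meaning of the bracket that follows it.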

John Thompson 0

@Alan-Kilborn said in Search and remove items within tags.:

@John-Thompson-0

So what you’ve been given to solve your problem is one of the most basic and core substitutions of this type that you can do.
You can see the power of it now, I’m sure.
Do yourself a favor and acquaint yourself with other similar techniques buy having a read HERE.

I do understand your point. My old business partner and best friend for 20+ years was a programmer and I just could never get my head around it. Like, I can decipher a lot of what’s in the line, but to sit down and write it myself? No way. But I suppose if I just started doing it I could. I mean, isn’t that how most people learned HTML and CSS? I know it’s how I did, 25 years ago.

It’s been hard to get it. I’ve tried taking jquery courses and basic JavaScripting and it just never ‘took hold’ and got me interested. I suppose it’s because I never really ‘had’ to learn that like I did markup etc.