Search and remove items within tags.



  • Every week I publish a page that is growing longer and longer as popularity grows.

    I take a webpage, put it into notepad, and clean it up. Mostly I dump some columns within a table, then tighten it up from there to make a cleaned out short version.

    The question is this: right now I do a search for [img], which is BBCode indicating an image. Of course the whole line is

    [img]path-to-photo.jpg[/img]
    

    Right now I just search the doc for all [img] tags, then go one by one deleting the image calls through the whole page. Normally there are 75-125 hits per task.

    It gets to be very tiresome doing this all manually. What I’m hoping to find is a way to have Notepad++ delete everything from [img] to [/img]. But of course a regular search doesn’t work because each path is going to be different.

    Is there a plugin or a way I could search dynamically and have Notepad++ just delete anything that has the [img] tag?

    Sorry to ask such a simple question but it sure would save me some time!

    Thanks~



  • @John-Thompson-0 said in Search and remove items within tags.:

    is there a plugin or a way I could search dynamically and have notepad just delete anything that has the [img] tag?

    Sure is. It’s called a regular expression or regex. It uses codes to identify characters in several ways.

    So you haven’t stated if ALL the [img]...[/img] tags will ALWAYS be on 1 line but a possible idea would be to look for a range of characters following the [img] up until (and including) the following [/img] tag.

    So using the “Replace” function we have:
    Find What: (?-s)\[img\].+?\[/img\]
    Replace With: (leave this field empty)
    This is a regex, so the search mode must be “Regular expression”. You can have the “Wrap around” box ticked, or make sure the cursor is in the very first position of the file before starting.

    With this you can use the “Find” button initially and it will locate (and highlight) the first occurrence. At this point, if you want to delete it, click the “Replace” button. As the replacement field is empty, this effectively means delete the highlighted text. The next occurrence of the tag will then be highlighted, ready for you to press either Find or Replace. You would use the Find button if NOT wanting to delete the highlighted occurrence.

    So as a suggestion, use it with the Find/Replace buttons the first time through. If you are happy that it correctly highlights every occurrence, the next time you process a file you could use the “Replace All” button. This finds and replaces (so deletes) all occurrences with one click.

    As stated above this will ONLY find those occurrences within 1 line. If you have occurrences that occur over 2 (or more) lines then a change will be required. It’s as simple as changing the (?-s) to (?s).

    Come back with your results. We may need to alter it if you find some are missed, or even some other text is highlighted when it shouldn’t be.

    We can also give you a bit of background into the codes used in this regex.

    Terry

    PS had to edit as forgot the markdown engine driving these posts ate some of my \ characters.
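For anyone who wants to try the same substitution outside Notepad++, here is a minimal Python sketch. It is an illustration only, not part of the original answer: the sample text and the `strip_img_tags` helper are made up, and Python's `re` module is a different engine from the one Notepad++ uses. In particular, the `(?-s)` / `(?s)` switch at the front of the pattern is expressed here with the `re.DOTALL` flag.

```python
import re

# Hypothetical sample text; the second tag is split across two lines.
SAMPLE = (
    "Row one [img]photos/a.jpg[/img] text\n"
    "Row two [img]photos/\nb.jpg[/img] more text\n"
)

# Same pattern as in the post, minus the leading (?-s)/(?s) switch.
# In Python that behaviour is chosen with the re.DOTALL flag instead.
IMG_TAG = r"\[img\].+?\[/img\]"


def strip_img_tags(text: str, span_lines: bool = False) -> str:
    """Delete every [img]...[/img] run.

    span_lines=False behaves like (?-s): '.' stops at a line break.
    span_lines=True behaves like (?s): '.' may cross line breaks.
    """
    flags = re.DOTALL if span_lines else 0
    return re.sub(IMG_TAG, "", text, flags=flags)


single = strip_img_tags(SAMPLE)                   # only the one-line tag is removed
multi = strip_img_tags(SAMPLE, span_lines=True)   # both tags are removed

print(repr(single))
print(repr(multi))
```

Replacing with an empty string is the same idea as leaving the “Replace With” field empty: whatever the pattern matches is simply deleted.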



  • Beautiful. this is going to save a huge bit of time each week!

    Sometimes the img tag goes on for two lines so I will need to use (?s) for sure.

    I will be running this shortly and will report my findings. Thank you SO much!



  • @Terry-R

    I can tell you for sure that on my first trial it worked great. I put 6 different img tags in the doc, ran the search/replace, and it worked perfectly. I think this is going to do it.

    You have no idea how helpful this is going to be for me.

    Thank you!



  • @John-Thompson-0

    So what you’ve been given to solve your problem is one of the most basic and core substitutions of this type that you can do.
    You can see the power of it now, I’m sure.
    Do yourself a favor and acquaint yourself with other similar techniques buy having a read HERE.



  • @Alan-Kilborn said:

    …buy having a read HERE…

    Contrary to what that says, nothing to “buy”, it’s all FREE. :-)

    (Correction: … by having a read…)



  • @John-Thompson-0 said in Search and remove items within tags.:

    I take a webpage, put it into notepad, and clean it up. Mostly I dump some columns within a table, then tighten it up from there to make a cleaned out short version.

    As @Alan-Kilborn said, look at the regex documentation and try to start the learning process. Since your statement above suggests you have other editing to do, maybe your “new found” knowledge could be put to good use on more of the manual edits you perform.

    Regex is awesome at performing lots of editing, so long as the edit can be explained logically. As examples:
    “I have this 3 character code and I need to delete it and the following text until I reach an end of line”
    “I have this number, it can be 7-10 characters long and I need it formatted with the first 3, followed by a ‘-’ and then the rest of the number”

    It could possibly also do your column editing for you (very likely, I’d say). You do need to be prepared to spend a bit of time learning. Attempt to see if you can get a working regex for the column editing. We are here to assist, but we do like to see some ideas that you have tried. If you need to present examples, do so in the same manner as your first post (inside the black box), as that prevents the posting engine from potentially mangling the data.

    And as a background to my supplied regex the description is as follows:
    (?-s) - as you found out, this refers to a single line. Actually it means the . (dot) character will not match the end of line (EOL) markers (carriage returns and line feeds). The (?s) means the . character will match EOL markers too.
    [img] - note here that I included the \ character as the [ and ] are special. The \ tells the regex engine that it’s the actual character I want not the special meaning.
    .+? - this is where the real fun begins. The . means a single character position. The + means one or more of them and is “greedy”, so as many as allowed. The ? turns the greedy into “lazy”, so as few as possible.
    [/img] - this is again looking for the actual text [/img].

    So for the regex to succeed it must complete the entire “formula”. This forces the lazy portion .+? to continue adding characters one by one until it finds the following portion, the [/img].

    Good luck
    Terry
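The greedy versus lazy difference described above is easy to see on a line that holds two tags. The following Python sketch is only an illustration under assumptions: the sample line is invented, and Python's `re` module stands in for the Notepad++ regex engine (the two behave the same way for this particular pattern).

```python
import re

# Hypothetical line with two image tags and some text between them.
LINE = "[img]a.jpg[/img] keep me [img]b.jpg[/img]"

GREEDY = r"\[img\].+\[/img\]"   # .+  takes as many characters as it can
LAZY = r"\[img\].+?\[/img\]"    # .+? takes as few characters as it can

greedy_hits = re.findall(GREEDY, LINE)  # one match: first [img] to LAST [/img]
lazy_hits = re.findall(LAZY, LINE)      # two matches: one per tag

greedy_result = re.sub(GREEDY, "", LINE)  # the text between the tags is lost too
lazy_result = re.sub(LAZY, "", LINE)      # only the tags are removed

print(greedy_hits)
print(lazy_hits)
print(repr(greedy_result))
print(repr(lazy_result))
```

With the greedy form the text between the two tags would be deleted along with them, which is why the lazy `.+?` is the safer choice when several tags can share a line.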



  • @Terry-R ,

    Correcting a mistake caused by the forum: in the explanation above, the backslashes were dropped from the [img] and [/img] pieces.

    The “img” tag pieces of the regex should really be as shown earlier:
    \[img\] and \[/img\]

    (“wonderful” square-bracket escape “feature” in forum.)



  • @PeterJones said in Search and remove items within tags.:

    Correcting a mistake caused by the forum:

    Thanks @PeterJones. I was nearly caught out once (and edited that post). Stupid me forgot a second time; what made it worse was that my description of the example mentioned the \, so that should have alerted me.

    That nasty markdown engine. Why can’t it leave well alone!

    Cheers
    Terry



  • OK, thanks everyone. I’m working right now so I don’t have time to reply to the new info yet, but I am going to read up.

    I’ll respond to the others this evening but wanted to respond to @Terry-R first. I used this today on a full length list and it worked like a charm out of the box.

    I’ll be running it again Saturday morning, so I’ll follow up then once I’ve had a chance to read the others’ responses.



