How to remove empty spaces from a particular tag (regular expression)

Reply to how to remove empty spaces from a particular tag (regular expression) on Sat, 29 Dec 2018 09:41:23 GMT

Neculai I. Fantanaru — Sat, 29 Dec 2018 09:41:23 GMT

@guy038 I just reviewed this post, because I liked it and a post today reminded me of it.

SEARCH (?s)(?:\G|
)(?:(?!<|>).)*?\K(?:(^\h+)|\h+$|(?<=>)\h+|\h+(?=
)|(\h{2,})(?=[^<\h]))

REPLACE (?1$0)(?2\x20)

What if I have another tag, like

So, @Robin Cruise's scenario becomes:

Laurie Strode comes to her final confrontation with Michael Myers, the masked figure who has haunted her since she narrowly escaped.

So, in this case, your regex does not remove the empty spaces, because of those

Neculai I. Fantanaru — Mon, 03 Dec 2018 13:41:07 GMT

@guy038 said:

(?1$0)(?2\x20)

It can also be replaced with (?{2}$1 ), as in:

SEARCH: (?:(^\h+)|\h+$|(?<=>)\h+|\h+(?=
)|(\h{2,})(?=[^<\h]))

REPLACE BY: (?{2}$1 )

Robin Cruise — Sat, 10 Nov 2018 17:53:11 GMT

Well done, thank you very much!

guy038 — Sun, 11 Nov 2018 12:34:08 GMT

Hello, @Robin-cruise and All,

Ah, yes! My regex wasn’t accurate enough! And worse, my formulation of the general regex solution was not exact either :-((

So, the general regex solution, when you want to perform a search/replacement only in a specific area, is:

SEARCH (?-s)(\G|BR)((?!ER).)*?\KSR OR (?s)(\G|BR)((?!ER).)*?\KSR

REPLACE RR

where :

BR (Beginning Regex) is the regex which defines both the start of that specific area and the start of a possible Search Regex match

ER (Excluded Regex) is the regex which defines the characters and/or strings forbidden between the Beginning Regex position and the next Search Regex match. It implicitly defines the zone where the Search Regex may occur

SR (Search Regex) is the regex which defines the expression to search for, provided both the Beginning Regex and the Excluded Regex conditions are TRUE

RR (Replace Regex) is the regex which defines the expression replacing the Search Regex
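As a sketch of how the four parts fit together, the small Python helper below only assembles the Notepad++ search string; it does not run it, since Python's standard re module has no \G, \K or \h. The name build_zone_regex is hypothetical. It uses non-capturing groups (?:...) around \G|BR and around the skipped char, as the final S/R of this post does, so that the groups written inside SR keep the numbers 1, 2, ... that RR refers to. It also wraps SR in (?:...), because | has the lowest priority and would otherwise split the whole pattern.

```python
def build_zone_regex(br: str, er: str, sr: str, multiline: bool = False) -> str:
    """Assemble the Notepad++ (Boost) search string (\\G|BR)((?!ER).)*?\\KSR.

    The groups around \\G|BR and around the skipped char are non-capturing,
    so the groups inside SR keep the numbers 1, 2, ... used by RR.
    SR is wrapped in (?:...) because | has the lowest priority.
    """
    flag = "(?s)" if multiline else "(?-s)"  # (?s): a zone may span several lines
    return f"{flag}(?:\\G|{br})(?:(?!{er}).)*?\\K(?:{sr})"


if __name__ == "__main__":
    # Prints the template with placeholder parts, ready to paste into Notepad++
    print(build_zone_regex("BR", "ER", "SR", multiline=True))
```

With placeholder parts and multiline=True, it returns (?s)(?:\G|BR)(?:(?!ER).)*?\K(?:SR).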

In your case, we must look for unnecessary blank characters in a <..........> area containing neither < nor >. Hence, the excluded chars are simply the two symbols < and >

Now, inside that area, which may span several lines, we’ll look for:

Case A : All blank characters at the beginning of lines, when a correct area is split over several lines

Case B : All blank characters at the end of lines, when a correct area is split over several lines

Case C : All blank characters right after the > symbol

Case D : All blank characters right before the ending tag

Case E : All runs of at least two blank characters, followed by neither a blank char nor a < symbol

These 5 cases correspond to the different alternatives of the SR search regex, separated with the | symbol

So, we have :

BR =

ER = <|>

SR = (?:(^\h+)|\h+$|(?<=>)\h+|\h+(?=
)|(\h{2,})(?=[^<\h]))

RR = (?1$0)(?2\x20)

Remarks :

The assertion \G is considered true at the current caret position during the first run of the regex. So, if you deliberately move the cursor inside the leading spaces of a line before running the regex S/R, it could wrongly match the remaining leading spaces located after the cursor position

So, to avoid these matches, I added, for case E, the restrictive condition (\h{2,})(?=[^<\h]), whose lookahead must be true right after the matched blank range!

Regarding the replacement :

If case A occurs, we must keep the leading spaces, stored in group 1. So, we rewrite the entire match => (?1$0)

If case B, C or D occurs, we need to delete all these blank chars => nothing is rewritten

If case E occurs, we just replace all the matched blank chars, stored in group 2, with a single space character => (?2\x20)

Finally, we get this new regex S/R :

SEARCH (?s)(?:\G|
)(?:(?!<|>).)*?\K(?:(^\h+)|\h+$|(?<=>)\h+|\h+(?=
)|(\h{2,})(?=[^<\h]))

REPLACE (?1$0)(?2\x20)

which should avoid the side-effects of my first attempt ;-))
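Python's standard re module cannot run this S/R as is (it has no \G, \K, \h or conditional replacement), but the same clean-up can be emulated, as a sketch, with a different strategy: find each zone (the opening tag, then every char other than < and >, plus the closing tag when it comes right after) and run SR only on that text, with a function standing in for (?1$0)(?2\x20). The tags <p class="oyric"> and </p> are assumptions, since the real ones were stripped from this page and only the class name oyric survives; ZONE, SR, fix and clean are illustrative names. Inside the zone text, ^ becomes (?<=\n), $ becomes (?=\r?\n), (?<=>) becomes \A and [ \t] stands in for \h.

```python
import re

# Hypothetical boundaries: the thread's real tags were stripped; only "oyric" is known.
BR = '<p class="oyric">'
ER = '</p>'

# A zone: BR, then every char that is neither < nor > (ER = <|>),
# then the closing tag if it comes right after.
ZONE = re.compile(re.escape(BR) + r'([^<>]*(?:' + re.escape(ER) + r')?)')

# guy038's SR rewritten for the zone text alone; [ \t] stands in for \h.
SR = re.compile(
    r'(?P<lead>(?<=\n)[ \t]+)'         # case A: indentation of an inner line
    r'|[ \t]+(?=\r?\n)'                # case B: blanks at the end of a line
    r'|\A[ \t]+'                       # case C: blanks right after BR
    r'|[ \t]+(?=</p>)'                 # case D: blanks right before the closing tag
    r'|(?P<run>[ \t]{2,})(?=[^< \t])'  # case E: inner run of 2+ blanks
)


def fix(m):
    # Stands in for the conditional replacement (?1$0)(?2\x20)
    if m.group('lead') is not None:
        return m.group(0)  # case A: rewrite the whole match, i.e. keep it
    if m.group('run') is not None:
        return ' '         # case E: a single space
    return ''              # cases B, C, D: delete


def clean(text):
    # Rebuild each zone as BR + cleaned zone text; everything else is untouched
    return ZONE.sub(lambda z: BR + SR.sub(fix, z.group(1)), text)
```

For instance, clean('<p class="oyric">   Laurie   Strode  </p>') gives '<p class="oyric">Laurie Strode</p>', while a <p> with another class is left alone. A nested tag such as <i> ends the zone, so the blanks after it are not cleaned, which is the limitation raised at the top of this thread.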

Beware

If, before performing this regex S/R, you deliberately move the cursor inside the <.........> area of a tag with a class other than oyric, it will also match all the additional blank characters of that <.........> zone. Nothing can be done about this!

Luckily, once the caret is located after that first <.........> zone, the regex behaves as expected again :-))

Cheers,

guy038

PS :

I said above, regarding the ER (Excluded Regex), that it implicitly defines a zone where the Search Regex may occur

Just an example to explain this notion. Let’s consider these 3 simple regexes :

[^<>]+(?=\h), which searches for the longest range of chars other than < and >, followed by a blank char

[^<]+(?=\h), which searches for the longest range of chars other than <, followed by a blank char

[^>]+(?=\h), which searches for the longest range of chars other than >, followed by a blank char

Below are the results, with each range of chars underlined with - and the following blank char underlined with ^

[Diagram: results of REGEX [^<>]+(?=\h), REGEX [^<]+(?=\h) and REGEX [^>]+(?=\h) on the sample line "This is a test bla blahh a test", with each matched range underlined with - and its following blank char with ^]

As we use the greedy quantifier +, it’s easy to visualize, in each case, the complete zone where a blank character (underlined with ^) may be looked for ;-))
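The same experiment can be sketched in Python on a hypothetical sample line (the original one lost its tags), with [ \t] standing in for \h, which Python's re module lacks. The names SAMPLE, PATTERNS and zones are illustrative; re.findall returns the ranges that were underlined with -.

```python
import re

SAMPLE = "a b<c d>e f"  # hypothetical sample line containing one <...> tag

PATTERNS = (
    r"[^<>]+(?=[ \t])",  # ranges may cross neither < nor >
    r"[^<]+(?=[ \t])",   # ranges may cross >, but not <
    r"[^>]+(?=[ \t])",   # ranges may cross <, but not >
)


def zones(pattern: str, text: str = SAMPLE) -> list:
    """Greedy ranges matched by pattern, each one followed by a blank char."""
    return re.findall(pattern, text)


if __name__ == "__main__":
    for p in PATTERNS:
        print(f"{p:18} {zones(p)}")
```

The three patterns give ['a', 'c', 'e'], ['a', 'c d>e'] and ['a b<c', 'e']: only the first one keeps every range on one side of the tag delimiters, which is the zone that an ER of <|> allows.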

rinku singh — Sat, 10 Nov 2018 10:04:56 GMT

I’m thinking about building Swiss File Knife plugins.
http://stahlworks.com/dev/swiss-file-knife.html

Robin Cruise — Sat, 10 Nov 2018 09:23:38 GMT

SEARCH (?s)(\G|
)((?!
).)*?\K((?<=>)\h+|\h+(?=<|\h))

REPLACE Leave EMPTY

By the way, there is a little problem with your regex, guy038. I just discovered it.

It seems that your regex also selects all the spaces outside the specified tag, and disturbs all my other lines.

See this screenshot:

https://snag.gy/fRX1ZO.jpg

or test the regex on this code in Notepad++: https://regex101.com/r/dDBYSk/1

See what happens after “Replace All”. You will see that all tags with spaces before and after them are modified, not only the particular tag I want. (
)

Robin Cruise — Tue, 30 Oct 2018 14:49:39 GMT

GREAT! Thank you very much ;)

guy038 — Tue, 30 Oct 2018 19:40:56 GMT

Hi, @Robin-cruise and All,

You were not very far from the right solution! The way to replace something:

In a particular tag section, as
........

In a particular tag section, with a particular class name, as
Bla bla blah

has already been discussed in these posts :

https://notepad-plus-plus.org/community/topic/15058/regex-remove-particular-words-from-tags-in-several-text-pages/10

https://notepad-plus-plus.org/community/topic/15058/regex-remove-particular-words-from-tags-in-several-text-pages/12

So, the general regex solution, when you want to perform a search/replacement ONLY in an area located between two particular boundaries, is:

SEARCH (?-s)(\G|BR)((?!ER).)*?\KSR OR (?s)(\G|BR)((?!ER).)*?\KSR

REPLACE RR

where :

BR (Beginning Regex) is the regex which defines the start of the zone for the search/replacement

ER (Ending Regex) is the regex which defines the end of the zone for the search/replacement

SR (Search Regex) is the regex which defines the expression to search for, in any such zone

RR (Replace Regex) is the regex which defines the expression replacing the Search Regex, in any such zone

In your case :

BR =

ER =

SR = ((?<=>)\h+|\h+(?=<|\h))

RR = Nothing

Notes :

SR searches for either of two alternatives, separated with the | symbol and surrounded by parentheses, because the alternation symbol | has the lowest priority:

(?<=>)\h+, which tries to match any non-empty range of horizontal blank chars (spaces or tabulations) preceded by the > symbol

\h+(?=<|\h), which tries to match any non-empty range of horizontal blank chars (spaces or tabulations) followed by either the < symbol or one more horizontal blank character

As all these matched blank characters have to be deleted, the replacement field is just empty
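To see what SR does on its own, without any zone restriction, here is a minimal Python sketch (Python's re has no \h, so [ \t] stands in for it; strip_blanks is a hypothetical name). Applied to the whole text, it trims the blanks next to every tag, which is exactly why the BR/ER part is needed. It also shows that the last blank of a run is the one kept, which explains the remarks about line ends and space/tabulation mixes further down.

```python
import re

# SR from this post, ((?<=>)\h+|\h+(?=<|\h)), with [ \t] in place of \h
SR = re.compile(r"(?<=>)[ \t]+|[ \t]+(?=<|[ \t])")


def strip_blanks(text: str) -> str:
    """Apply SR with an empty replacement (RR = nothing) to the WHOLE text."""
    return SR.sub("", text)


if __name__ == "__main__":
    print(repr(strip_blanks("<b>  Test   #1  </b>")))
```

For example, "<b>  Test   #1  </b>" becomes "<b>Test #1</b>", and "Test \t#2" becomes "Test\t#2": the space is deleted because a tabulation follows it, and the tabulation survives as the last blank.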

First, the regex tries to find the string
, followed by the shortest range of characters, possibly empty, .*?, up to the search regex explained above, with the condition that the string
must not be located at any position of this range

Due to the \K syntax, the regex engine resets the start of the reported match to the current position and forgets everything matched before it. So the final match is simply the part described above, ((?<=>)\h+|\h+(?=<|\h)) (SR)

After this first match, the next match may begin with the zero-length assertion \G, which is true where the previous match ended, followed again by another shortest range, possibly empty, and so on, as just above!

When the regex engine reaches the ending boundary
, the \G chain cannot be continued anymore, and the only way to match something else is to find, again, a
string further on!

If you are only interested in single-line ranges BR.........ER, use the (?-s) modifier at the beginning of the search regex

If you may have some multi-line ranges BR.........ER, use the (?s) modifier at the beginning of the search regex
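The \G / \K mechanics described above can be emulated in Python, whose standard re module supports neither, with an explicit loop. After each match, the next one is first tried with Pattern.match at the position where the previous one ended (the \G branch), and otherwise searched from a new BR; only the SR part is replaced, which is what \K achieves. This is a sketch under assumptions: zone_sub is a hypothetical name, the tags <p class="oyric"> and </p> stand in for the boundaries stripped from this page, [ \t] replaces \h, RR is inserted literally, SR must not match an empty string, and \G is only honoured after a real match, so the caret caveat does not apply. Passing re.DOTALL plays the role of (?s).

```python
import re


def zone_sub(text: str, br: str, er: str, sr: str, rr: str = "", flags: int = 0) -> str:
    """Emulate the S/R  (\\G|BR)((?!ER).)*?\\KSR  ->  RR  (RR inserted literally).

    Assumes SR never matches an empty string, so every match moves forward.
    """
    # \G branch: from where the last match ended, skip chars that do not start ER, then SR
    cont = re.compile(rf"(?:(?!{er}).)*?(?P<sr>{sr})", flags)
    # BR branch: a new zone start, the same skip, then SR
    start = re.compile(rf"{br}(?:(?!{er}).)*?(?P<sr>{sr})", flags)

    out, pos, chained = [], 0, False
    while True:
        # \G is only honoured right after a previous match (no caret at the start)
        m = cont.match(text, pos) if chained else None
        if m is None:
            m = start.search(text, pos)
        if m is None:
            break
        out.append(text[pos:m.start("sr")])  # kept text: what \K drops from the match
        out.append(rr)                       # only SR is replaced
        pos, chained = m.end("sr"), True
    out.append(text[pos:])
    return "".join(out)
```

With br='<p class="oyric">', er='</p>' and sr=r"(?<=>)[ \t]+|[ \t]+(?=<|[ \t])", only the oyric paragraph is cleaned. Without re.DOTALL, the chain stops at the first line break inside a zone, as with (?-s); with it, multi-line zones are cleaned, keeping one blank at the start of the next line.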

So, Robin, let’s imagine the sample text below:

Laurie Strode comes to her final confrontation with Michael Myers, the masked figure who has haunted her since she narrowly escaped. Laurie Strode comes to her final confrontation with Michael Myers, the masked figure who has haunted her since she narrowly escaped. Laurie Strode comes to her final confrontation with Michael Myers, the masked figure who has haunted her since she narrowly escaped. bla blah blah This is a test bla blah blah The final test Laurie Strode comes to her final confrontation with Michael Myers, the masked figure who has haunted her since she narrowly escaped. bla blah ....blah This is an other test to verify if the regex is correct

Using the following regex S/R :

SEARCH (?s)(\G|
)((?!
).)*?\K((?<=>)\h+|\h+(?=<|\h))

REPLACE Leave EMPTY

You should get the expected text, below :

Laurie Strode comes to her final confrontation with Michael Myers, the masked figure who has haunted her since she narrowly escaped. Laurie Strode comes to her final confrontation with Michael Myers, the masked figure who has haunted her since she narrowly escaped. Laurie Strode comes to her final confrontation with Michael Myers, the masked figure who has haunted her since she narrowly escaped. bla blah blah This is a test bla blah blah The final test Laurie Strode comes to her final confrontation with Michael Myers, the masked figure who has haunted her since she narrowly escaped. bla blah ....blah This is an other test to verify if the regex is correct

Notes :

It’s easy to verify that blank characters have been removed ONLY in the areas
..........
, whether they were single-line areas or multi-line blocks

However, note that if a line ends with blank chars and the next line also begins with blank characters, this regex S/R will keep one blank char both at the end of this line and at the beginning of the next line!

Finally, beware that, if words are separated by a mix of space and tabulation characters, only the final blank character will be kept!

For instance, from the text:

Test #1 ( Last BLANK char = SPACE, before #1 )
Test #2 ( Last BLANK char = TABULATION, before #2 )

You’ll obtain :

Test #1 ( SPACE char between Test and #1 )
Test #2 ( TABULATION char between Test and #2 )

Best Regards,

guy038