Regex start of line doesn't work as expected

Salvatore Falcone · Jul 28, 2019, 9:13 AM

Hello,

I am trying to remove the first character of each line in a block of text. I’d expect “^.” to get the job done, but instead it clears everything, as if an iteration were taking place.

Any clues?

TIA

Ekopalypse · Jul 28, 2019, 1:27 PM

@Salvatore-Falcone

in your example, once the first char has been removed, the next char is the first again.
So, I guess you are looking for something like
find what:^.(.*)$
replace with \1

guy038 · Jul 30, 2019, 8:50 AM

Hello, @salvatore-falcone and All,

Here is a solution :

SEARCH (?-s)^.(.?)
REPLACE \1
Tick the Wrap around option, if necessary
Select the Regular expression search mode
Click once on the Replace All button or several times on the Replace button

Notes :

The (?-s) is an in-line modifier, which forces the regex engine to interpret the dot symbol ( . ) as matching a single standard character only ( not EOL chars ! )
The ^ symbol is a zero-length assertion, which matches at the beginning of the line currently being scanned
In the replacement, we rewrite group 1 only, that is, the second standard character of the current line, if any !

IMPORTANT :

Why can we search for the first character of each line, if any, with the simple regex (?-s)^. while we need another syntax when we want to delete this first character ? @ekopalypse already gave the reason !

When we just search for the first character of each line, after a match, the current location is always just after this first char. So, in order to satisfy the ^ assertion of the regex, the search must, necessarily, jump to the next line and match the first character of that line

But when we actually delete the first character of a line, after the replacement, it’s the second character which is now located at the beginning of the current line ! Therefore, as this satisfies the search regex, this second character is then deleted, too ! And so on…

On the contrary, when you search for the first two characters and just rewrite the second one, after the replacement, the current location is after this re-written character ( so, NOT at the beginning of a line ! )
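The mechanism described above can be sketched as a toy model in Python. This is only an illustration of the explanation given in this thread, not Notepad++'s actual implementation: the helper name replace_all_restarting is hypothetical, and it uses Python's re module rather than the regex engine Notepad++ ships with. The key assumption is that after each replacement the search resumes in the modified text, right where the replacement ended.

```python
import re

def replace_all_restarting(pattern: str, repl: str, text: str) -> str:
    """Toy model (an assumption, not Notepad++ code): after each replacement,
    search again in the MODIFIED text, starting where the replacement ended."""
    regex = re.compile(pattern, re.MULTILINE)
    pos = 0
    while True:
        m = regex.search(text, pos)
        if m is None:
            return text
        if m.start() == m.end():
            # Keep the sketch simple: empty matches would need extra handling.
            raise ValueError("pattern must match at least one character")
        new = m.expand(repl)
        text = text[:m.start()] + new + text[m.end():]
        # Resume right after the inserted replacement text.
        pos = m.start() + len(new)

sample = ">> test line 1\n>> test line 2"

# "^." with an empty replacement: the position stays at the line start,
# so the next character matches again and each line is eaten completely.
print(repr(replace_all_restarting(r"^.", "", sample)))

# "^.(.?)" rewritten as group 1: the position ends up after the rewritten
# character, so ^ can only match again on the next line.
print(repr(replace_all_restarting(r"^.(.?)", r"\1", sample)))
```

In this model the first call leaves only the line break, while the second call removes exactly one character per line.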

Best Regards,

guy038

Salvatore Falcone · Jul 29, 2019, 7:22 PM

Thank you guys for the explanations and solutions. Nonetheless, this doesn’t feel like polite behaviour from N++. Which regex flavour does it comply with?

Just for the record, here’s a POSIX tool in action:

~$ echo ">> test line 1" > test.txt
~$ echo ">> test line 2" >> test.txt
~$ cat test.txt
>> test line 1
>> test line 2
~$ sed 's ^.  ' test.txt
> test line 1
> test line 2
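For comparison, the same single-pass result can be reproduced in Python. This is a minimal sketch using Python's re module, which is not the POSIX engine sed uses; the helper name strip_first_char is made up for the example. Note that sed gets there differently, by applying the s command once to each input line, whereas re.sub makes one left-to-right pass over the whole string.

```python
import re

def strip_first_char(text: str) -> str:
    # One left-to-right pass: re.sub scans the original string, so after
    # matching the first character of a line, ^ cannot match again until
    # the character after the next newline.
    return re.sub(r"^.", "", text, flags=re.MULTILINE)

sample = ">> test line 1\n>> test line 2\n"
print(strip_first_char(sample), end="")
```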

Alan Kilborn · Jul 29, 2019, 8:10 PM

@Salvatore-Falcone said:

this doesn’t feel as a polite behaviour from N++

It seems “polite” to me. Eko said it all with “once the first char has been removed, the next char is the first again”.

Which regex flavour is it compliant to?

See https://notepad-plus-plus.org/community/topic/15765/faq-desk-where-to-find-regex-documentation

Salvatore Falcone · Jul 29, 2019, 8:48 PM

@Alan-Kilborn said:
[…]

It seems “polite” to me. Eko said it all with “once the first char has been removed, the next char is the first again”.

I’ll try to be more explicit. You are portraying an iteration, while standard behaviour is to match the rule, substitute, and carry on through the string (if the g -for global- option is set, as I expect it to be when replacing text). If you do apply those simple steps, you shouldn’t need to catch anything when using ^., as “beginning of line” should be parsed once for each line.

See https://notepad-plus-plus.org/community/topic/15765/faq-desk-where-to-find-regex-documentation

Thank you for the pointers!

Cheers

Salvatore Falcone · Jul 29, 2019, 9:13 PM

@Ekopalypse said:
[…]

So, I guess you are looking for something like
find what:^.(.*)$
replace with \1

Actually I found out that
find what: ^.(.)
replace with: \1
is enough.

Still disturbing to me ;-)
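One difference between ^.(.) and the earlier ^.(.?) shows up on lines that hold a single character. The sketch below uses Python's re module, where the dot does not match a newline by default (comparable to the (?-s) modifier mentioned earlier in the thread); whether Notepad++ gives the same result depends on its ". matches newline" setting, so treat this as an illustration only.

```python
import re

text = "a\nbc\n"

# Group 1 is mandatory: a one-character line has no second character,
# so the line does not match and is left untouched.
required = re.sub(r"^.(.)", r"\1", text, flags=re.MULTILINE)

# Group 1 is optional: a one-character line still matches and is emptied.
optional = re.sub(r"^.(.?)", r"\1", text, flags=re.MULTILINE)

print(repr(required))
print(repr(optional))
```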

PeterJones · Jul 29, 2019, 9:49 PM

@Salvatore-Falcone said:

^.(.) is enough.

Yes, because it moves the cursor beyond the first character of the line, so now ^ isn’t matching at the current cursor position, and it needs to move forward to the next line before ^ can match again.

Still disturbing to me ;-)

Probably best to think of it not as the /g switch, but as looping on rerunning the regex from a new start point, until it’s hit the end of the document (or wrapped around back to the start of your search). An editor can be a different environment than other programs; and adding in the hooks for interactivity (like “find next / replace” along with “replace all”) changes the way that it has to think about the current position, global matches, etc.

For example, making “replace all” behave more like a repeated “find next/replace”: if the user tried a couple of single "replace"s to make sure it was working, and then hit “replace all”, and the behavior changed, that would be disturbing to the majority of users.

Many Notepad++ users don’t come from a programming/regex background – as is obvious by the number of “how do I search-and-replace ___” questions we see here, without any attempt at a regex given – and Notepad++ walks the fine line between making things easy for general-users, but with enough features for the super-users. I think it does a good job of that.

Salvatore Falcone · Jul 29, 2019, 10:36 PM

@PeterJones said:

For example, making “replace all” behave more like a repeated “find next/replace”: if the user tried a couple of single "replace"s to make sure it was working, and then hit “replace all”, and the behavior changed, that would be disturbing to the majority of users.

I didn’t consider this.

Thanks for your thoughts Peter.

Alan Kilborn · Jul 30, 2019, 12:12 AM

@PeterJones said:

if the user tried a couple of single "replace"s to make sure it was working, and then hit “replace all”, and the behavior changed, that would be disturbing

Ha! I’m surprised that @guy038 hasn’t jumped in yet to talk about how exactly this sort of thing can and does happen!! :)