Regex start of line doesn't work as expected



  • Hello,

    I am trying to remove the first character of each line in a block of text. I’d expect “^.” to get the job done, but it clears everything, as if an iteration were taking place.

    Any clues?

    TIA



  • @Salvatore-Falcone

    in your example, once the first char has been removed, the next char is the first again.
    So, I guess you are looking for something like
    find what:^.(.*)$
    replace with \1



  • Hello, @salvatore-falcone and All,

    Here is a solution :

    • SEARCH (?-s)^.(.?)

    • REPLACE \1

    • Tick the Wrap around option, if necessary

    • Select the Regular expression search mode

    • Click once on the Replace All button or several times on the Replace button

    Notes :

    • The (?-s) is an in-line modifier, which forces the regex engine to interpret the dot symbol ( . ) as matching a single standard character only ( not EOL chars ! )

    • The ^ symbol is a zero-length assertion, which matches at the beginning of the current line being scanned

    • In the replacement, we just rewrite group 1 only, that is, the second standard character of the current line, if any !

    IMPORTANT :

    Why can we search for the first character of each line, if any, with the simple regex (?-s)^. and why do we need another syntax when we want to delete this first character ? @ekopalypse already gave the reason why !

    When we just search for the first character of lines, after a match, the current location is always after this first char. So, in order to satisfy the ^ assertion of the regex, the engine necessarily has to jump to the next line and match the first character of that line

    But when we actually delete the first character of a line, after the replacement it is the second character which is now located at the beginning of the current line ! Therefore, as this satisfies the search regex, this second character is then deleted, too ! And so on…

    On the contrary, when you search for the first two characters and just rewrite the second one, after the replacement the current location is after this re-written character ( so, NOT at the beginning of a line ! )
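
The mechanism described in these notes can be modelled with a minimal Python sketch. It is only a model, under the assumption that each new search runs on the already modified text, starting where the previous replacement ended. It is not Notepad++'s actual code, the helper name `replace_all_rescanning` is hypothetical, and Python's `re` module stands in for the editor's regex engine.

```python
import re

def replace_all_rescanning(pattern, repl, text):
    """Hypothetical model: after each replacement, search again in the
    MODIFIED text, starting at the end of the text just written.
    Assumes the pattern never matches an empty string."""
    regex = re.compile(pattern, re.MULTILINE)
    pos = 0
    while True:
        m = regex.search(text, pos)
        if m is None:
            return text
        new = m.expand(repl)
        # Splice the replacement into the buffer.
        text = text[:m.start()] + new + text[m.end():]
        # The next search starts right after the replacement. If nothing
        # was written, this is still the beginning of the line.
        pos = m.start() + len(new)

sample = ">> test line 1\n>> test line 2"
print(repr(replace_all_rescanning(r"^.", "", sample)))         # '\n'
print(repr(replace_all_rescanning(r"^.(.?)", r"\1", sample)))  # '> test line 1\n> test line 2'
```

In this model `^.` empties every line, whereas `^.(.?)` leaves the position after the rewritten character, so each line loses its first character only.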

    Best Regards,

    guy038



  • Thank you guys for the explanations and solutions. Nonetheless, this doesn’t feel as a polite behaviour from N++. Which regex flavour is it compliant to?

    Just for the record, here’s a POSIX tool in action:

    ~$ echo ">> test line 1" > test.txt
    ~$ echo ">> test line 2" >> test.txt
    ~$ cat test.txt
    >> test line 1
    >> test line 2
    ~$ sed 's ^.  ' test.txt
    > test line 1
    > test line 2
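
For comparison, the same result as the sed call above can be reproduced with Python's `re.sub`, used here only as another example of a tool that substitutes in a single pass. The sample text is the content of test.txt from the transcript; the variable names are illustrative.

```python
import re

# Same two lines as test.txt in the transcript above.
text = ">> test line 1\n>> test line 2\n"

# re.MULTILINE lets ^ match at the start of every line, not just the string.
# re.sub finds its matches in the original text, so each line is hit once.
result = re.sub(r"^.", "", text, flags=re.MULTILINE)

print(result, end="")
# > test line 1
# > test line 2
```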


  • @Salvatore-Falcone said:

    this doesn’t feel as a polite behaviour from N++

    It seems “polite” to me. Eko said it all with “once the first char has been removed, the next char is the first again”.

    Which regex flavour is it compliant to?

    See https://notepad-plus-plus.org/community/topic/15765/faq-desk-where-to-find-regex-documentation



  • @Alan-Kilborn said:
    […]

    It seems “polite” to me. Eko said it all with “once the first char has been removed, the next char is the first again”.

    I’ll try to be more explicit. You are portraying an iteration, while standard behaviour is to match the rule, substitute, and carry on through the string (if the g -for global- option is set, as I expect it to be when replacing text). If you do apply those simple steps, you shouldn’t need to catch anything when using ^., as “beginning of line” should be parsed once for each line.
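
The "match the rule, substitute, and carry on" steps can be written out as a short Python sketch. It assumes that all matches are located in the original string and that the output is assembled separately, which is how the global substitution is described here; the helper name `replace_all_single_pass` is hypothetical.

```python
import re

def replace_all_single_pass(pattern, repl, text):
    """Hypothetical sketch of a global substitute: matches are found in the
    ORIGINAL text, so ^ is tested once per line and never re-tested
    against text that has already been rewritten."""
    regex = re.compile(pattern, re.MULTILINE)
    pieces = []
    last = 0
    for m in regex.finditer(text):
        pieces.append(text[last:m.start()])  # untouched text before the match
        pieces.append(m.expand(repl))        # the substitution itself
        last = m.end()                       # carry on after the match
    pieces.append(text[last:])               # untouched tail
    return "".join(pieces)

print(replace_all_single_pass(r"^.", "", ">> test line 1\n>> test line 2"))
# > test line 1
# > test line 2
```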

    See https://notepad-plus-plus.org/community/topic/15765/faq-desk-where-to-find-regex-documentation

    Thank you for the pointers!

    Cheers



  • @Ekopalypse said:
    […]

    So, I guess you are looking for something like
    find what:^.(.*)$
    replace with \1

    Actually I found out that
    find what: ^.(.)
    replace with: \1
    is enough.
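
One difference between `^.(.)` and the `^.(.?)` variant suggested earlier may be worth noting: the former requires a second character, so a line holding a single character is left alone. A small Python check, with `re` as a stand-in for the editor's engine and an invented three-line sample:

```python
import re

# Illustrative sample: the middle line has a single character.
sample = "ab\nc\nde"

# (.) requires a second character, so the line "c" does not match at all.
strict = re.sub(r"^.(.)", r"\1", sample, flags=re.MULTILINE)

# (.?) makes the second character optional, so "c" is matched and removed.
lenient = re.sub(r"^.(.?)", r"\1", sample, flags=re.MULTILINE)

print(repr(strict))   # 'b\nc\ne'
print(repr(lenient))  # 'b\n\ne'
```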

    Still disturbing to me ;-)



  • @Salvatore-Falcone said:

    ^.(.) is enough.

    Yes, because it moves the cursor beyond the first character of the line, so now ^ isn’t matching at the current cursor position, and it needs to move forward to the next line before it can match again.

    Still disturbing to me ;-)

    Probably best to think of it not as the /g switch, but as looping on rerunning the regex from a new start point, until it’s hit the end of the document (or wrapped around back to the start of your search). An editor can be a different environment than other programs; and adding in the hooks for interactivity (like “find next / replace” along with “replace all”) changes the way that it has to think about the current position, global matches, etc.

    For example, making “replace all” behave more like a repeated “find next/replace”: if the user tried a couple of single "replace"s to make sure it was working, and then hit “replace all”, and the behavior changed, that would be disturbing to the majority of users.

    Many Notepad++ users don’t come from a programming/regex background – as is obvious by the number of “how do I search-and-replace ___” questions we see here, without any attempt at a regex given – and Notepad++ walks the fine line between making things easy for general-users, but with enough features for the super-users. I think it does a good job of that.



  • @PeterJones said:

    For example, making “replace all” behave more like a repeated “find next/replace”: if the user tried a couple of single "replace"s to make sure it was working, and then hit “replace all”, and the behavior changed, that would be disturbing to the majority of users.

    I didn’t consider this.

    Thanks for your thoughts Peter.



  • @PeterJones said:

    if the user tried a couple of single "replace"s to make sure it was working, and then hit “replace all”, and the behavior changed, that would be disturbing

    Ha! I’m surprised that @guy038 hasn’t jumped in yet to talk about how exactly this sort of thing can and does happen!! :)

