Regex with unexpected repeat application

    Richard Darwin
    Feb 22, 2024, 5:59 PM (last edited by Richard Darwin Feb 22, 2024, 6:03 PM)

    Hello Notepad++ users:
    Could you please help me with a regex problem?

    First off, here is my Debug data:

    Notepad++ v8.6.2   (32-bit)
    Build time : Jan 14 2024 - 02:18:41
    Path : C:\Program Files (x86)\Notepad++\notepad++.exe
    Command Line : "E:\Linguistica\FrequencyList\lemma-pos.txt" 
    Admin mode : OFF
    Local Conf mode : OFF
    Cloud Config : OFF
    OS Name : Windows 10 Home (64-bit)
    OS Version : 22H2
    OS Build : 19045.4046
    Current ANSI codepage : 1252
    Plugins : 
        mimeTools (3)
        NppConverter (4.5)
        NppExport (0.4)
    

    Here is some sample data.  The first four ‘spaces’ are really TABs, while the ‘Inflections’ part is comma-separated.

    # LEMMA|POS LEMMA POS FREQUENCY INFLECTIONS
    underestimate|v underestimate v 35 underestimate, underestimated, underestimates, underestimating
    unique|j unique j 32 unique, uniquer, uniquest
    various|j various j 32 various
    vein|n vein n 32 vein, veins
    weep|v weep v 32 weep, weeping, weeps, wept
    whiskey n 32 whiskey, whiskeys, whiskies
    witty j 32 witty, wittier, wittiest
    worry|n worry n 32 worry, worries
    memorial|n memorial n 31 memorial, memorials
    

    I want to strip out the redundant first block of text in each line, the one containing ‘|’,  and the TAB after it.

    Here is how it should look:

    underestimate v 35 underestimate, underestimated, underestimates, underestimating
    unique j 32 unique, uniquer, uniquest
    various j 32 various
    vein n 32 vein, veins
    weep v 32 weep, weeping, weeps, wept
    whiskey n 32 whiskey, whiskeys, whiskies
    witty j 32 witty, wittier, wittiest
    worry n 32 worry, worries
    memorial n 31 memorial, memorials
    

    To accomplish this, I have tried using the following:
    Find/Replace expressions and settings

    • Find What = ^([a-z|]+\t)
    • Replace With = (empty)
    • Search Mode = REGULAR EXPRESSION
    • Dot Matches Newline = NOT CHECKED
      (I also have ‘Match case’ and ‘Wrap around’ OFF.)

    This regex does the job when I apply it with ‘Find Next’ and ‘Replace’, but when I apply it using ‘Replace All’, I get this output:

    35 underestimate, underestimated, underestimates, underestimating
    32 unique, uniquer, uniquest
    32 various
    32 vein, veins
    32 weep, weeping, weeps, wept
    32 whiskey, whiskeys, whiskies
    32 witty, wittier, wittiest
    32 worry, worries
    31 memorial, memorials
    

    I tried using ‘*’ instead of ‘+’ in the regex but got exactly the same unwanted result.

    Obviously the regex is being applied more than once per line. I don’t know why this is happening. I thought the regex should only apply once, at the start of each line, given the ‘^’. Is there a ‘global’ flag somewhere that I inadvertently set? If so, how do I access it?

    Any advice would be appreciated.

    –
    rick.darwin@gmail.com
    –Charles Darwin? He was my grandfather.  Oh, that Charles.  We share a common ancestor.

      Terry R @Richard Darwin
      Feb 22, 2024, 6:38 PM (last edited by Terry R Feb 22, 2024, 7:11 PM)

      @Richard-Darwin said in Regex with unexpected repeat application:

      Obviously the regex is being applied more than once per line. I don’t know why this is happening.

      When I copied your example there were no tabs, so I had to guess where they should be; the first line helped. I used your regex without any alteration, but I did not clear “Dot matches newline” and the other settings. As you look at the solutions members provide, you will find that we prefer to use modifiers inside the regex instead. The reference for these is in the online manual.
      The reason for doing so is that these modifiers will override any settings the user might have set and forgot to change, that way we (as solution provider) have more certainty that our provided regex will work as expected.

      Now, you say that using Find and then Replace repeatedly worked. I tried the same and, as expected, it didn’t work: that process and Replace All should behave exactly the same (EDIT: see the following post, where a setting elsewhere may have influenced your result). Your problem is that, with the cursor at the start of a line, the regex finds the string to remove and removes it, and afterwards the cursor is STILL at the start of the same line. The next iteration of Find/Replace then finds yet another occurrence on the same line.

      So what you need to do is at least capture 1 further character and replace that (write it back) so the cursor isn’t at the start of a line. My modified regex actually captures the remainder of the line and writes it all back, this places the cursor at the end of a line.

      So my regex is Find What: (?-s)^[a-z|]+\t(.+) and Replace With: ${1}. The (?-s) means the same as clearing Dot matches newline.
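
The behaviour described above can be sketched outside Notepad++. The Python below is only a toy model, not Notepad++'s actual code: the hypothetical helper `replace_all_rescanning` assumes that Replace All resumes searching at the end of each replacement in the already-edited text, so `^` can match again at the same line start. Python's `re` module does not accept a bare `(?-s)`, so it is left out here; dot already excludes newlines by default in Python. `SAMPLE` is two lines of the posted data with the TABs restored by hand.

```python
import re


def replace_all_rescanning(pattern, repl, text):
    """Toy model of Replace All (an assumption, not Notepad++ source):
    after each replacement, resume the search at the end of the
    replacement in the already-edited text."""
    regex = re.compile(pattern, re.MULTILINE)
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        new = m.expand(repl)  # resolve back-references such as \1
        text = text[:m.start()] + new + text[m.end():]
        pos = m.start() + len(new)
        if m.end() == m.start():
            pos += 1  # avoid looping forever on an empty match
    return text


# Two sample lines; the first four separators on each line are TABs.
SAMPLE = (
    "unique|j\tunique\tj\t32\tunique, uniquer, uniquest\n"
    "vein|n\tvein\tn\t32\tvein, veins\n"
)

# Original expression with an empty replacement: the search resumes at the
# line start, so the next lowercase block is removed as well.
stripped_too_much = replace_all_rescanning(r"^[a-z|]+\t", "", SAMPLE)

# Terry's expression: the rest of the line is captured and written back,
# so the search resumes at the end of the line.
fixed = replace_all_rescanning(r"^[a-z|]+\t(.+)", r"\1", SAMPLE)

# Capturing a single further character is already enough to move the
# resume position off the line start.
minimal = replace_all_rescanning(r"^[a-z|]+\t(.)", r"\1", SAMPLE)

print(repr(stripped_too_much))
print(repr(fixed))
```

In this model `stripped_too_much` starts each line at the frequency column, as in the unwanted output, while `fixed` and `minimal` both keep the lemma and POS columns. Python's own `re.sub` scans the original string in a single pass and would not reproduce the problem, which is why the loop is written out by hand.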

      Terry

        Coises @Terry R
        Feb 22, 2024, 6:59 PM

        @Richard-Darwin said in Regex with unexpected repeat application:

        This regex does the job when I apply it with ‘Find Next’ and ‘Replace’, but when I apply it using ‘Replace All’, I get this output:

        @Terry-R said in Regex with unexpected repeat application:

        Now you say when you used find, then replace repeatedly it worked. Well I tried the same and as expected it didn’t work. That process and the Replace All should work exactly the same.

        Most likely the original poster has Settings | Preferences… | Searching | Replace: Don’t move to the following occurrence unchecked (I believe unchecked is the default), and is literally repeating both Find Next and Replace.

        With the setting I mentioned unchecked, Replace does the next Find automatically, so the original poster is doing a second find after every replace. Replace All, of course, doesn’t do that.
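
A rough way to see why the manual sequence gave the wanted result is to model it. The Python below is a sketch under assumptions, not Notepad++'s implementation: the hypothetical function `find_next_then_replace` assumes that Replace (with an empty replacement) automatically selects the following match, and that the next Find Next starts searching after that selection.

```python
import re


def find_next_then_replace(pattern, text):
    """Toy model of clicking Find Next, then Replace, over and over, with
    Replace set to jump to the following match by itself (an assumption
    about the editor, not its real code). The replacement is empty."""
    regex = re.compile(pattern, re.MULTILINE)
    cursor = 0
    while True:
        hit = regex.search(text, cursor)  # user clicks Find Next
        if hit is None:
            return text
        # User clicks Replace: the selected match is deleted...
        text = text[:hit.start()] + text[hit.end():]
        # ...and the following match is selected automatically.
        auto = regex.search(text, hit.start())
        if auto is None:
            return text
        # The next Find Next starts after that selection, skipping it.
        cursor = auto.end()


# Two sample lines; the first four separators on each line are TABs.
SAMPLE = (
    "unique|j\tunique\tj\t32\tunique, uniquer, uniquest\n"
    "vein|n\tvein\tn\t32\tvein, veins\n"
)

result = find_next_then_replace(r"^[a-z|]+\t", SAMPLE)
print(repr(result))
```

In this model the automatically selected second block on each line is skipped by the following Find Next, so only the first block is removed from each line, which matches what the original poster saw when stepping through by hand.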

          Terry R @Coises
          Feb 22, 2024, 7:07 PM

          @Coises said in Regex with unexpected repeat application:

          Most likely the original poster has Settings | Preferences… | Searching | Replace: Don’t move to the following occurrence unchecked (I believe unchecked is the default), and is literally repeating both Find Next and Replace.

          I never knew that setting was there and was trying to figure out why he had a different result to me. Thanks, good to know there are still some things to learn about NPP.

          Terry
