    PythonScript regex error, but not as NPP regex

    python pythonscript regex
    • M Andre Z Eckenrode

      I use PythonScript mostly for extended sequences of regular-expression find & replace operations. I typically test my regexes directly in Notepad++ via the built-in Find/Replace dialog before putting them into a script. Both the files I operate on and my scripts are nearly always ANSI/Windows-1252, but I want my scripts to work on Unicode text as well. A new script I’m working on includes this problematic line of code:

      editor.rereplace(r'(\)) ([[:alpha:]]+) ([[:alpha:]]+ — )', ur'\1\r\n\t\u\2 \u\3')
      

      That line results in the following error message in PythonScript’s console:

      File "C:\Users\MAZE\AppData\Roaming\Notepad++\plugins\Config\PythonScript\scripts\McCartney-Project.py", line 25
          editor.rereplace(r'(\)) ([[:alpha:]]+) ([[:alpha:]]+ x97 )', ur'\1\r\n\t\u\2 \u\3')
      SyntaxError: (unicode error) 'rawunicodeescape' codec can't decode bytes in position 8-9: truncated \uXXXX
      

      However, that exact regex works directly in NPP. Offhand, it looks like my em dash (—) is the root of the problem, since it’s replaced by x97 in the error message. But I use an ANSI em dash in several other regex operations earlier in the same script, and PythonScript doesn’t complain about any of them and DOES process them as expected (after I’ve commented out the offending line). Does anybody know why that particular line is a stumbling block?

      Notepad++ v8.8.5 (32-bit)
      Build time: Aug 14 2025 - 00:17:53
      Scintilla/Lexilla included: 5.5.7/5.4.5
      Boost Regex included: 1_85
      Path: C:\Program Files (x86)\Notepad++\notepad++.exe
      Command Line: “C:\Program Files\ArdfryImaging\PNGOUTWin\PNGOUTWin Reg Codes.txt”
      Admin mode: OFF
      Local Conf mode: OFF
      Cloud Config: OFF
      Periodic Backup: ON
      Placeholders: OFF
      Scintilla Rendering Mode: SC_TECHNOLOGY_DIRECTWRITE (1)
      Multi-instance Mode: monoInst
      asNotepad: OFF
      File Status Auto-Detection: cdEnabledNew (for current file/tab only)
      Dark Mode: OFF
      Display Info:
      primary monitor: 1920x1080, scaling 100%
      visible monitors count: 1
      installed Display Class adapters:
      0000: Description - Intel® HD Graphics 620
      0000: DriverVersion - 31.0.101.2111
      0001: Description - NVIDIA GeForce 940MX
      0001: DriverVersion - 30.0.15.1169
      OS Name: Windows 10 Enterprise (64-bit)
      OS Version: 22H2
      OS Build: 19045.6216
      Current ANSI codepage: 1252
      Plugins:
      BetterMultiSelection (1.5)
      ColumnsPlusPlus (1.2)
      ColumnTools (1.4.5.1)
      ComparePlus (1.2)
      DSpellCheck (1.5)
      ExtSettings (1.3.1)
      HTMLTag_unicode (1.5.4)
      mimeTools (3.1)
      MultiClipboard (2.1)
      MultiReplace (4.3.2.28)
      NppCalc (1.5)
      NppConverter (4.6)
      NppExport (0.4)
      NPPJSONViewer (2.1.1)
      NppTextFX (1.4.1)
      NppXmlTreeviewPlugin (2)
      PreviewHTML (1.3.3.2)
      PythonScript (2.1)
      RegexTrainer (1.2)
      SessionMgr (1.4.4)

      • M Andre Z Eckenrode @M Andre Z Eckenrode

        Actually, I’m now thinking that my use of \u is the problem. I want it to make the next character come out in UPPER CASE, but it looks like Python expects \u to be followed by four hexadecimal digits specifying a Unicode code point.
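
The diagnosis above can be checked outside Notepad++. The following is a minimal sketch, not part of the original script: the helper name `literal_compiles` is made up for illustration, and the literals contain only the tail of the replacement string. It runs on Python 2 and Python 3. In both, a unicode literal requires `\u` to be followed by four hex digits, and Python 2's `ur''` prefix applies the same rule even though the string is otherwise raw. A plain `r''` literal leaves the backslashes alone.

```python
def literal_compiles(source):
    """Return True if `source` parses as a Python expression, False on SyntaxError."""
    try:
        compile(source, "<demo>", "eval")
        return True
    except SyntaxError:
        return False

# Source text of a unicode literal: u'\u\2 \u\3'
# The parser rejects it because \u is not followed by four hex digits.
broken_literal = "u'\\u\\2 \\u\\3'"

# Source text of a plain raw literal: r'\u\2 \u\3'
# Backslashes are kept verbatim, so the parser accepts it.
raw_literal = "r'\\u\\2 \\u\\3'"

print(literal_compiles(broken_literal))  # False
print(literal_compiles(raw_literal))     # True
```

This only shows where the SyntaxError comes from. Whether `editor.rereplace` then treats a literal `\u` in the replacement as a case modifier is a separate question that the sketch does not answer.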

        • Ekopalypse @M Andre Z Eckenrode

          @M-Andre-Z-Eckenrode

          Unfortunately, this is still an open issue.

          In this specific case, you can use something like this:

          editor.rereplace(r'(\)) ([[:alpha:]]+) ([[:alpha:]]+ — )', lambda m: f'{m.group(1)}\r\n\t{m.group(2).title()} {m.group(3).title()}')
          

          EDIT: Oops, I just realized you are still using Python 2:

          editor.rereplace(r'(\)) ([[:alpha:]]+) ([[:alpha:]]+ — )', lambda m: '{}\r\n\t{} {}'.format(m.group(1), m.group(2).title(), m.group(3).title()))
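
The callback approach can be tried outside Notepad++ with the standard `re` module. This is a rough sketch under stated assumptions: `[A-Za-z]+` stands in for `[[:alpha:]]+`, which Python's `re` does not support; the sample text is invented; and `upper_first` is a hypothetical helper, not a PythonScript function. The helper upper-cases only the first character, which is closer to what `\u` does in the Find/Replace dialog than `.title()`, because `.title()` also lower-cases the rest of each word.

```python
import re

EM_DASH = u"\u2014"

# Same shape as the pattern in the thread, with an ASCII-only letter class.
PATTERN = re.compile(r"(\)) ([A-Za-z]+) ([A-Za-z]+ " + EM_DASH + r" )")

def upper_first(text):
    """Upper-case the first character and leave the rest untouched."""
    return text[:1].upper() + text[1:]

def replacement(m):
    """Build the replacement for one match, like the lambda passed to rereplace."""
    return u"{}\r\n\t{} {}".format(
        m.group(1), upper_first(m.group(2)), upper_first(m.group(3))
    )

# Invented sample line, only for demonstration.
sample = u"(1970) paul mcCartney " + EM_DASH + u" vocals"
result = PATTERN.sub(replacement, sample)

print(repr(result))
print(repr(u"mcCartney".title()))  # .title() flattens the inner capital
```

If the matched words are always lower-case, `.title()` and `upper_first` give the same result. The difference only matters for words that already contain capitals.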
          
          • M Andre Z Eckenrode @Ekopalypse

            @Ekopalypse

            Thanks much. My bad for even bringing it up, actually, since I already raised this back in 2021 and was told about the lambda workaround at that time. I had forgotten about that.
