    PythonScript regex error, but not as NPP regex

    Tags: python, pythonscript, regex
    • M Andre Z Eckenrode

      I use PythonScript, mostly for extended sequences of regular-expression find & replace operations. I typically test my regex directly in Notepad++ via the built-in Find/Replace dialog before putting it into a script. Both the files I operate on and my scripts are nearly always ANSI/Windows-1252, but I want my scripts to work on Unicode text as well. A new script I’m working on includes this problematic line of code:

      editor.rereplace(r'(\)) ([[:alpha:]]+) ([[:alpha:]]+ — )', ur'\1\r\n\t\u\2 \u\3')
      

      That line results in the following error message in PythonScript’s console:

      File "C:\Users\MAZE\AppData\Roaming\Notepad++\plugins\Config\PythonScript\scripts\McCartney-Project.py", line 25
          editor.rereplace(r'(\)) ([[:alpha:]]+) ([[:alpha:]]+ x97 )', ur'\1\r\n\t\u\2 \u\3')
      SyntaxError: (unicode error) 'rawunicodeescape' codec can't decode bytes in position 8-9: truncated \uXXXX
      

      However, that exact regex works directly in NPP. Offhand, it looks like my em dash (—) is the root of the problem, since it’s replaced by x97 in the error message. But I use an ANSI em dash in several other regex operations earlier in the same script, and PythonScript doesn’t complain about any of them and DOES process them as expected (once I’ve commented out the offending line). Does anybody know why that particular line is a stumbling block?
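      A quick encoding check, as a sketch run in ordinary Python rather than inside PythonScript: in Windows-1252 the em dash (U+2014) is the single byte 0x97, which would account for the x97 shown in the traceback's echo of the source line. The variable names are illustrative only.

```python
# The em dash as a Unicode code point (U+2014).
EM_DASH = u'\u2014'

# Encode it the way an ANSI/Windows-1252 file stores it.
encoded = EM_DASH.encode('cp1252')

# Decode the single byte back to confirm the round trip.
decoded = b'\x97'.decode('cp1252')

print(repr(encoded), decoded == EM_DASH)
```

      So the byte in the message looks like a faithful rendering of the ANSI em dash, not necessarily the cause of the error.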

      Notepad++ v8.8.5 (32-bit)
      Build time: Aug 14 2025 - 00:17:53
      Scintilla/Lexilla included: 5.5.7/5.4.5
      Boost Regex included: 1_85
      Path: C:\Program Files (x86)\Notepad++\notepad++.exe
      Command Line: "C:\Program Files\ArdfryImaging\PNGOUTWin\PNGOUTWin Reg Codes.txt"
      Admin mode: OFF
      Local Conf mode: OFF
      Cloud Config: OFF
      Periodic Backup: ON
      Placeholders: OFF
      Scintilla Rendering Mode: SC_TECHNOLOGY_DIRECTWRITE (1)
      Multi-instance Mode: monoInst
      asNotepad: OFF
      File Status Auto-Detection: cdEnabledNew (for current file/tab only)
      Dark Mode: OFF
      Display Info:
      primary monitor: 1920x1080, scaling 100%
      visible monitors count: 1
      installed Display Class adapters:
      0000: Description - Intel® HD Graphics 620
      0000: DriverVersion - 31.0.101.2111
      0001: Description - NVIDIA GeForce 940MX
      0001: DriverVersion - 30.0.15.1169
      OS Name: Windows 10 Enterprise (64-bit)
      OS Version: 22H2
      OS Build: 19045.6216
      Current ANSI codepage: 1252
      Plugins:
      BetterMultiSelection (1.5)
      ColumnsPlusPlus (1.2)
      ColumnTools (1.4.5.1)
      ComparePlus (1.2)
      DSpellCheck (1.5)
      ExtSettings (1.3.1)
      HTMLTag_unicode (1.5.4)
      mimeTools (3.1)
      MultiClipboard (2.1)
      MultiReplace (4.3.2.28)
      NppCalc (1.5)
      NppConverter (4.6)
      NppExport (0.4)
      NPPJSONViewer (2.1.1)
      NppTextFX (1.4.1)
      NppXmlTreeviewPlugin (2)
      PreviewHTML (1.3.3.2)
      PythonScript (2.1)
      RegexTrainer (1.2)
      SessionMgr (1.4.4)

      • M Andre Z Eckenrode @M Andre Z Eckenrode

        Actually, I’m now thinking that my use of \u is the problem. I want it to make the next character UPPER CASE in the output, but it looks like Python is expecting four hexadecimal digits after it to specify a Unicode code point.
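        That reading matches the traceback: counting characters in the replacement string, the first \u sits at positions 8-9, exactly where the codec reports the truncated escape. In Python 2, a ur'' literal leaves other backslashes alone but still decodes \uXXXX escapes. The sketch below is a Python 3 analogue, not PythonScript code (Python 3 has no ur'' prefix): it uses the built-in compile() to show that a literal containing a bare \u is rejected by the Python parser before any regex engine sees it. The helper name literal_parses is hypothetical.

```python
def literal_parses(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<demo>", "eval")
    except SyntaxError:
        return False
    return True

# Stand-ins for the replacement string from the thread (Python 3 behaviour).
# Non-raw literal: \u must be followed by four hex digits, so this fails.
plain_literal = r"'\1\r\n\t\u\2 \u\3'"
# Raw str literal in Python 3: backslashes are kept verbatim, so this parses.
raw_literal = r"r'\1\r\n\t\u\2 \u\3'"

plain_ok = literal_parses(plain_literal)
raw_ok = literal_parses(raw_literal)
print(plain_ok, raw_ok)
```

        Whether the parsed string's \u is then honored as a case-conversion escape by editor.rereplace is a separate question that this sketch does not test.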

        • Ekopalypse @M Andre Z Eckenrode

          @M-Andre-Z-Eckenrode

          Unfortunately, this is still an open issue.

          In this specific case you can use something like this:

          editor.rereplace(r'(\)) ([[:alpha:]]+) ([[:alpha:]]+ — )', lambda m: f'{m.group(1)}\r\n\t{m.group(2).title()} {m.group(3).title()}')
          

          EDIT: Oops, just realized you are still using Python 2:

          editor.rereplace(r'(\)) ([[:alpha:]]+) ([[:alpha:]]+ — )', lambda m: '{}\r\n\t{} {}'.format(m.group(1), m.group(2).title(), m.group(3).title()))
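          One detail worth checking with the lambda approach: .title() capitalizes the first letter of each word and lowercases the rest, whereas \u in the Notepad++ dialog only uppercases the next character. The sketch below uses Python's own re.sub as a stand-in for editor.rereplace so it can run outside Notepad++. The pattern uses [A-Za-z] because Python's re module does not support [[:alpha:]], and both the upper_first helper and the sample text are hypothetical.

```python
import re

# Stand-in pattern: [A-Za-z] replaces the Boost class [[:alpha:]].
PATTERN = re.compile(r'(\)) ([A-Za-z]+) ([A-Za-z]+ — )')

def upper_first(text):
    """Uppercase only the first character, leaving the rest untouched."""
    return text[:1].upper() + text[1:]

def with_upper_first(m):
    return '{}\r\n\t{} {}'.format(
        m.group(1), upper_first(m.group(2)), upper_first(m.group(3)))

def with_title(m):
    return '{}\r\n\t{} {}'.format(
        m.group(1), m.group(2).title(), m.group(3).title())

sample = '(1970) paul mcCartney — vocals'  # made-up test line
result_upper_first = PATTERN.sub(with_upper_first, sample)
result_title = PATTERN.sub(with_title, sample)

print(repr(result_upper_first))
print(repr(result_title))
```

          For words that are all lowercase after the first letter, the two callbacks give the same result. They differ only on mixed-case input such as the sample above.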
          
          • M Andre Z Eckenrode @Ekopalypse

            @Ekopalypse

            Thanks much. My bad for even bringing it up, actually, since I had already asked about this back in 2021 and was advised of the lambda workaround at that time. I forgot about that.
