Problem with regex in multi-line search

Richard Darwin

Fellow Notepad++ Users,

Could you please help me with the following search-and-replace problem I am having?

Here is the debug info:

Notepad++ v8.7.5   (32-bit)
Build time : Dec 21 2024 - 05:11:15
Path : I:\Binaries\Notepad++\notepad++.exe
Command Line : 
Admin mode : OFF
Local Conf mode : OFF
Cloud Config : OFF
Periodic Backup : OFF
Placeholders : OFF
DirectWrite : ON
Multi-instance Mode : multiInst
File Status Auto-Detection : cdEnabledNew (for current file/tab only)
Dark Mode : OFF
OS Name : Windows 11 Home (64-bit)
OS Version : 24H2
OS Build : 26100.3476
Current ANSI codepage : 1252
Plugins : 
    mimeTools (3.1)
    NppConverter (4.6)
    NppExport (0.4)

I am trying to massage some data about English syllable structure. I have parsed an IPA data file into phoneme classes: P represents [p]|[t]|[k], B represents the voiced counterparts, X represents [f]|[θ]|[s]|[ʃ], Ɣ the voiced fricatives, and so on. V obviously represents any vowel.

What I want to do is find adjacent lines that are identical except that the second one ends in an additional [Ɣ], [VƔ] or [X]; that second line will be the plural of the word above it, or a third-person verb form. For example, the original text might have been ‘limp /ˈɫɪmp/’, ‘limps /ˈɫɪmps/’, which are massaged into ‘ˈLVNP’, ‘LVNPX’. I want to replace that suffix with ‘=Z’ (a generalization of =Zp ‘plural’ and =Z3 ‘3rd.person.singular’).
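For reference, the pairing rule can be sketched in Python without any regex. This is a minimal illustration, not part of the original question: it assumes the file has already been split into lines, that the three suffixes are Ɣ, VƔ and X, and the name `mark_suffixed` is hypothetical.

```python
# Hypothetical sketch of the pairing rule: a line that equals the line above it
# plus one of the suffixes is rewritten as <line above>=Z.
# Assumption: the suffixes are Ɣ, VƔ and X (plural / 3rd-person allomorphs).
SUFFIXES = ("Ɣ", "VƔ", "X")


def mark_suffixed(lines):
    """Return a new list with suffixed forms replaced by '<previous line>=Z'."""
    out = []
    prev = None  # the previous line as it appeared in the input
    for line in lines:
        new = line
        if prev:  # no comparison for the first line or after a blank line
            for suffix in SUFFIXES:
                if line == prev + suffix:
                    new = prev + "=Z"
                    break
        out.append(new)
        prev = line
    return out


before = ["ˌƔVˈBVRV", "ˌƔVˈBVRVV", "ˌƔVˈBVRVVƔ", "ˌƔVˈBVX"]
for line in mark_suffixed(before):
    print(line)
```

Each line is compared with the original line above it, so a rewritten line is never used as the base for the next comparison.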

Here is the data I currently have (“before” data):

 
 BRVNBVˈNVPV
 BRVNVˈPVPVV
 --snip--
ˌƔVXˈPVƔVPX
ˌƔVƔVˈBVRVPV
ˌƔVƔVˈPVVXVN
ˌƔVƔVˈPVVXVNƔ
ˌƔVƔVˈVNV
ˌƔVƔVˈVNVV
ˌƔVƔWVRVˈƔVVXVN
ˌƔVˈBRVBV
ˌƔVˈBRVX
ˌƔVˈBRVXVX
ˌƔVˈBVPV
ˌƔVˈBVRV
ˌƔVˈBVRVV
ˌƔVˈBVRVVƔ
ˌƔVˈBVX
ˌƔVˈBVXVX

Here is how I would like that data to look (“after” data):

 
 BRVNBVˈNVPV
 BRVNVˈPVPVV
 --snip--
ˌƔVXˈPVƔVPX
ˌƔVƔVˈBVRVPV
ˌƔVƔVˈPVVXVN
ˌƔVƔVˈPVVXVN=Z
ˌƔVƔVˈVNV
ˌƔVƔVˈVNVV
ˌƔVƔWVRVˈƔVVXVN
ˌƔVˈBRVBV
ˌƔVˈBRVX
ˌƔVˈBRVXVX
ˌƔVˈBVPV
ˌƔVˈBVRV
ˌƔVˈBVRVV
ˌƔVˈBVRVV=Z
ˌƔVˈBVX

To accomplish this, I have tried using the following Find/Replace expressions and settings:

Find What = ^(.+)$\n^\1Ɣ$
Replace With = \1\n\1=Z
Search Mode = REGULAR EXPRESSION
Dot Matches Newline = CHECKED or NOT CHECKED

This regex is supposed to match the entire first line, the ‘\n’, and a second line with the suffix (initially just Ɣ). With ‘Dot matches Newline’ unchecked, this returns ‘Find: can’t find the text “^(.+)$\n^\1Ɣ$” in entire file’. With the setting checked, it searches the entire file and either selects the first line (blank) and all but the last three characters of the second line, or reports ‘Invalid Regular Expression’ with the note that ‘complexity exceeds predefined bounds’, depending on where I start the search (the file is 646k, 64k lines).
Taking out the carets from the regex did not seem to have any effect; taking out the '$'s led to ‘can’t find text’ with Dot unchecked and to ‘invalid regex’ with Dot checked.

This did not produce the output I desired, and I’m not sure why. Could you please help me understand what went wrong and help me find the solution?

Coises

@Richard-Darwin

Most likely you are editing a file with Windows line endings, which are \r\n, not just \n.
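The effect of the extra \r can be reproduced with Python's `re` module. This is only an illustration: Notepad++ uses the Boost regex engine, whose handling of `.` and `$` around \r differs in detail, but in both engines nothing in `$\n` accounts for the \r that sits before each \n.

```python
import re

# Illustration in Python's re (not Notepad++'s Boost engine): the original
# pattern run against the same two lines with Unix and with Windows endings.
pattern = re.compile(r"^(.+)$\n^\1Ɣ$", re.MULTILINE)

unix_text = "ˌƔVˈBVRVV\nˌƔVˈBVRVVƔ\n"
windows_text = "ˌƔVˈBVRVV\r\nˌƔVˈBVRVVƔ\r\n"

print(bool(pattern.search(unix_text)))     # True: \n is the whole line break
print(bool(pattern.search(windows_text)))  # False: the \r before \n is unaccounted for
```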

I would suggest:

Leave . matches newline unchecked — you don’t want that .+ to cross lines
Use \R to represent the break between lines — it matches \r, \n or \r\n
You don’t need the $ when it’s immediately followed by a line-ending character, nor the circumflex immediately after (though they don’t hurt, either)

So, try: ^(.+)\R\1Ɣ$ and see if that matches as desired; leave . matches newline unchecked. If you are using Windows line endings, use \1\r\n\1=Z to replace. (On the status bar at the bottom, towards the right side, you’ll see either Windows (CR LF), Unix (LF) or Macintosh (CR), which tells you the current line ending setting for your file.)
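If the same edit is ever done in a script instead, a rough Python counterpart of that suggestion might look like the sketch below. Python's `re` has no `\R`, so the sketch substitutes `(\r?\n)` (Windows and Unix endings only, not Macintosh CR) and captures it so the replacement reuses whatever ending the file already has. The `VƔ|Ɣ|X` alternation for the other suffixes is an assumption based on the question, and `PAIR` and `mark_plural_lines` are hypothetical names.

```python
import re

# Rough Python counterpart of ^(.+)\R\1Ɣ$ (Python's re has no \R).
# [^\r\n]+ keeps the captured line free of CR/LF; (\r?\n) stands in for \R
# and is captured so the replacement can reuse the file's own line ending.
# Assumption: the suffixes to mark are VƔ, Ɣ and X.
PAIR = re.compile(r"^([^\r\n]+)(\r?\n)\1(?:VƔ|Ɣ|X)(?=\r?\n|\Z)", re.MULTILINE)


def mark_plural_lines(text: str) -> str:
    """Hypothetical helper: turn each suffixed second line into '<first line>=Z'."""
    return PAIR.sub(r"\1\2\1=Z", text)


sample = "ˌƔVˈBVRV\r\nˌƔVˈBVRVV\r\nˌƔVˈBVRVVƔ\r\nˌƔVˈBVX\r\n"
print(repr(mark_plural_lines(sample)))
```

The lookahead plays the role of the trailing `$`: it requires the suffix to end the line, so a longer ending such as VX in ˌƔVˈBRVXVX is left alone.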