Help converting Notepad++ format to Pythonscript
Since you are already in Pythonscript, why don’t you forget regex and go with “brute force”, something like this, or something similar? (admittedly this does not do the “ultimate” in error-checking, but it is just supposed to be an idea to get you on an alternate track of thinking–if that’s where you want to go…):
    new_text = ''
    inside_parens = False
    for in_ch in editor.getText():
        if not inside_parens and in_ch == ' ':
            continue
        elif in_ch == '(':
            inside_parens = True
        elif in_ch == ')':
            inside_parens = False
        new_text += in_ch
    editor.setText(new_text)
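For anyone who wants to try the loop outside Notepad++, here is a minimal sketch of the same idea as a plain function. The name `strip_spaces_outside_parens` and the sample text are made up for illustration; inside Pythonscript the input would come from `editor.getText()` and the result would go back through `editor.setText()`.

```python
def strip_spaces_outside_parens(text):
    """Drop every space that is not inside a (...) comment."""
    out = []
    inside_parens = False
    for ch in text:
        if ch == ' ' and not inside_parens:
            continue  # skip spaces outside of comments
        if ch == '(':
            inside_parens = True
        elif ch == ')':
            inside_parens = False
        out.append(ch)
    return ''.join(out)


# Hypothetical sample, loosely modelled on the CNC lines in this thread.
sample = "G28 G91 Z0\n(EXAMPLE TOOL)\nM3 S10000"
print(strip_spaces_outside_parens(sample))
```

Collecting characters in a list and joining once avoids rebuilding the string on every character, which can matter for large files. Like the original, it does no checking for nested or unbalanced parentheses.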
Well, that works fantastic! I actually set out to use a more programmatic approach to this, but found that I could do almost all of the things I wanted to with simple regex. It wasn’t until I hit the need described above that I got stuck. The regex for it worked, but I couldn’t get the code to work in Pythonscript.
It will take some digging to follow what you did with that code, but it should help me with more advanced alterations in the future…when I get there. Thanks!
Not sure if you are directing that response to @Claudia-Frank or to me…but I’ll say this about the regex approach: I never understood how your regex worked live with Notepad++ – maybe the markdown syntax on this site was “stealing” some parts of it.
If I try it as I see it above I get this as the AFTER text, which is clearly not right:
OO11223344 ((EEXXAAMMPPLLEE PPRROOGGRRAAMM)) ((TTHHIISS IISS AA CCNNCC CCOOMMMMEENNTT)) ((II WWAANNTT TTHHEESSEE LLIINNEESS TTOO RREETTAAIINN SSPPAACCEESS)) (()) GG2288 GG9911 ZZ00 TT11 MM66 ((EEXXAAMMPPLLEE TTOOOOLL)) MM33 SS1100000000 GG9900 GG00 GG9955 GG5544 XX11..00 YY22..00 GG4433 ZZ00..2255 HH11 DD11 MM88 GG11 ZZ--11..00 FF00..000066 XX--11..00 YY--22..00 YY22..00 XX11..00 GG00 ZZ11..00 MM55 GG2288 GG9911 ZZ00 MM99 MM0011 ((CCHHEECCKK PPAARRTT))
What gives? Shouldn’t I see some
sequences in the regex? -
@Scott-Sumner My response was directed at Claudia. And you’re right, the code I posted above does not produce the proper result in Notepad++. Something must have been lost. I could go back and find it, but at this point my question has been answered so I am going to call this topic closed. Thank you all for the responses!
Sorry, I didn’t understand that you wanted a Python script solution with regular expression replacement.
    regex = r'(\(.*?\))|(\w+)\s'
    editor.rereplace(regex, lambda m: '{}{}'.format(*m.groups()))
Claudia -
I am most-curious about why the OP (@Andrew-Clark) could get his regex replacement working in interactive N++ and not with Pythonscript, but perhaps that is a question that isn’t going to get answered…my guess would be not using raw string notation (the leading r) in the PS…
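To illustrate what a missing raw-string prefix does, here is a small sketch using the standard `re` module (not the Pythonscript `editor` object). Whether this is what actually went wrong for the OP is only a guess; the sample string is made up.

```python
import re

plain = '\1'   # no r prefix: Python reads this as ONE character (code 1)
raw = r'\1'    # raw string: TWO characters, a backslash and the digit 1

print(len(plain), len(raw))

# With a raw replacement string, \1 reaches the regex engine as a backreference.
with_raw = re.sub(r'(\(.*?\))', r'[\1]', '(A B)')

# Without the r prefix, the engine only ever sees the control character.
without_raw = re.sub(r'(\(.*?\))', '[\1]', '(A B)')

print(repr(with_raw), repr(without_raw))
```

The same applies to the search pattern: sequences such as `\b` mean something different to Python than to the regex engine unless the string is raw.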
or maybe it was about how to get the match groups returned as replacement … ??
Claudia -
Hello @andrew-clark, @scott-sumner, @claudia-frank and All,
I was away this weekend, to hike in the Cevennes mountains and my feet still remember these rides! As you can imagine, just sitting, comfortably, in front of your laptop and discussing Notepad++ matters is a real treat ;-))
So, Andrew, as for me, I would use, with native N++, the regex S/R, below :
Notes :
As usual, the (?-s) modifier means that any regex dot character (.) will match a single standard character only.
Then, this regex would match:
First, any shortest range of chars between parentheses, stores it as group 1 and rewrites this group as it is. (Note that the ( and ) characters have to be escaped with a \ symbol to be considered as literals!)
Secondly, any space character, just ignored in replacement.
Hi Guy, nice shortening but one question - it seems that
is, for the posted example data, not needed in the replacement - what is the idea behind it?
Thank you and cheers
Claudia -
Just in case someone wants to add this into a python script it would now look like this
    regex = r'(?-s)(\(.+?\))|\x20'
    editor.rereplace(regex, lambda m: '{}'.format(m.groups()[0]))
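For testing outside Notepad++, a rough equivalent with the standard `re` module might look like the sketch below. Several assumptions apply: this is the standard `re` engine, not the Boost engine behind `editor.rereplace`; the `(?-s)` prefix is dropped because the dot in `re` does not match newlines by default; and the unmatched group is `None` in standard `re`, so the callback substitutes an empty string for it. The name `remove_spaces` and the sample text are hypothetical.

```python
import re

# Either a parenthesised comment (captured as group 1) or a single space.
PATTERN = re.compile(r'(\(.+?\))|\x20')


def remove_spaces(text):
    # Comment matched: put group 1 back unchanged.
    # Space matched: group 1 is None here, so replace with nothing.
    return PATTERN.sub(lambda m: m.group(1) or '', text)


sample = "T1 M6 (EXAMPLE TOOL)\nM3 S10000"
print(remove_spaces(sample))
```

The `or ''` is the part that corresponds to the question about the undefined group: when the second alternative matches, there is simply nothing to write back.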
Claudia -
@Claudia-Frank said:
or maybe it was about how to get the match groups returned as replacement … ??
I don’t know…the Pythonscript docs for editor.rereplace seem to have an adequate example of this…but considering it now maybe it could be made better?
Hi, @andrew-clark, @scott-sumner, @claudia-frank and All
Sorry, for my late answer as I was elaborating a regex, for the tricky case, below :
Ah, how silly am I! Of course, when the regex finds a space character (the second alternative), group 1 is not defined. Thus, in replacement, it simply rewrites… nothing, as group 1 represents a zero-length string ;-))
Clever deduction, Claudia. So a possible regex is :
Note : I also changed the (?-s) modifier into (?s), which allows possible multi-line ranges of text between parentheses!
Best Regards,
Hi, @andrew-clark, @scott-sumner, @claudia-frank and All
Ah :-(( I should have tested my idea of changing the modifier before posting. Actually, in the case of a multi-line range of characters between parentheses, the suitable regex S/R is rather :
SEARCH
Hope it’s the right one, this time :-D