Two simple commands for text editing (beginner)

coconut79

Hello, I am trying to use notepad++ for text editing, however, I am new to the program itself and relatively new to coding. I’ll try my best to explain what I want to do.

1. A find and replace of a specific instance of a word that occurs more than once. For example, if in my text I have the word “number” 3 times, I want the 2nd instance of that word to be replaced with the word “two”. So if my original text is “number, number, number”, I want it to become “number, two, number” by running a single command.
2. I want to put new paragraphs in big chunks of text. So something like “if the sentence is more than 100 characters, find the next full stop and put a new paragraph after it”.

That’s pretty much it, I hope these questions make sense but let me know if they don’t.

Alan Kilborn

@coconut79

For your #1, not possible with Notepad++ without the help of a scripting plugin (you would have to write the scripting code).

For your #2, probably possible, but a real life example of such text would really help.

coconut79

@Alan-Kilborn
Thanks for the quick reply. Understood about #1. An example for #2 would be
“This is a sentence which contains more than 50 characters. This is the second sentence in the text”.
should become
“This is a sentence which contains more than 50 characters.
This is the second sentence in the text.”

Or a real-life example of a chunk of text

Dylan Thomas was born on 27 October 1914 in Swansea, the son of Florence Hannah (née Williams; 1882–1958), a seamstress, and David John Thomas (1876–1952), a teacher. His father had a first-class honours degree in English from University College, Aberystwyth and ambitions to rise above his position teaching English literature at the local grammar school.[4] Thomas had one sibling, Nancy Marles (1906–1953), who was eight years his senior.[5] The children spoke only English, though their parents were bilingual in English and Welsh, and David Thomas gave Welsh lessons at home. Thomas’s father chose the name Dylan, which could be translated as “son of the sea”, after Dylan ail Don, a character in The Mabinogion.[6] His middle name, Marlais, was given in honour of his great-uncle, William Thomas, a Unitarian minister and poet whose bardic name was Gwilym Marles.[5][7] Dylan, pronounced ˈ [ˈdəlan] (Dull-an) in Welsh, caused his mother to worry that he might be teased as the “dull one”.[8] When he broadcast on Welsh BBC, early in his career, he was introduced using this pronunciation. Thomas favoured the Anglicised pronunciation and gave instructions that it should be Dillan /ˈdɪlən/.[5][9]

would become

Dylan Thomas was born on 27 October 1914 in Swansea, the son of Florence Hannah (née Williams; 1882–1958), a seamstress, and David John Thomas (1876–1952), a teacher.

His father had a first-class honours degree in English from University College, Aberystwyth and ambitions to rise above his position teaching English literature at the local grammar school.[4]

Thomas had one sibling, Nancy Marles (1906–1953), who was eight years his senior.[5] The children spoke only English, though their parents were bilingual in English and Welsh, and David Thomas gave Welsh lessons at home.

Thomas’s father chose the name Dylan, which could be translated as “son of the sea”, after Dylan ail Don, a character in The Mabinogion.[6] His middle name, Marlais, was given in honour of his great-uncle, William Thomas, a Unitarian minister and poet whose bardic name was Gwilym Marles.[5][7]

Dylan, pronounced ˈ [ˈdəlan] (Dull-an) in Welsh, caused his mother to worry that he might be teased as the “dull one”.[8] When he broadcast on Welsh BBC, early in his career, he was introduced using this pronunciation. Thomas favoured the Anglicised pronunciation and gave instructions that it should be Dillan /ˈdɪlən/.[5][9]

(Line break imitates a new paragraph for better illustration).

Let me know if what I’m trying to do makes sense and if it’s possible to do in notepad++. Much appreciated.

guy038

Hello, @coconut79, @alan-kilborn and All,

Regarding your point #2, simply use the regex S/R, below :

SEARCH (?-s)(.{150,250}\.(\[\d+\])*)\h+

REPLACE \1\r\n\r\n

After clicking on the Replace All button, you should get the text :

Dylan Thomas was born on 27 October 1914 in Swansea, the son of Florence Hannah (née Williams; 1882–1958), a seamstress, and David John Thomas (1876–1952), a teacher.

His father had a first-class honours degree in English from University College, Aberystwyth and ambitions to rise above his position teaching English literature at the local grammar school.[4]

Thomas had one sibling, Nancy Marles (1906–1953), who was eight years his senior.[5] The children spoke only English, though their parents were bilingual in English and Welsh, and David Thomas gave Welsh lessons at home.

Thomas’s father chose the name Dylan, which could be translated as “son of the sea”, after Dylan ail Don, a character in The Mabinogion.[6] His middle name, Marlais, was given in honour of his great-uncle, William Thomas, a Unitarian minister and poet whose bardic name was Gwilym Marles.[5][7]

Dylan, pronounced ˈ [ˈdəlan] (Dull-an) in Welsh, caused his mother to worry that he might be teased as the “dull one”.[8] When he broadcast on Welsh BBC, early in his career, he was introduced using this pronunciation.

Thomas favoured the Anglicised pronunciation and gave instructions that it should be Dillan /ˈdɪlən/.[5][9]

Remarks :

The last paragraph of the text above isn’t identical to the last paragraph in your post. That’s logical, as your last paragraph is 326 chars long, so more than twice the desired length of 150 chars ! Hence, the S/R split it into two paragraphs of 218 and 107 chars ;-))
So each new paragraph, except possibly the last one, contains between 150 and 250 characters before its final full stop
Just play around with these limits to get the solution that suits you best !
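
For anyone who prefers to try different limits outside Notepad++, here is a minimal Python sketch of the same S/R. It is only an illustration under a few assumptions: the function name split_paragraphs and its parameters are made up for this example, Python's re module has no \h so [ \t]+ stands in for it, and (?-s) is left out because the dot does not match line breaks by default in Python.

```python
import re

def split_paragraphs(text, low=150, high=250, eol="\n"):
    # Hypothetical helper, not part of Notepad++; pass eol="\r\n" for Windows files.
    # Same shape as the SEARCH regex: chunk, full stop, optional [n] markers, blanks
    pattern = re.compile(r"(.{%d,%d}\.(\[\d+\])*)[ \t]+" % (low, high))
    # Keep group 1 and swap the trailing blanks for two line breaks
    return pattern.sub(lambda m: m.group(1) + eol + eol, text)

# Short made-up sample, with small limits so the effect is easy to see
sample = "aaaaaaaaaaaa. bbbbbbbbbbbb.[4] cccc."
wide = split_paragraphs(sample, low=10, high=30)
narrow = split_paragraphs(sample, low=10, high=15)
```

With the wide limits, the greedy .{10,30} runs up to the last full stop that fits in the range, so only one break is inserted. With the narrow limits, each of the first two sentences becomes its own paragraph.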

Regarding point #1, Alan is right about it. Notepad++ does not have any feature to search and replace a specific occurrence in the current line. However, as a work-around, it’s possible, most of the time, to build a specific regex to do so ! For instance, assuming this sample text, which contains some occurrences of the string number :

This number is the right number of occurrences of the word "number"

This number is the wrong number of occurrences.

The number of lines, after replacement, was	about five times the initial number of lines but, as the lines were between 150 and 250 chars long, this number still seemed acceptable compared to the number of lines required to list all words of text, one per line. If we reduce the number of required characters to the range 50-150, before a full stop, we increase the number of new paragraphs ! 

number,number,number,number,number

numbernumbernumbernumbernumbernumbernumber

The specific regex S/R, below, replaces only the 3rd occurrence of the word number, in each line, with the word THREE :

SEARCH (?-s)^((((?!number).)*number){2}.*?)number(.*)

REPLACE \1THREE\4

And we obtain the expected text :

This number is the right number of occurrences of the word "THREE"

This number is the wrong number of occurrences.

The number of lines, after replacement, was	about five times the initial number of lines but, as the lines were between 150 and 250 chars long, this THREE still seemed acceptable compared to the number of lines required to list all words of text, one per line. If we reduce the number of required characters to the range 50-150, before a full stop, we increase the number of new paragraphs !

number,number,THREE,number,number

numbernumberTHREEnumbernumbernumbernumber

Note that the regex, above, can still be improved, as below, which avoids repeating the literal string number :

SEARCH (?-s)^((((?!(?4)).)*(?4)){2}.*?)(number)(.*)

REPLACE \1THREE\5

So, as a summary, if you use the generic regex :

SEARCH (?-s)^((((?!(?4)).)*(?4)){N}.*?)(OLD_WORD)(.*)

REPLACE \1NEW_WORD\5

and replace the string OLD_WORD with any literal string, the string NEW_WORD with the replacement text and the string N, between the braces, with any integer number,

this regex S/R will change, exclusively, the (N+1)th occurrence of OLD_WORD with NEW_WORD, in each line of the current file

Remark : You may use the {0} syntax, in order to replace the 1st occurrence of OLD_WORD !
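
The generic S/R can also be tried outside Notepad++. Below is a minimal Python sketch, not the Notepad++ regex itself: the name replace_occurrence is invented for this example, and because Python's re module does not support the (?4) subroutine call, the literal word is simply repeated in the pattern, so the rest of the line is group 4 here rather than group 5.

```python
import re

def replace_occurrence(text, old_word, new_word, n):
    # Hypothetical helper: replaces the (n+1)th occurrence of old_word on each line.
    w = re.escape(old_word)  # treat old_word as a literal string
    # Python's re has no (?4) subroutine call, so the literal is repeated instead
    pattern = re.compile(
        r"^((((?!%s).)*%s){%d}.*?)%s(.*)" % (w, w, n, w),
        re.MULTILINE,  # let ^ match at the start of every line
    )
    # Groups 1 and 4 play the role of \1 and \5 in the generic regex
    return pattern.sub(lambda m: m.group(1) + new_word + m.group(4), text)

second = replace_occurrence("number, number, number", "number", "two", 1)
third = replace_occurrence("number,number,number,number,number", "number", "THREE", 2)
first = replace_occurrence("number, number", "number", "one", 0)
untouched = replace_occurrence("This number is the wrong number of occurrences.", "number", "THREE", 2)
```

Here second reproduces the “number, two, number” case from the first post, and untouched shows that a line with fewer than N+1 occurrences is left as it is.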

Best Regards,

guy038

P.S. :

Just forgot you’re a beginner ! So, to achieve all these different searches/replacements :

Open the N++ Replace dialog ( Ctrl + H )
Paste the SEARCH regex in the Find what: zone
Paste the REPLACE regex in the Replace with: zone
Check the Match case option, if you prefer a case-sensitive search
Preferably, check the Wrap around option, so all contents of file, from the very beginning, will be processed
Click, once, on the Replace All button or several times on the Replace button

Et voilà !

Meta Chuh

@coconut79

note: don’t forget to set your “search mode” to “regular expression” at the bottom left of your find/replace window.

coconut79

@guy038 Thank you so much for taking the time to write this long post, I was able to follow it and do it in notepad++ and it works like a charm! @Meta-Chuh
I just have one additional question that I forgot to mention. Is it possible to add an additional instruction for a new line before a quotation mark - for example

This is a random sentence in my text. "I like this random sentence", said the reader.

would become

This is a random sentence in my text.
  "I like this random sentence", said the reader.

Also if it’s not too bothersome, could you explain what every element in those commands about #2 does so maybe I can play around with it? Thanks again for your help.

coconut79

@coconut79 Actually, looking at the results of running the original search&replace, it mostly takes care of that issue so it’s okay.

guy038

Hi @coconut79 and All,

Below, I give you some hints on how these regexes work. But, sincerely, as a regex beginner, you should, first, have a look at this FAQ :

https://notepad-plus-plus.org/community/topic/15765/faq-desk-where-to-find-regex-documentation

I advise you to study the excellent tutorial on this site ( the reference ! ) :

http://www.regular-expressions.info

Depending on your time, you may take 2 or 3 months to fully assimilate most of the regex particularities. Luckily, once this first step is completed, I can assure you that you will never go back, as modern regex engines, embedded in IDEs and code editors, are really powerful ;-))

Now, some quick notes on the regexes of my previous post :

Regarding your #2 point :

SEARCH (?-s)(.{150,250}\.(\[\d+\])*)\h+

REPLACE \1\r\n\r\n

(?-s) is an in-line modifier which tells the regex engine that the dot meta-character matches only a single standard character and not EOL chars
The part \.(\[\d+\])*\h+ represents a literal dot character, followed by any list, even empty, of [..] syntaxes, each containing a number and, finally, followed by one or more horizontal blank chars
The part .{150,250} represents any range of standard characters, containing between 150 and 250 characters, located before the full stop . and possible [##] forms
The outer parentheses, which exclude the blank chars, define group 1, which must be rewritten
In replacement, \1 represents group 1 and \r\n a line-break. Note that you just need \n in Unix files
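
The same breakdown can be written as a commented pattern. This is a sketch in Python's verbose regex mode, not something Notepad++ needs: the name PARAGRAPH_BREAK and the sample string are made up, [ \t]+ replaces \h+ (which Python's re module lacks), and (?-s) is dropped because it is already the default behaviour in Python.

```python
import re

PARAGRAPH_BREAK = re.compile(r"""
    (                 # group 1: the text that is kept
      .{150,250}      # 150 to 250 characters, none of them a line break
      \.              # a literal full stop
      (\[\d+\])*      # zero or more reference markers such as [4] or [12]
    )
    [ \t]+            # trailing blanks (stand-in for \h+), matched but not kept
""", re.VERBOSE)

# Made-up sample: 160 filler chars, a full stop, two markers, three spaces
sample = "x" * 160 + ".[5][7]   Next sentence."
m = PARAGRAPH_BREAK.search(sample)
kept = m.group(1)              # what \1 writes back
dropped = m.group(0)[len(kept):]  # the blanks that the line breaks replace
```

Inspecting kept and dropped shows why the reference markers stay attached to their sentence while the blanks after them disappear.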

Regarding the generic regex, relative to your #1 point :

SEARCH (?-s)^((((?!(?4)).)*(?4)){N}.*?)(OLD_WORD)(.*)

REPLACE \1NEW_WORD\5

The search regex is fairly complicated, as :
- it contains sub-routine calls (?4) to group 4, a regex notion generally seen near the end of a regex tutorial !
- it contains a negative look-ahead, ((?!(?4)).)*, which ensures that the areas of text located before each OLD_WORD expression do not contain, themselves, any OLD_WORD expression !

In short, this regex slices each line into 3 parts, as below :

............OLD_WORD........OLD_WORD......................OLD_WORD...........OLD_WORD..........................
\                                                                           /\       /\                       /
 ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯  ¯¯¯¯¯¯¯  ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
         N times "................OLD_WORD" + ".........." = group 1          group 4  Rest of line = group 5

And, of course, during replacement, we just rewrite group 1 first, followed by NEW_WORD, followed by group 5, the rest of the current line
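
To see the three slices on a concrete line, here is a small Python sketch with N = 3. The sample line and the variable names are invented for illustration. Python's re module has no (?4) subroutine call, so the two calls are written out as the literal OLD; the group numbers stay the same as in the generic regex.

```python
import re

# Generic SEARCH regex with N = 3 and OLD_WORD = OLD; the (?4) calls are
# written out as the literal OLD because Python's re has no subroutine calls
pattern = re.compile(r"^((((?!OLD).)*OLD){3}.*?)(OLD)(.*)")

line = "aa OLD bb OLD cc OLD dd OLD ee"  # made-up sample line
m = pattern.match(line)

part1 = m.group(1)  # three "...OLD" slices plus the text before the 4th OLD
part2 = m.group(4)  # the occurrence that will be replaced
part3 = m.group(5)  # rest of the line
replaced = part1 + "NEW" + part3  # what the replacement \1NEW\5 would produce
```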

Finally, regarding your last question, I do think that a separate S/R, which searches for areas of text between normal or smart double quote characters and replaces where necessary, would be better. The search regex, below, looks for the zero-length location right before any non-empty area of text between double quotes, and writes a line-break, followed by two space characters, at this location :

SEARCH (?=["“].+?["”])

REPLACE \r\n\x20\x20

The (?=........) syntax is a positive look-ahead structure. That is to say, a condition which must be true in order to validate the overall match
["“] searches either a normal double quote or an opening smart double quote
["”] searches either a normal double quote or an ending smart double quote
.+? represents the smallest area of standard characters between quotes
As nothing precedes the look-around, it simply looks for a zero-length match
In replacement, it rewrites a line-break, first, \r\n, followed with two space chars \x20\x20, before the double-quotes area
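
The zero-length behaviour is easy to check with a short Python sketch. The names QUOTE_START, result, ambiguous and positions are made up for this example, and "\n" is used in place of \r\n. The look-ahead syntax itself works the same way in Python's re module.

```python
import re

# Zero-width look-ahead: matches the position right before a quoted passage
QUOTE_START = re.compile(r'(?=["“].+?["”])')

text = 'This is a random sentence in my text. "I like this random sentence", said the reader.'
# "\n" stands in for \r\n, followed by two spaces (\x20\x20)
result = QUOTE_START.sub("\n  ", text)

# With straight quotes only, a closing quote can also start a match
ambiguous = '"a" and "b"'
positions = [m.start() for m in QUOTE_START.finditer(ambiguous)]
```

Note that the blank before the opening quote is not consumed, so it stays at the end of the previous line. Also, when a line holds several passages in straight quotes, the look-ahead cannot tell an opening quote from a closing one, so positions contains an extra match at the first closing quote; smart quotes do not have this ambiguity.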

Cheers,

guy038

coconut79

@guy038 Thank you again for taking the time to explain all of that in an accessible way :)
I will try to get a better understanding of the expressions and hopefully wrap my head around all of it. Thanks.