Line wrap in the current version
-
@PeterJones, thanks for the answers, and please accept my apologies. I assumed that this forum would notify me by email in case of a response. As for missing your post for ~26 days: I apologize, but I was inundated with so many software issues that I lost track.
I come from the TextPad universe and compare features used in TP. This missing feature in NP++ keeps me from making it my default text editor in Windows.
-
@PeterJones said in Line wrap in the current version:
willing to do a quick search/replace (regex) transformation on the text to do the column-word-wrap
@guy038 (CC’ing another RegEx guru for potential help on this question - not urgent).
I remembered this thread and made a very complex NppExec script to do it and then came back here to post it and saw this one-liner RegEx - boy do I feel silly :-(
One thing I noticed though is that to wrap at column 80, for example, my script finds column 80 and then backtracks to the first space it sees. Your RegEx would start at column 80 and go forward to the first space it sees. So my script wraps BEFORE column 80, yours would wrap after it. Of course I could pick column 72, for example, but if one line happens to have a 10-character word starting at column 72 it would still wrap after 80.
Can your RegEx be modified to do a “look-ahead” or “look-behind” to start at column X (80 for example) and then backtrack to the first space and then insert the \r\n?
Not to be nit-picky - just wondering if it can be done and how. I know RegEx, but some of the users on this site blow my mind!
Cheers.
-
@Michael-Vincent said in Line wrap in the current version:
just wondering if it can be done and how.?
I think this Replace might just do it.
Find What: ^(?=.{80,})(.{1,79})\s
Replace With: $1\r\n
So the lookbehind prevents any line shorter than 80 char being broken up. The rest makes the capture end on a space at or just under the 80 char limit which is removed and replaced with the CRLF.
Terry
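For anyone who wants to try the same expression outside Notepad++, here is a minimal sketch using Python's `re` module. It assumes `\n` line endings rather than `\r\n`, and the function name `wrap_before` is made up for illustration. In Python a single substitution pass breaks each original line only once, so the sketch repeats the pass until nothing changes.

```python
import re


def wrap_before(text: str, width: int = 80) -> str:
    """Hard-wrap lines of `width` or more characters at the last whitespace
    that leaves fewer than `width` characters on the line."""
    pattern = re.compile(
        r"^(?=.{%d,})(.{1,%d})\s" % (width, width - 1), re.MULTILINE
    )
    while True:
        # Keep the captured text and swap the matched whitespace for "\n".
        text, count = pattern.subn(lambda m: m.group(1) + "\n", text)
        if count == 0:
            return text


if __name__ == "__main__":
    sample = " ".join("word%02d" % i for i in range(40))
    print(wrap_before(sample))
```

A line of 80 or more characters with no whitespace in its first 80 characters is left untouched, which matches the behaviour described for the Notepad++ version.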
-
@Michael-Vincent said in Line wrap in the current version:
Can your RegEx be modified
At the time I was answering the previous question, I couldn’t think of a way, but I think this does it:
- FIND = ^.{1,80}\K\h+(?=\w)
So it greedily takes up to 80 characters, followed by one or more horizontal spaces; this should find the first space at or before the 80th char (so if the 80th char is a non-space, and 81st is a space, it still has 80 char per line).
I see that @Terry-R chimed in just before me with a lookbehind solution.
If I start with
123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456 89 12 45 789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x12345 78 x1 34 6789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x1234 67 9x 23 56789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x123 56 89 12 456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x12 45 78 x1 3456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x1 34 67 9x 23456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x 23 56 89 123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789 12 45 78 x123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x12345678 x1 34 67 9x123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x1234567 9x 23 56 89x123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456 89 12 45 789x123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x
Then my original, modified to 80, gives:
^.{80,}?\K\h+(?=\w)
123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456 89 12 45 789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789x12345 78 x1 34 6789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789x1234 67 9x 23 56789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789x123 56 89 12 456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789x12 45 78 x1 3456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789x1 34 67 9x 23456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789x 23 56 89 123456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789 12 45 78 x123456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x12345678 x1 34 67 9x123456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x1234567 9x 23 56 89x123456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456 89 12 45 789x123456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x
^.{1,80}\K\h+(?=\w)
123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456 89 12 45 789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789x12345 78 x1 34 6789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789x1234 67 9x 23 56789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789x123 56 89 12 456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789x12 45 78 x1 3456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789x1 34 67 9x 23456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789x 23 56 89 123456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789 12 45 78 x123456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x12345678 x1 34 67 9x123456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x1234567 9x 23 56 89x123456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456 89 12 45 789x123456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x
^(?=.{80,})(.{1,79})\s
123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456 89 12 45 789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789x12345 78 x1 34 6789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789x1234 67 9x 23 56789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789x123 56 89 12 456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789x12 45 78 x1 3456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789x1 34 67 9x 23456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789x 23 56 89 123456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789 12 45 78 x123456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x12345678 x1 34 67 9x123456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x1234567 9x 23 56 89x123456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456 89 12 45 789x123456789x123456789x 123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x
- FIND =
-
Actually, looking, mine has a mistake, because it breaks the lines twice… ahh, because of the 1,80, it’s finding the first line-wrap, even on short lines. And that’s what Terry’s gives you: it only wraps lines that are at least 80 characters, whereas mine will wrap any line. Or
^.{60,80}?\K\h+(?=\w)
would wrap any lines at least 60 characters long, at the rightmost space. But I think Terry’s best matches desired line-wrap before 80 only on long lines.
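The difference on short lines can be reproduced with a small sketch in Python. Python's `re` has neither `\K` nor `\h`, so a capture group and `[ \t]` stand in for them here; that substitution, the `\n` line ending, and the helper name `break_once` are assumptions made for this illustration.

```python
import re

# Python's re lacks \K and \h, so a capture group and [ \t] stand in for them.
ANY_LINE = re.compile(r"^(.{1,80})[ \t]+(?=\w)", re.MULTILINE)
# The lookahead version, unchanged apart from the re.MULTILINE flag.
LONG_ONLY = re.compile(r"^(?=.{80,})(.{1,79})\s", re.MULTILINE)


def break_once(pattern: re.Pattern, text: str) -> str:
    """Apply one substitution pass, keeping group 1 and adding a newline."""
    return pattern.sub(lambda m: m.group(1) + "\n", text)


if __name__ == "__main__":
    short = "a short line"
    print(repr(break_once(ANY_LINE, short)))   # 'a short\nline'
    print(repr(break_once(LONG_ONLY, short)))  # 'a short line'
```

The `{1,80}` version breaks even a 12-character line at its last space, while the lookahead version leaves it alone because the line never reaches 80 characters.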
-
@PeterJones said in Line wrap in the current version:
it only wraps lines that are at least 80 characters
Yes, but ouch @PeterJones , throwing a curve ball at me when I wasn’t looking. Lines with NO spaces, I didn’t think of that one!
Terry
-
@Terry-R said in Line wrap in the current version:
throwing a curve ball at me
Sorry. I guess the last couple days, I’ve been trying to break people’s regexes too much.
But, really, not wrapping at all if there’s no space before char 80 is a reasonable thing to do, and that’s what yours does. There aren’t any real 80-character words in English where you would want to be line wrapping, anyway (you might be able to find a manufactured chemical name that is that long, or some such, but it wouldn’t be in text that you’re word-wrapping in a text editor, and/or you wouldn’t want it to split if you were otherwise word-wrapping).
-
@PeterJones said in Line wrap in the current version:
I’ve been trying
Though really what prompted the 100char unbroken line was wanting a “ruler” to keep me sane inside Notepad++ and when pasting into the forum. :-)
-
@PeterJones said in Line wrap in the current version:
There aren’t any real 80-character words in English where you would want to be line wrapping, anyway
Not so fast, what about “proper Names”, ah yes a (not so) subtle hint at another post on this forum!
The North Island of New Zealand has a place named Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu.
The 1,000-foot hill near the township Porangahau holds the Guinness World Record for longest place name with 85 characters.
I’m actually thinking that if a line didn’t have any spaces within the confined boundary (80, or whatever number is used) then the word should be hyphenated and a - inserted at the 80th (or whatever) character position.
Terry
-
@Terry-R ,
Proper names don’t count – especially since I said “English”, and that NZ place name was not an English word. Also, I did say parenthetically “and/or you wouldn’t want it to split”. :-)
I would not recommend ever splitting a longer-than-80 word in an arbitrary location via regex – too much chance of an unintentional change-in-meaning. If a word had soft-hyphens or other Unicode character indicating “it’s okay to split here” (there are a variety of similar zero-width characters which would allow splitting without breaking up the visual word), then split/hyphenate on those, sure; but without those, I wouldn’t want to take responsibility for what the word might become.
-
@PeterJones said in Line wrap in the current version:
100char unbroken line was wanting a “ruler” to keep me sane inside Notepad++
Thank you both! Much more elegant than my insane looping and index keeping in the NppExec script. I mostly like to “line wrap” with hard carriage returns at 80 columns in Readme Markdown documents, even though it doesn’t matter when they are rendered in a viewer. I figure if I ever more or less them from the command line, I still want them to be legible. And words over 80 characters (think long URLs) should not wrap, that’s fine.
Regarding ruler in Notepad++ …
Cheers.
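Outside Notepad++, the same rule (hard-wrap at 80, but never split an over-long token such as a URL) can be sketched with Python's standard `textwrap` module. The helper name `hard_wrap` and the example URL are made up for illustration.

```python
import textwrap


def hard_wrap(paragraph: str, width: int = 80) -> str:
    """Wrap at whitespace only; a token longer than `width` stays whole."""
    return textwrap.fill(
        paragraph,
        width=width,
        break_long_words=False,  # never cut a long URL in the middle
        break_on_hyphens=False,  # break only at whitespace
    )


if __name__ == "__main__":
    url = "https://example.com/" + "a" * 90  # made-up URL, longer than 80
    print(hard_wrap("See " + url + " for the details of the wrapping rules."))
```

With `break_long_words=False` the long token ends up on a line of its own and simply overflows the limit, while every other line stays within it.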
-
@Terry-R said in Line wrap in the current version:
So the lookbehind prevents any…
I should rephrase that statement. This is actually a lookahead BEFORE the match. Just so when someone looks at this thread sometime in the future they don’t get confused.
I’ll even provide a link so readers can see how to identify the lookarounds for themselves:
http://rexegg.com/regex-disambiguation.html#lookarounds
Sorry about that.
Terry
-
Maybe this is a bit much to ask, but maybe I’ll lay it down as a challenge to interested parties.
Often I take notes and tab sections over once or twice.
Thus text like this may result in my notes:
    After a weekend of emotional honesty at an Esalen-style retreat, Los Angeles sophisticates Bob and Carol Sanders (Robert Culp and Natalie Wood) return home determined to embrace complete openness. They share their enthusiasm and excitement over their new-found philosophy with their more conservative friends Ted and Alice Henderson (Elliott Gould and Dyan Cannon), who remain doubtful. Soon after, filmmaker Bob has an affair with a young production assistant on a film shoot in San Francisco. When he gets home he admits his liaison to Carol, describing the event as a purely physical act, not an emotional one.
I’d find it nice to be able to reformat that text to wrap at a certain column, e.g. 80, and yet keep the leading indentation. Something like this:
    After a weekend of emotional honesty at an Esalen-style retreat, Los Angeles
    sophisticates Bob and Carol Sanders (Robert Culp and Natalie Wood) return
    home determined to embrace complete openness. They share their enthusiasm
    and excitement over their new-found philosophy with their more conservative
    friends Ted and Alice Henderson (Elliott Gould and Dyan Cannon), who remain
    doubtful. Soon after, filmmaker Bob has an affair with a young production
    assistant on a film shoot in San Francisco. When he gets home he admits his
    liaison to Carol, describing the event as a purely physical act, not an
    emotional one.
Not sure I manually got the reformatted lines absolutely correct, but…you get the idea.
Ok, well, thinking about this a bit more, I guess it really is a bit too much to ask for. :-)
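As a rough idea of what a script-based answer could look like, here is a minimal sketch in plain Python (standard library only, not tied to any Notepad++ plugin API). The function name `wrap_keep_indent` is hypothetical, and the sketch treats each tab in the indentation as a single character.

```python
import textwrap


def wrap_keep_indent(line: str, width: int = 80) -> str:
    """Wrap one long line at `width`, repeating its leading spaces/tabs."""
    body = line.lstrip(" \t")
    indent = line[: len(line) - len(body)]
    return textwrap.fill(
        body,
        width=width,
        initial_indent=indent,     # indentation of the first output line
        subsequent_indent=indent,  # same indentation on every wrapped line
        break_long_words=False,
        break_on_hyphens=False,
    )


if __name__ == "__main__":
    note = "    " + " ".join(["word"] * 40)
    print(wrap_keep_indent(note))
```

The `width` passed to `textwrap.fill` includes the indentation, so the wrapped lines stay within 80 characters overall.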
-
@Alan-Kilborn said in Line wrap in the current version:
Maybe this is a bit much to ask, but maybe I’ll lay it down as a challenge to interested parties.
Ok, well, thinking about this a bit more, I guess it really is a bit too much to ask for. :-)
Yea, I don’t have a solution for that one. My NppExec script starts by joining all the highlighted lines into a single line and then uses either the regex or my super complicated NppExec method to wrap by inserting the carriage returns (based on the file EOL type).
My script follows if it will at all help or give you some ideas to start with. I call it wrap and so just need to type \wrap help from the NppExec console to get a hint:
::wrap
NPP_CONSOLE keep

// Defaults
SET LOCAL WRAP = 80
SET LOCAL REGEX = 0

// command line arguments
IF "$(ARGC)"<="1" THEN
    // get the edge column marker if present
    SCI_SENDMSG SCI_GETEDGECOLUMN
    IF $(MSG_RESULT)>0 THEN
        SET LOCAL WRAP = $(MSG_RESULT)
    ENDIF
ELSE IF "$(ARGC)">="2" THEN
    IF "$(ARGV[1])"~="help" THEN
        GOTO USAGE
    ELSE IF "$(ARGV[1])"~="--regex" THEN
        SET LOCAL REGEX = 1
        IF "$(ARGC)">="3" THEN
            SET LOCAL WRAP = $(ARGV[2])
        ENDIF
    ELSE
        SET LOCAL WRAP = $(ARGV[1])
    ENDIF
ELSE
    GOTO USAGE
ENDIF
SET LOCAL WRAPL ~ $(WRAP) - 1

// setup the carriage return / line feed based on current buffer line ending type
SET LOCAL CRLF ~ strfromhex 0d 00 0a 00
SET LOCAL OFFSET = 2
SCI_SENDMSG SCI_GETEOLMODE
IF $(MSG_RESULT)==1 THEN
    SET LOCAL CRLF ~ strfromhex 0d 00
    SET LOCAL OFFSET = 1
ELSE IF $(MSG_RESULT)==2 THEN
    SET LOCAL CRLF ~ strfromhex 0a 00
    SET LOCAL OFFSET = 1
ENDIF

// get start and end of selection and bail out if selection is less than the desired wrap
SCI_SENDMSG SCI_GETSELECTIONSTART
SET LOCAL START = $(MSG_RESULT)
SCI_SENDMSG SCI_GETSELECTIONEND
SET LOCAL END = $(MSG_RESULT)
SET LOCAL TEST ~ $(START) + $(WRAP)
IF $(TEST)>=$(END) GOTO END

// join all highlighted lines to a single big long line to start the parsing
SCI_SENDMSG SCI_SETTARGETSTART $(START)
SCI_SENDMSG SCI_SETTARGETEND $(END)
SCI_SENDMSG SCI_LINESJOIN

// Reset END after joining lines
SCI_SENDMSG SCI_GETSELECTIONEND
SET LOCAL END = $(MSG_RESULT)

// super elegant way to do it all with a regex
IF "$(REGEX)"=="1" THEN
    // https://community.notepad-plus-plus.org/topic/20008/line-wrap-in-the-current-version/6
    ECHO REGEX = $(WRAP)
    SCI_REPLACE NPE_SF_INSELECTION|NPE_SF_REPLACEALL|NPE_SF_REGEXP "^(?=.{$(WRAP),})(.{1,$(WRAPL)})\s" "$1$(CRLF)"
    GOTO DONE
ENDIF

// super kludge-y way to do it all with NppExec scripting
SET LOCAL LOOP = 1
SET LOCAL BACK = 0
:LOOP
SET LOCAL POS ~ $(START) + $(WRAP) * $(LOOP) + ( $(OFFSET) * ( $(LOOP) - 1 ) ) - $(BACK) - 1
// ECHO START: $(POS) ( END = $(END) BACK = $(BACK) )
IF $(POS)>=$(END) THEN
    GOTO DONE
ENDIF
:INNERLOOP
SCI_SENDMSG SCI_GETCHARAT $(POS)
IF "$(MSG_RESULT)"!="32" THEN
    SET LOCAL POS ~ $(POS) - 1
    SET LOCAL BACK ~ $(BACK) + 1
    // ECHO Backtracking: $(POS)
    GOTO INNERLOOP
ENDIF
SET LOCAL POS ~ $(POS) + 1
SCI_SENDMSG SCI_INSERTTEXT $(POS) "$(CRLF)"
SET LOCAL END ~ $(END) + $(OFFSET)
// ECHO Inserting: $(POS) ( new END = $(END) )
SET LOCAL LOOP ~ $(LOOP) + 1
GOTO LOOP

// either method finishes here and sets cursor to start of new wrapped text
:DONE
SCI_SENDMSG SCI_GOTOPOS $(START)
// ECHO END $(END)
GOTO END

:USAGE
ECHO Usage:
ECHO   Word-wrap by carriage returns selected text into one paragraph.
ECHO   \$(ARGV[0]) [W] = wrap selected text to EDGE marker, 80 (default) or W
ECHO   \$(ARGV[0]) [--regex [W]] = Use RegEx implementation with SCI_REPLACE

:END
Cheers.
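For readers who find the loop easier to follow in a general-purpose language, here is a rough Python sketch of the same join-then-backtrack idea. It is an approximation rather than a port: the line join is a plain `" ".join(...)`, the name `wrap_backtrack` is made up, and the fallback for a stretch with no space at all is an addition that the script above does not have.

```python
def wrap_backtrack(text: str, width: int = 80, eol: str = "\n") -> str:
    """Join the lines, then break at or before column `width` by backtracking."""
    joined = " ".join(text.splitlines())
    out = []
    start = 0
    while start + width - 1 < len(joined):
        pos = start + width - 1  # 0-based index of column `width`
        while pos > start and joined[pos] != " ":
            pos -= 1  # backtrack towards the nearest space
        if joined[pos] != " ":
            # Addition for this sketch: no space found, so break at the
            # next space after the limit (or stop if there is none).
            pos = joined.find(" ", start + width)
            if pos == -1:
                break
        # As in the script, the break goes after the space, so the space
        # stays at the end of the line.
        out.append(joined[start:pos + 1])
        start = pos + 1
    if start < len(joined):
        out.append(joined[start:])
    return eol.join(out)


if __name__ == "__main__":
    print(wrap_backtrack(" ".join("word%02d" % i for i in range(40))))
```

Because the break is inserted after the space, each wrapped line keeps one trailing space; stripping it would be a small extra step.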
-
Thanks.
It could be a job for a PythonScript, but I’ve never gotten around to finishing that one. Other priorities, I guess. :-)
-
@Alan-Kilborn said in Line wrap in the current version:
and yet keep the leading indentation.
I can’t (currently) see a single regex doing this in one pass. The issue is not so much grabbing the leading tabs or spaces on the first line, but when they are “copied” to the next line, now the current position of the regex engine is past that point. Yet those spaces or tabs must count towards the line length.
To make matters worse a tab is defined as a set number of positions (according to NPP preferences) yet isn’t it just 1 character as per the regex engine? So to attempt to say 80 characters wide now becomes an issue, 1 or more might be a “variable” width tab.
More pondering required!
Terry
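The gap between "characters seen by the regex" and "columns seen on screen" can be shown with a short Python sketch. The helper name `visual_width` and the tab sizes are assumptions chosen for illustration; `str.expandtabs` is the standard-library call doing the work.

```python
def visual_width(line: str, tab_size: int = 4) -> int:
    """Width of the line on screen when tabs advance to the next tab stop."""
    return len(line.expandtabs(tab_size))


if __name__ == "__main__":
    line = "\t\tindented text"
    print(len(line))              # 15: each tab counts as one character
    print(visual_width(line, 4))  # 21: each leading tab fills four columns
```

A pattern such as `.{80,}` works on the first number, so a tab-indented line can look wider than 80 columns while still counting as shorter.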
-
@Terry-R said in Line wrap in the current version:
To make matters worse a tab is defined as a set number of positions (according to NPP preferences) yet isn’t it just 1 character as per the regex engine?
Sane people have the N++ option to replace any tab hits with a certain amount of spaces, not an actual tab character. I’m not so hung up on the count of those spaces, but I use (and showed in the example above), 4.
SIDE NOTE: What happens if you attempt to put tab characters in a code block on this site?
Let’s try:
nothing at start of this line
	one tab at start
		two tabs at start
nothing at start of this line
Edit: It keeps the tab characters intact!
-
@Alan-Kilborn said in Line wrap in the current version:
It keeps the tab characters intact!
And for me (since I don’t convert to spaces) it will dynamically apply the number that is currently showing (but not ticked). I copied your code, it kept the tabs. When I changed the space from 4 to 3 it moved the blocks but kept the tab character because I can set the number BUT not tick (select) it to convert to spaces…
Terry
-
@Alan-Kilborn said in Line wrap in the current version:
Maybe this is a bit much to ask, but maybe I’ll lay it down as a challenge to interested parties.
Challenge accepted. It’s a bit rough around the edges but seems workable. As I suggested, I did NOT manage to do it in 1 regex, rather it will be 2 regexes followed by an “empty line” elimination step.
My first step is to add the “indentation” to a following line. This regex checks that if that “following” line is currently “empty” (only spaces/tabs) then it will NOT create any more. This 1st regex runs ONCE! Then the second step cuts each line at the prescribed column and “appends” it to the following “empty” line and then adds another further “empty” line. This regex needs running until no more changes occur. The 3rd step is to remove blank lines through the “Line Operations” function.
As I say it’s a bit rough, but thought it might be interesting for someone to pick up on and see if it can be tweaked further (note I did use \t, that probably needs changing to ALL characters that might exist forming part of the indentation), or that it might give food for thought in a different direction. Since the 1st regex can be run multiple times without any problems the 2 could possibly be combined into a macro which is run UNTIL no changes occur.
- Find What: (?-s)^([\t ]++)(?!$)(.+)(\R)(?!\1\3)
  Replace With: \1\2\3\1\3
  This step ONLY needs running once but will not cause any problem if run more than once.
- Find What: (?-s)^(?=.{80,})(.{1,79})\s(.+\R)([\t ]++$)
  Replace With: $1\r\n$3$2$3
  This step needs running until no more changes occur.
- “Line Operations”, “Remove Empty Lines (containing blank characters)”.
Now one issue I did see is that (in my case) the tab character is taking up several positions, but to the regex it’s ONLY 1, the actual final line width can be slightly over the 80 characters visually. So a line with 2 tabs could be over by 4 character positions if the tab to space in Preferences, Language is set to 3, but with it NOT ticked to convert.
I think that would be a minor irritation.
Terry
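The three steps can be tried outside Notepad++ with a rough Python translation. Several things here are assumptions made for the sketch: Python's `re` has no `\R`, so `\n` line endings are used; the possessive `[\t ]++` followed by `(?!$)` is rewritten as `[\t ]+(?=\S)`; the name `wrap_indented` is made up; and `max_passes` is a safety cap added because some inputs (for example an indented line that starts with a word longer than the limit) never settle.

```python
import re

# Step 1: after an indented, non-blank line, add a line holding only the
# indent, unless such a line already follows.
STEP1 = re.compile(r"^([\t ]+)(?=\S)(.+)(\n)(?!\1\3)", re.MULTILINE)
# Step 2: cut a long line at the last whitespace before column 80 and move
# the remainder, re-indented, in front of the indent-only line that follows.
STEP2 = re.compile(r"^(?=.{80,})(.{1,79})\s(.+\n)([\t ]+$)", re.MULTILINE)


def wrap_indented(text: str, max_passes: int = 1000) -> str:
    text = STEP1.sub(r"\1\2\3\1\3", text)  # run once
    for _ in range(max_passes):            # run until no more changes occur
        text, count = STEP2.subn(r"\1\n\3\2\3", text)
        if count == 0:
            break
    # Step 3: drop lines that are empty or hold only blanks.
    return "".join(
        line for line in text.splitlines(keepends=True) if line.strip()
    )


if __name__ == "__main__":
    note = "    " + " ".join("word%02d" % i for i in range(30)) + "\n"
    print(wrap_indented(note), end="")
```

As with the Notepad++ version, the limit counts a tab as one character, and non-indented lines are not wrapped by this form of step 2.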
-
-
@Terry-R said in Line wrap in the current version:
Find What: (?-s)^(?=.{80,})(.{1,79})\s(.+\R)([\t ]++$)
Replace With: $1\r\n$3$2$3
this step needs running until no more changes occur.
Step 2 fails on “non-indented” lines. That was possibly also the result with my initial testing but I didn’t notice it at that point. I’ve just completed some more testing, this time using spaces as the indentation and for indented lines using either tab or space the solution works. Now to fix the non-indented lines.
I don’t portray the above steps as a finished/polished solution, rather a work in progress.
A revised step 2 Find What: (?-s)^(?=.{80,})(.{1,79})\s(.+\R)([\t ]++$)?
Terry