Line wrap in the current version

I neuw

@PeterJones, thanks for the answers, and please accept my apologies. I assumed that this forum would notify me by email in case of a response. As for missing your ~26-day-old post, I apologize; I was inundated with so many software issues that I lost track.

I come from the TextPad universe and compare features used in TP. This missing feature in NP++ keeps me from making it my default text editor in Windows.

Michael Vincent

@PeterJones said in Line wrap in the current version:

willing to do a quick search/replace (regex) transformation on the text to do the column-word-wrap

@guy038 (CC’ing another RegEx guru for potential help on this question - not urgent).

I remembered this thread and made a very complex NppExec script to do it and then came back here to post it and saw this one-liner RegEx - boy do I feel silly :-(

One thing I noticed though is that to wrap at column 80, for example, my script finds column 80 and then backtracks to the first space it sees. Your RegEx would start at column 80 and go forward to the first space it sees. So my script wraps BEFORE column 80; yours would wrap after it. Of course I could pick column 72, for example, but if one line happens to have a 10-character word starting at column 72 it would still wrap after 80.

Can your RegEx be modified to do a “look-ahead” or “look-behind” to start at column X (80 for example) and then backtrack to the first space and then insert the \r\n?

Not to be nit-picky - just wondering if it can be done and how? I know RegEx, but some of the users on this site blow my mind!
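For readers who want to see the "find column 80, then backtrack to a space" idea outside NppExec, here is a minimal Python sketch. It is not the NppExec script itself: the function name `wrap_backtrack` is hypothetical, and falling forward past a word that is longer than the limit is an assumption made for this sketch.

```python
def wrap_backtrack(line, width=80):
    """Break `line` at the last space at or before column `width`.

    Hypothetical helper for illustration only.
    """
    pieces = []
    while len(line) > width:
        # Look backwards from column width+1 for a space to break on.
        cut = line.rfind(" ", 0, width + 1)
        if cut <= 0:
            # No usable space before the limit: break after the long word
            # instead (an assumption of this sketch).
            cut = line.find(" ", width)
            if cut == -1:
                break  # one unbroken word, leave it alone
        pieces.append(line[:cut])
        line = line[cut + 1:]
    pieces.append(line)
    return pieces


if __name__ == "__main__":
    for piece in wrap_backtrack(" ".join(["word"] * 40), 20):
        print(piece)
```

Searching forward instead (`line.find(" ", width)` on every pass) gives the "wraps after column 80" behaviour described above.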

Cheers.

Terry R

@Michael-Vincent said in Line wrap in the current version:

just wondering if it can be done and how?

I think this Replace might just do it.
Find What: ^(?=.{80,})(.{1,79})\s
Replace With: $1\r\n

So the lookbehind prevents any line shorter than 80 char being broken up. The rest makes the capture end on a space at or just under the 80 char limit which is removed and replaced with the CRLF.
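To try the same Find/Replace pair outside Notepad++, a rough Python equivalent might look like the sketch below. The function name `wrap_long_lines` is hypothetical, `\n` is used instead of `\r\n`, and the loop is an assumption of this sketch: Python's `re.sub` makes a single pass over the original string, so very long lines need repeated passes.

```python
import re


def wrap_long_lines(text, width=80):
    """Apply ^(?=.{W,})(.{1,W-1})\\s -> \\1 + newline until nothing changes."""
    pattern = re.compile(
        r"^(?=.{%d,})(.{1,%d})\s" % (width, width - 1), re.MULTILINE
    )
    while True:
        wrapped = pattern.sub(r"\1\n", text)
        if wrapped == text:
            return wrapped
        text = wrapped


if __name__ == "__main__":
    sample = "123456789x" * 7 + "123456 89 12 45 789x123456789x"
    print(wrap_long_lines(sample))
```

Lines shorter than the width, and lines with no whitespace in the first `width` characters, come back unchanged.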

Terry

PeterJones

@Michael-Vincent said in Line wrap in the current version:

Can your RegEx be modified

At the time I was answering the previous question, I couldn’t think of a way, but I think this does it:

FIND = ^.{1,80}\K\h+(?=\w)

So it greedily takes up to 80 characters, followed by one or more horizontal spaces; this should find the first space at or before the 80th char (so if the 80th char is a non-space, and 81st is a space, it still has 80 char per line).

I see that @Terry-R chimed in just before me with a lookbehind solution.

If I start with

123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456 89 12 45 789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x12345 78 x1 34 6789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x1234 67 9x 23 56789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x123 56 89 12 456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x12 45 78 x1 3456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x1 34 67 9x 23456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x 23 56 89 123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789 12 45 78 x123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x12345678 x1 34 67 9x123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x1234567 9x 23 56 89x123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456 89 12 45 789x123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x

Then my original, modified to 80, gives:

`^.{80,}?\K\h+(?=\w)`

123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456 89 12
45 789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x12345 78 x1
34 6789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x1234 67 9x
23 56789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x123 56 89 12
456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x12 45 78 x1
3456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x1 34 67 9x
23456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x 23 56 89 123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789 12 45 78 x123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x12345678 x1 34 67 9x123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x1234567 9x 23 56 89x123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456 89 12 45 789x123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x

`^.{1,80}\K\h+(?=\w)`

123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456 89
12 45
789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x12345 78
x1 34
6789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x1234 67 9x
23
56789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x123 56 89
12
456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x12 45 78
x1
3456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x1 34 67 9x
23456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x 23 56 89
123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789 12 45 78
x123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x12345678 x1 34 67
9x123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x1234567 9x 23 56
89x123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456 89 12 45
789x123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x

`^(?=.{80,})(.{1,79})\s`

123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456 89
12 45 789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x12345 78
x1 34 6789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x1234 67
9x 23 56789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x123 56 89
12 456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x12 45 78
x1 3456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x1 34 67
9x 23456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x 23 56 89
123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789 12 45 78
x123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x12345678 x1 34 67
9x123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x1234567 9x 23 56
89x123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456 89 12 45
789x123456789x123456789x
123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x123456789x

PeterJones

Actually, looking, mine has a mistake, because it breaks the lines twice… ahh, because of the 1,80, it’s finding the first line-wrap, even on short lines. And that’s what Terry’s gives you: it only wraps lines that are at least 80 characters, whereas mine will wrap any line.

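or ^.{60,80}\K\h+(?=\w) would wrap any lines at least 60 characters long, at the rightmost space (the quantifier has to stay greedy for that; a lazy {60,80}? would stop at the first space after column 60).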

But I think Terry’s best matches desired line-wrap before 80 only on long lines.
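The difference between the lazy and greedy variants can be checked with a small Python sketch. Two things here are assumptions: Python's `re` module has no `\K` or `\h`, so a capture group marks the break point and `[ \t]` stands in for horizontal whitespace; and the helper name `break_column` is made up for this example. It reports the 1-based column of the space where the first break on a line would land.

```python
import re

# \K and \h are not available in Python's re, so group 1 ends where the
# break would happen and [ \t] approximates horizontal whitespace.
VARIANTS = {
    "lazy, 80 or more": r"^(.{80,}?)[ \t]+(?=\w)",
    "greedy, 1 to 80": r"^(.{1,80})[ \t]+(?=\w)",
    "lazy, 60 to 80": r"^(.{60,80}?)[ \t]+(?=\w)",
    "greedy, 60 to 80": r"^(.{60,80})[ \t]+(?=\w)",
}


def break_column(pattern, line):
    """Return the 1-based column of the matched space, or None if no match."""
    m = re.search(pattern, line)
    return None if m is None else m.end(1) + 1


if __name__ == "__main__":
    # Second line of the test data above: spaces at columns 77, 80, 83, 86.
    sample = "123456789x" * 7 + "123456 89 12 45 789x123456789x"
    for name, pattern in VARIANTS.items():
        print(name, break_column(pattern, sample))
```

On that sample the lazy `{80,}?` form breaks at column 83, both greedy forms at column 80, and the lazy `{60,80}?` form at column 77.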

Terry R

@PeterJones said in Line wrap in the current version:

it only wraps lines that are at least 80 characters

Yes, but ouch @PeterJones , throwing a curve ball at me when I wasn’t looking. Lines with NO spaces, I didn’t think of that one!

Terry

PeterJones

@Terry-R said in Line wrap in the current version:

throwing a curve ball at me

Sorry. I guess the last couple days, I’ve been trying to break people’s regexes too much.

But, really, not wrapping at all if there’s no space before char 80 is a reasonable thing to do, and that’s what yours does. There aren’t any real 80-character words in English where you would want to be line wrapping, anyway (you might be able to find a manufactured chemical name that is that long, or some such, but it wouldn’t be in text that you’re word-wrapping in a text editor, and/or you wouldn’t want it to split if you were otherwise word-wrapping).

PeterJones

@PeterJones said in Line wrap in the current version:

I’ve been trying

Though really what prompted the 100char unbroken line was wanting a “ruler” to keep me sane inside Notepad++ and when pasting into the forum. :-)

Terry R

@PeterJones said in Line wrap in the current version:

There aren’t any real 80-character words in English where you would want to be line wrapping, anyway

Not so fast, what about “proper Names”, ah yes a (not so) subtle hint at another post on this forum!
The North Island of New Zealand has a place named Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu.
The 1,000-foot hill near the township Porangahau holds the Guinness World Record for longest place name with 85 characters.

I’m actually thinking that if a line didn’t have any spaces within the confined boundary (80, or whatever number is used) then the word should be hyphenated and a - inserted at the 80th (or whatever) character position.

Terry

PeterJones

@Terry-R ,

Proper names don’t count – especially since I said “English”, and that NZ place name was not an English word. Also, I did say parenthetically “and/or you wouldn’t want it to split”. :-)

I would not recommend ever splitting a longer-than-80 word in an arbitrary location via regex – too much chance of an unintentional change-in-meaning. If a word had soft-hyphens or other Unicode character indicating “it’s okay to split here” (there are a variety of similar zero-width characters which would allow splitting without breaking up the visual word), then split/hyphenate on those, sure; but without those, I wouldn’t want to take responsibility for what the word might become.
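As a sketch of the "split only where the text says it is okay" idea, the Python below breaks a long word exclusively at soft hyphens (U+00AD). The function name `break_at_soft_hyphens` and the sample word with its hyphenation points are illustrative assumptions, not something from a real hyphenation dictionary.

```python
SOFT_HYPHEN = "\u00ad"


def break_at_soft_hyphens(word, width=80):
    """Split `word` only at soft hyphens, fitting `width` columns when possible."""
    parts = word.split(SOFT_HYPHEN)
    lines = []
    current = parts[0]
    for i, part in enumerate(parts[1:], start=1):
        # Reserve one column for the visible "-" unless this is the final part.
        reserve = 0 if i == len(parts) - 1 else 1
        if len(current) + len(part) + reserve <= width:
            current += part
        else:
            lines.append(current + "-")
            current = part
    lines.append(current)
    return lines


if __name__ == "__main__":
    # Hypothetical soft-hyphen positions, for illustration only.
    word = "inter\u00adnation\u00adal\u00adiza\u00adtion"
    print(break_at_soft_hyphens(word, 12))
```

A word without any soft hyphens is returned in one piece, however long it is.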

Michael Vincent

@PeterJones said in Line wrap in the current version:

100char unbroken line was wanting a “ruler” to keep me sane inside Notepad++

@Terry-R

Thank you both! Much more elegant than my insane looping and index keeping in NppExec script. I mostly like to “line wrap” with hard carriage returns at 80 columns in Readme Markdown documents, even though it doesn’t matter when they are rendered in a viewer. I figure if I ever view one with more or less from the command line, I still want it to be legible. And words over 80 characters (think long URLs) should not wrap, that’s fine.

Regarding ruler in Notepad++ …

Cheers.

Terry R

@Terry-R said in Line wrap in the current version:

So the lookbehind prevents any…

I should rephrase that statement. This is actually a lookahead BEFORE the match. Just so when someone looks at this thread sometime in the future they don’t get confused.
I’ll even provide a link so readers can see how to identify the lookarounds for themselves:
http://rexegg.com/regex-disambiguation.html#lookarounds
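A quick way to see the difference for yourself, sketched in Python (the test line is made up; the lookahead is the one from the wrap regex):

```python
import re

line = "x" * 79 + " tail"

# Lookahead (?=...): checks the text AFTER the current position and
# consumes nothing, so the match ends where it started.
ahead = re.match(r"(?=.{80,})", line)

# Lookbehind (?<=...): checks the text BEFORE the current position.
behind = re.search(r"(?<=x) ", line)

if __name__ == "__main__":
    print(ahead.end(), behind.start())
```

The lookahead succeeds at position 0 only because at least 80 characters follow; on a short line it simply fails to match.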

Sorry about that.

Terry

Alan Kilborn

Maybe this is a bit much to ask, but maybe I’ll lay it down as a challenge to interested parties.

Often I take notes and tab sections over once or twice.

Thus text like this may result in my notes:

After a weekend of emotional honesty at an Esalen-style retreat, Los Angeles sophisticates Bob
and Carol Sanders (Robert Culp and Natalie Wood) return home determined to embrace complete openness.
    They share their enthusiasm and excitement over their new-found philosophy with their more
    conservative friends Ted and Alice Henderson (Elliott Gould and Dyan Cannon), who remain doubtful.
        Soon after, filmmaker Bob has an affair with a young production assistant on a film shoot in
        San Francisco. When he gets home he admits hisliaison to Carol, describing the event as a purely
        physical act, not an emotional one.

I’d find it nice to be able to reformat that text to wrap at a certain column, e.g. 80, and yet keep the leading indentation. Something like this:

After a weekend of emotional honesty at an Esalen-style retreat, Los
Angeles sophisticates Bob and Carol Sanders (Robert Culp and Natalie Wood)
return home determined to embrace complete openness.
    They share their enthusiasm and excitement over their new-found
    philosophy with their more conservative friends Ted and Alice Henderson
    (Elliott Gould and Dyan Cannon), who remain doubtful. Soon after,
    filmmaker Bob has an affair with a young production assistant on a
    film shoot in
        San Francisco. When he gets home he admits hisliaison to Carol,
        describing the event as a purely physical act, not an emotional
        one.

Not sure I manually got the reformatted lines absolutely correct, but…you get the idea.

Ok, well, thinking about this a bit more, I guess it really is a bit too much to ask for. :-)

Michael Vincent

@Alan-Kilborn said in Line wrap in the current version:

Maybe this is a bit much to ask, but maybe I’ll lay it down as a challenge to interested parties.

Ok, well, thinking about this a bit more, I guess it really is a bit too much to ask for. :-)

Yea, I don’t have a solution for that one. My NppExec script starts by joining all the highlighted lines into a single line and then uses the regex or my super complicated NppExec method to wrap by inserting the carriage returns (based on the file EOL type).

My script follows if it will at all help or give you some ideas to start with. I call it wrap and so just need to type \wrap help from the NppExec console to get a hint:

::wrap
NPP_CONSOLE keep

// Defaults
SET LOCAL WRAP = 80
SET LOCAL REGEX = 0

// command line arguments
IF "$(ARGC)"<="1" THEN
// get the edge column marker if present
    SCI_SENDMSG SCI_GETEDGECOLUMN
    IF $(MSG_RESULT)>0 THEN
        SET LOCAL WRAP = $(MSG_RESULT)
    ENDIF
ELSE IF "$(ARGC)">="2" THEN
    IF "$(ARGV[1])"~="help" THEN
        GOTO USAGE
    ELSE IF "$(ARGV[1])"~="--regex" THEN
        SET LOCAL REGEX = 1
        IF "$(ARGC)">="3" THEN
            SET LOCAL WRAP = $(ARGV[2])
        ENDIF
    ELSE
        SET LOCAL WRAP = $(ARGV[1])
    ENDIF
ELSE
    GOTO USAGE
ENDIF

SET LOCAL WRAPL ~ $(WRAP) - 1

// setup the carriage return / line feed based on current buffer line ending type
SET LOCAL CRLF ~ strfromhex 0d 00 0a 00
SET LOCAL OFFSET = 2
SCI_SENDMSG SCI_GETEOLMODE
IF $(MSG_RESULT)==1 THEN
    SET LOCAL CRLF ~ strfromhex 0d 00
    SET LOCAL OFFSET = 1
ELSE IF $(MSG_RESULT)==2 THEN
    SET LOCAL CRLF ~ strfromhex 0a 00
    SET LOCAL OFFSET = 1
ENDIF

// get start and end of selection and bail out if selection is less than the desired wrap
SCI_SENDMSG SCI_GETSELECTIONSTART
SET LOCAL START = $(MSG_RESULT)
SCI_SENDMSG SCI_GETSELECTIONEND
SET LOCAL END = $(MSG_RESULT)

SET LOCAL TEST ~ $(START) + $(WRAP)
IF $(TEST)>=$(END) GOTO END

// join all highlighted lines to a single big long line to start the parsing
SCI_SENDMSG SCI_SETTARGETSTART $(START)
SCI_SENDMSG SCI_SETTARGETEND $(END)
SCI_SENDMSG SCI_LINESJOIN

// Reset END after joining lines
SCI_SENDMSG SCI_GETSELECTIONEND
SET LOCAL END = $(MSG_RESULT)

// super elegant way to do it all with a regex
IF "$(REGEX)"=="1" THEN
    // https://community.notepad-plus-plus.org/topic/20008/line-wrap-in-the-current-version/6
    ECHO REGEX = $(WRAP)
    SCI_REPLACE NPE_SF_INSELECTION|NPE_SF_REPLACEALL|NPE_SF_REGEXP "^(?=.{$(WRAP),})(.{1,$(WRAPL)})\s" "$1$(CRLF)"
    GOTO DONE
ENDIF

// super kludge-y way to do it all with NppExec scripting
SET LOCAL LOOP = 1
SET LOCAL BACK = 0
:LOOP
SET LOCAL POS ~ $(START) + $(WRAP) * $(LOOP) + ( $(OFFSET) * ( $(LOOP) - 1 ) ) - $(BACK) - 1
// ECHO START: $(POS) ( END = $(END) BACK = $(BACK) )
IF $(POS)>=$(END) THEN
    GOTO DONE
ENDIF
:INNERLOOP
SCI_SENDMSG SCI_GETCHARAT $(POS)
IF "$(MSG_RESULT)"!="32" THEN
    SET LOCAL POS ~ $(POS) - 1
    SET LOCAL BACK ~ $(BACK) + 1
    // ECHO Backtracking: $(POS)
    GOTO INNERLOOP
ENDIF
SET LOCAL POS ~ $(POS) + 1
SCI_SENDMSG SCI_INSERTTEXT $(POS) "$(CRLF)"
SET LOCAL END ~ $(END) + $(OFFSET)
// ECHO Inserting: $(POS) ( new END = $(END) )
SET LOCAL LOOP ~ $(LOOP) + 1
GOTO LOOP

// either method finishes here and sets cursor to start of new wrapped text
:DONE
SCI_SENDMSG SCI_GOTOPOS $(START)
// ECHO END $(END)
GOTO END

:USAGE
ECHO Usage:
ECHO Word-wrap by carriage returns selected text into one paragraph.
ECHO   \$(ARGV[0]) [W]           = wrap selected text to EDGE marker, 80 (default) or W
ECHO   \$(ARGV[0]) [--regex [W]] = Use RegEx implementation with SCI_REPLACE

:END

Cheers.

Alan Kilborn

@Michael-Vincent

Thanks.
It could be a job for a PythonScript, but I’ve never gotten around to finishing that one. Other priorities, I guess. :-)
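Not a PythonScript plugin script, but as a starting point, plain Python's `textwrap` can already do indentation-preserving rewrapping. The sketch below makes several assumptions: consecutive lines with identical leading whitespace form one paragraph, blank lines are kept as separators, indentation is counted one column per character (so it suits spaces rather than tabs), and the names `leading_indent` and `rewrap_keep_indent` are hypothetical.

```python
import itertools
import re
import textwrap


def leading_indent(line):
    """Return the leading spaces/tabs of `line`, or None for a blank line."""
    if not line.strip():
        return None
    return re.match(r"[ \t]*", line).group(0)


def rewrap_keep_indent(text, width=80):
    """Rewrap each run of equally indented lines, keeping its indentation."""
    out = []
    for indent, group in itertools.groupby(text.splitlines(), key=leading_indent):
        lines = list(group)
        if indent is None:
            out.extend("" for _ in lines)  # keep blank lines as they are
            continue
        paragraph = " ".join(line.strip() for line in lines)
        out.append(
            textwrap.fill(
                paragraph,
                width=width,
                initial_indent=indent,
                subsequent_indent=indent,
                break_long_words=False,  # long URLs stay in one piece
                break_on_hyphens=False,
            )
        )
    return "\n".join(out)


if __name__ == "__main__":
    notes = (
        "alpha beta gamma delta epsilon zeta eta theta iota kappa\n"
        "    lambda mu nu xi omicron pi rho sigma tau upsilon phi chi psi omega"
    )
    print(rewrap_keep_indent(notes, width=30))
```

Inside Notepad++ the same function could presumably be fed the selected text and its result written back, but that wiring is left out here.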

Terry R

@Alan-Kilborn said in Line wrap in the current version:

and yet keep the leading indentation.

I can’t (currently) see a single regex doing this in one pass. The issue is not so much grabbing the leading tabs or spaces on the first line, but when they are “copied” to the next line, now the current position of the regex engine is past that point. Yet those spaces or tabs must count towards the line length.

To make matters worse a tab is defined as a set number of positions (according to NPP preferences) yet isn’t it just 1 character as per the regex engine? So to attempt to say 80 characters wide now becomes an issue, 1 or more might be a “variable” width tab.

More pondering required!

Terry

Alan Kilborn

@Terry-R said in Line wrap in the current version:

To make matters worse a tab is defined as a set number of positions (according to NPP preferences) yet isn’t it just 1 character as per the regex engine?

Sane people have the N++ option to replace any tab hits with a certain amount of spaces, not an actual tab character. I’m not so hung up on the count of those spaces, but I use (and showed in the example above), 4.

SIDE NOTE: What happens if you attempt to put tab characters in a code block on this site?

Let’s try:

nothing at start of this line
	one tab at start
		two tabs at start
nothing at start of this line

Edit: It keeps the tab characters intact!

Terry R

@Alan-Kilborn said in Line wrap in the current version:

It keeps the tab characters intact!

And for me (since I don’t convert to spaces) it will dynamically apply the number that is currently showing (but not ticked). I copied your code, it kept the tabs. When I changed the space from 4 to 3 it moved the blocks but kept the tab character because I can set the number BUT not tick (select) it to convert to spaces…

Terry

Terry R

@Alan-Kilborn said in Line wrap in the current version:

Maybe this is a bit much to ask, but maybe I’ll lay it down as a challenge to interested parties.

Challenge accepted. It’s a bit rough around the edges but seems workable. As I suggested, I did NOT manage to do it in 1 regex, rather it will be 2 regexes followed by an “empty line” elimination step.

My first step is to add the “indentation” to a following line. This regex checks that if that “following” line is currently “empty” (only spaces/tabs) then it will NOT create any more. This 1st regex runs ONCE! Then the second step cuts each line at the prescribed column and “appends” it to the following “empty” line and then adds another further “empty” line. This regex needs running until no more changes occur. The 3rd step is to remove blank lines through the “Line Operations” function.

As I say it’s a bit rough, but thought it might be interesting for someone to pick up on and see if it can be tweaked further (note I did use \t, that probably needs changing to ALL characters that might exist forming part of the indentation), or that it might give food for thought in a different direction. Since the 1st regex can be run multiple times without any problems the 2 could possibly be combined into a macro which is run UNTIL no changes occur.

1. Find What: (?-s)^([\t ]++)(?!$)(.+)(\R)(?!\1\3)
   Replace With: \1\2\3\1\3
   This step ONLY needs running once but will not cause any problem if run more than once.
2. Find What: (?-s)^(?=.{80,})(.{1,79})\s(.+\R)([\t ]++$)
   Replace With: $1\r\n$3$2$3
   This step needs running until no more changes occur.
3. “Line Operations”, “Remove Empty Lines (containing blank characters)”.

Now one issue I did see is that (in my case) the tab character is taking up several positions, but to the regex it’s ONLY 1, the actual final line width can be slightly over the 80 characters visually. So a line with 2 tabs could be over by 4 character positions if the tab to space in Preferences, Language is set to 3, but with it NOT ticked to convert.

I think that would be a minor irritation.
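The size of that overshoot is easy to compute. A minimal Python sketch, assuming leading tabs and a tab size of 3 as in the example above (the helper name `visual_overshoot` is made up):

```python
def visual_overshoot(line, tab_size):
    """Columns the line occupies on screen beyond its character count."""
    return len(line.expandtabs(tab_size)) - len(line)


if __name__ == "__main__":
    # Two leading tabs at tab size 3: each tab is 1 character but 3 columns.
    print(visual_overshoot("\t\tindented text", 3))  # 4
```

So a regex that counts 80 characters can produce lines up to 84 columns wide in that configuration.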

Terry

Terry R

@Terry-R said in Line wrap in the current version:

Find What:(?-s)^(?=.{80,})(.{1,79})\s(.+\R)([\t ]++$)
Replace With:$1\r\n$3$2$3 this step needs running until no more changes occur.

Step 2 fails on “non-indented” lines. That was possibly also the result with my initial testing but I didn’t notice it at that point. I’ve just completed some more testing, this time using spaces as the indentation and for indented lines using either tab or space the solution works. Now to fix the non-indented lines.

I don’t portray the above steps as a finished/polished solution, rather a work in progress.

A revised step 2: Find What: (?-s)^(?=.{80,})(.{1,79})\s(.+\R)([\t ]++$)?

Terry