Hello kreien,
I was pretty sure that your sorted list could be modified with a regex S/R. Unfortunately, I was unable to do what you want in one go :-(( Luckily, with two successive S/R, it works fine !
First of all, add a dummy line, just before the first line of your list ( xxxx[TAB]xxxx )
We’ll also need a dummy character, not used yet, in your file, to identify specific lines. I chose the # symbol but any other symbol may be used. Just escape it if this symbol is a special regex character !
Two hypotheses :
I supposed that the lines of your sorted list are NOT preceded by blank characters, which could differ between two consecutive lines !
I supposed that you don’t care about the case of the text, before the first tabulation character
So, we start, for instance, with the sorted example text, below :
xxxx xxxx
Garden following text
garden following text
Garden following text
Garden following text
Green following text
House following text
House following text
house following text
street following text
Street following text
Wall following text
As you can see, the lines, beginning with House, are located after those beginning with the word Green. Better for a sorted list, isn’t it ?
The first regex S/R, below, will add a # symbol at the end of, either, any single line OR the last line of a group
SEARCH (?i-s)^(.+?)\t.+\K\R(?!\1)
REPLACE #$0
NOTES :
The part (?i-s) combines two modifiers : -s forces the regex engine to consider the dot character, ., as a single standard character only ( never an End of Line character ) and i means that the whole search is done in a case-insensitive way !
Then, the part ^(.+?)\t represents, from beginning of line, the shortest range of standard characters, followed by a tabulation character. This range is stored as group 1, due to the surrounding round brackets
The part .+, matches all the remaining standard characters, of the line, after the first tabulation
The final part \R(?!\1) represents the End of Line character(s) of the current line, followed by a negative look-ahead, that is to say a condition which must be true for the regex engine to accept the overall match. So, the next line must NOT begin with the string which begins the current line ( \1 )
Finally, the syntax \K forces the regex engine to forget all text matched, before \K. So, this search regex just matches the End of line character(s) of the current line, if next line does NOT begin with the same string beginning the current one
So, in replacement, these End of Line character(s) ( the whole regex match $0 ) are re-written, preceded by a # symbol
And we obtain the changed text, below :
xxxx xxxx#
Garden following text
garden following text
Garden following text
Garden following text#
Green following text#
House following text
House following text
house following text#
street following text
Street following text#
Wall following text#
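For those who prefer to check this first step outside Notepad++, here is a minimal Python sketch of it. It is an approximation, not the exact regex above : the built-in re module supports neither \K nor \R, so the line is captured and re-written instead, and \n stands in for \R. The function name mark_group_ends is hypothetical, and I assume LF line endings and a single tabulation per line.

```python
import re

# Python's built-in re module has no \K and no \R. So the part of the line
# located before the line break is captured and written back ( group 1 ),
# and "\n" replaces \R ( assumption: LF line endings only ).
# Group 2 holds the text located before the first tabulation.
MARK_GROUP_END = re.compile(r"(?im)^((.+?)\t.+)\n(?!\2)")


def mark_group_ends(text: str) -> str:
    """Append '#' to any single line and to the last line of each group."""
    return MARK_GROUP_END.sub(r"\1#\n", text)


sample = (
    "xxxx\txxxx\n"
    "Garden\tfollowing text\n"
    "garden\tfollowing text\n"
    "Garden\tfollowing text\n"
    "Garden\tfollowing text\n"
    "Green\tfollowing text\n"
    "House\tfollowing text\n"
    "House\tfollowing text\n"
    "house\tfollowing text\n"
    "street\tfollowing text\n"
    "Street\tfollowing text\n"
    "Wall\tfollowing text\n"
)

marked = mark_group_ends(sample)
print(marked)
```

Note that the last line only gets its # symbol because the sample ends with a line break, exactly as in the Notepad++ version.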
The second regex S/R, below, deletes any # symbol, as well as any text, till the first tabulation character, in all the lines whose previous line does NOT end with a # symbol
SEARCH (?-s)#|[^#\r\n]\R\K.+?(?=\t)
REPLACE EMPTY
NOTES :
Refer above, for the (?-s) syntax
The first branch of the alternation ( the | symbol ), #, matches a # symbol, at the end of a line
The second part of the alternative, [^#\r\n]\R, looks for a last standard character, different from a # symbol, followed by the End of Line character(s)
Then the \K syntax, again, resets the regex engine search location, at the beginning of the next line
Finally, the part .+?(?=\t) just matches the shortest range of characters, which is followed by the first tabulation character, of the next line
In replacement, either the # symbol OR all the characters before the first tabulation, when the previous line does NOT end with a # symbol, are simply deleted
So, we get the final text :
xxxx xxxx
Garden following text
following text
following text
following text
Green following text
House following text
following text
following text
street following text
following text
Wall following text
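The second step can be sketched in Python too, under the same assumptions ( LF line endings, one tabulation per line ). Again, this is only an approximation of the regex above : as the built-in re module has no \K, the last character of the previous line and its line break are captured as group 1 and written back. The name strip_repeated_keys is hypothetical.

```python
import re

# First branch : a '#' marker, which is simply deleted.
# Second branch : the last character of a line NOT ending with '#', plus its
# line break ( group 1, kept ), then the text located before the first
# tabulation of the next line ( deleted ).
STRIP = re.compile(r"#|([^#\n]\n).+?(?=\t)")


def strip_repeated_keys(marked_text: str) -> str:
    """Delete the '#' markers and the repeated leading text of each group."""
    # Group 1 is None when the '#' branch matched, hence the 'or ""'
    return STRIP.sub(lambda m: m.group(1) or "", marked_text)


marked_sample = (
    "xxxx\txxxx#\n"
    "Garden\tfollowing text\n"
    "garden\tfollowing text\n"
    "Garden\tfollowing text\n"
    "Garden\tfollowing text#\n"
    "Green\tfollowing text#\n"
    "House\tfollowing text\n"
    "House\tfollowing text\n"
    "house\tfollowing text#\n"
    "street\tfollowing text\n"
    "Street\tfollowing text#\n"
    "Wall\tfollowing text#\n"
)

final_text = strip_repeated_keys(marked_sample)
print(final_text)
```

The tabulation itself is kept at the beginning of the shortened lines, as the look-ahead (?=\t) does not consume it.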
To end with, delete the dummy first line. Et voilà !
IMPORTANT :
As we use the \K syntax, in the two S/R, you must click on the Replace All button, exclusively ! Don’t use the Replace button, ( step by step replacement ) for these S/R !
Best Regards,
guy038