Q: How to sort groups of lines alphabetically according to the first word after the beginning of each group
-
I am looking for a function in Notepad++ that lets me sort groups of lines in this way:
- Each group of lines is between two fixed words, for example <article> and </article>
- After <article> I have a variable word, for example human
- In the next group I have another word, for example animal
- I want to sort the groups of lines alphabetically according to the word after <article>
Example:
<article>human
Content
Of
Many
Lines
</article>
<article>animal
Content
Of
Another
Many
Lines
</article>
<article>thing
Content
Of
Other
Many
Lines
</article>
I want to sort them alphabetically in this way:
<article>animal
Content
Of
Another
Many
Lines
</article>
<article>human
Content
Of
Many
Lines
</article>
<article>thing
Content
Of
Other
Many
Lines
</article>
I hope that you can help me.
Thanks in advance
-
Step 0: Assume no smiley faces (☺ U+263A) in the data set; assume Windows newlines (CRLF = \r\n)
Step 1: Find \R(?!<article>) (any newline not followed by the next starting <article>), replace with \x{263a} (smiley face), regular expression mode
Step 2: Edit > Line Operations > Sort Lines Lexicographically Ascending
Step 3: Find \x{263a}, replace with \r\n, regular expression mode
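If you would rather script it than click through the menus, the same three steps can be sketched in Python. This is only an illustration under a few assumptions: the name sort_article_groups is made up for this example, Python's sorted() orders by code point (which may not match Notepad++'s sort in every edge case), and the sketch drops a trailing newline first so the last group does not pick up a stray separator.

```python
import re

SEP = "\u263a"  # smiley face used as a temporary record separator (assumed absent from the data)


def sort_article_groups(text, newline="\r\n"):
    """Sort <article>...</article> groups by joining each group into one line first."""
    # Step 0: refuse to run if the separator already occurs in the data.
    if SEP in text:
        raise ValueError("separator character already occurs in the data")

    # Normalize line endings to "\n"; splitlines() also drops a trailing newline.
    body = "\n".join(text.splitlines())

    # Step 1: replace every newline that is NOT followed by <article>,
    # so each group collapses into a single line.
    joined = re.sub(r"\n(?!<article>)", SEP, body)

    # Step 2: sort the collapsed lines (plain code-point order).
    records = sorted(joined.split("\n"))

    # Step 3: turn the separators back into real newlines.
    return newline.join(records).replace(SEP, newline)


if __name__ == "__main__":
    sample = (
        "<article>human\r\nContent\r\n</article>\r\n"
        "<article>animal\r\nOther\r\n</article>\r\n"
    )
    print(sort_article_groups(sample))
```

Because whole collapsed lines are compared, the word right after <article> acts as the sort key simply because it comes first on each line.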
By way of explanation: I chose a smiley face as a “record separator” because it was unlikely to occur in your data. If it does occur, pick any other Unicode character you are sure isn’t in your data. I was originally going to replace with a space, but even though your data doesn’t show any line inside an <article>...</article> containing a space, the example looked artificial enough that I assumed you may have oversimplified it, and I didn’t want to risk you coming back and saying “but I had a space”. Note that assumptions like these are good reasons for providing enough data to show what you really want, and which situations are allowed and which are not.
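To check that a separator really is absent before relying on it, a small helper along these lines could be used. It is a sketch only: pick_separator and the fallback candidates (U+241E and U+FFFC) are hypothetical choices for illustration, not anything Notepad++ provides.

```python
def pick_separator(text, candidates=("\u263a", "\u241e", "\ufffc")):
    """Return the first candidate character that does not occur in text."""
    for ch in candidates:
        if ch not in text:
            return ch
    # None of the candidates is safe; the caller has to supply other characters.
    raise ValueError("every candidate separator occurs in the data")
```

With data that already contains a smiley face, the helper moves on to the next candidate instead of silently corrupting the groups.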
FYI: This forum is formatted using Markdown, with a help link buried on the little grey ? in the COMPOSE window/pane when writing your post. For more about how to use Markdown in this forum, please see @Scott-Sumner’s post in the “how to markdown code on this forum” topic, and my updates near the end. It is very important that you use these formatting tips – using single backtick marks around small snippets, and using code-quoting for pasting multiple lines from your example data files – because otherwise, the forum will change normal quotes ("") to curly “smart” quotes (“”), will change hyphens to dashes, will sometimes hide asterisks (or if your text is c:\folder\*.txt, it will show up as c:\folder*.txt, missing the backslash). If you want to clearly communicate your text data to us, you need to properly format it.
If you have further search-and-replace (“matching”, “marking”, “bookmarking”, regular expression, “regex”) needs, study this FAQ and the documentation it points to. Before asking a new regex question, understand that for future requests, many of us will expect you to show what data you have (exactly), what data you want (exactly), what regex you already tried (to show that you’re showing effort), why you thought that regex would work (to prove it wasn’t just something randomly typed), and what data you’re getting with an explanation of why that result is wrong. When you show that effort, you’ll see us bend over backward to get things working for you. If you need help formatting, see the paragraph above.
Please note that for all regex and related queries, it is best if you are explicit about what needs to match, and what shouldn’t match, and have multiple examples of both in your example dataset. Often, what shouldn’t match helps define the regular expression as much or more than what should match.
-
Thanks very much.
It worked fine.
I have put
\R(?!(<article>(.+?)</article>))
Then the same method
Best wishes