<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Match consecutive lines that start with the same word]]></title><description><![CDATA[<p dir="auto">I’m learning Regex after being inspired by Alan Kilborn who massively helped me out a couple of weeks ago with my first query.</p>
<p dir="auto">I am now trying to highlight header rows which are not followed by a data row so I can remove these from the data set.</p>
<p dir="auto">The header rows all start with the same 3 letters (AAA in this example), and the data rows all start with a different but consistent 3 letters (BBB in this example).  I want to highlight the header rows that are not followed by a data row.</p>
<p dir="auto">So, in this data I want to highlight and retain rows 1, 2, and 6-11 and exclude rows 3-5, as these are not connected to a data row.  The number of consecutive AAA rows can vary, but I always need the last one before a BBB row.</p>
<p dir="auto"><img src="/assets/uploads/files/1721057127816-row-match.jpg" alt="Row Match.jpg" class=" img-fluid img-markdown" /></p>
<p dir="auto">I have spent a lot of time Googling this, and the closest I can find is:<br />
<code>(?s)(\w+)\s+\w+\r\n(\1\s+\w+(?:\r\n)?)+</code></p>
<p dir="auto">I found this on Stack Overflow; unfortunately I can’t post the link as I’m a newbie.</p>
<p dir="auto">I’m struggling to edit this to work with my data set.  Any help is much appreciated!</p>
]]></description><link>https://community.notepad-plus-plus.org/topic/25947/match-consecutive-lines-that-start-with-the-same-word</link><generator>RSS for Node</generator><lastBuildDate>Wed, 22 Apr 2026 08:40:14 GMT</lastBuildDate><atom:link href="https://community.notepad-plus-plus.org/topic/25947.rss" rel="self" type="application/rss+xml"/><pubDate>Mon, 15 Jul 2024 15:26:08 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Match consecutive lines that start with the same word on Tue, 16 Jul 2024 08:06:14 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/3841">@PeterJones</a> Thanks Peter<br />
Noted and thanks for the helpful links.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/95781</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/95781</guid><dc:creator><![CDATA[Ross Brown]]></dc:creator><pubDate>Tue, 16 Jul 2024 08:06:14 GMT</pubDate></item><item><title><![CDATA[Reply to Match consecutive lines that start with the same word on Tue, 16 Jul 2024 08:05:10 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/26710">@Mark-Olson</a> Thanks Mark<br />
It works perfectly, and you have provided a really clear explanation.  I was going to bookmark and remove the rows, but your additional code was a bonus.  I can follow the logic (helped by your clear explanation); it’s the groups I am getting stuck on.  I will do some more research in this area. Thanks again!</p>
]]></description><link>https://community.notepad-plus-plus.org/post/95780</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/95780</guid><dc:creator><![CDATA[Ross Brown]]></dc:creator><pubDate>Tue, 16 Jul 2024 08:05:10 GMT</pubDate></item><item><title><![CDATA[Reply to Match consecutive lines that start with the same word on Mon, 15 Jul 2024 16:05:03 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/31107">@Ross-Brown</a> ,</p>
<blockquote>
<p dir="auto">unfortunately I can’t post the link as I’m a newbie.</p>
</blockquote>
<p dir="auto">Asking questions in a way that makes it <em>easy</em> for us to help you would probably earn more upvotes.  (But since this was enough for Mark to figure out what you wanted, I gave another upvote.)</p>
<p dir="auto">But in the future, it would make it a lot easier for us to help you if you would give us your example data as <strong>text</strong> using the <code>&lt;/&gt;</code> button when you are creating your post, so we can copy/paste, rather than making us try to type the same thing we see in a screenshot.  That way it ends up in the code box with the “copy code” button, like in Mark’s reply.</p>
<p dir="auto">(I had started an answer that was similar to Mark’s, but he posted before I got very far, so I stopped that part of my reply, and didn’t include any specifics for your situation; he explained it better than I was doing.)</p>
<hr />
<h3>Useful References</h3>
<ul>
<li><a href="https://community.notepad-plus-plus.org/topic/21965/please-read-before-posting">Please Read Before Posting</a></li>
<li><a href="https://community.notepad-plus-plus.org/topic/22022/template-for-search-replace-questions">Template for Search/Replace Questions</a></li>
<li><a href="https://community.notepad-plus-plus.org/topic/21925/faq-desk-formatting-forum-posts">Formatting Forum Posts</a></li>
<li><a href="https://npp-user-manual.org/docs/searching/#regular-expressions" rel="nofollow ugc">Notepad++ Online User Manual: Searching/Regex</a></li>
<li><a href="https://community.notepad-plus-plus.org/topic/15765/faq-desk-where-to-find-regular-expressions-regex-documentation">FAQ: Where to find other regular expressions (regex) documentation</a></li>
</ul>
]]></description><link>https://community.notepad-plus-plus.org/post/95767</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/95767</guid><dc:creator><![CDATA[PeterJones]]></dc:creator><pubDate>Mon, 15 Jul 2024 16:05:03 GMT</pubDate></item><item><title><![CDATA[Reply to Match consecutive lines that start with the same word on Mon, 15 Jul 2024 16:01:09 GMT]]></title><description><![CDATA[<p dir="auto">Hi <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/31107">@Ross-Brown</a></p>
<p dir="auto">Keep up the good work learning regular expressions!</p>
<p dir="auto">This is a task where <a href="https://npp-user-manual.org/docs/searching/#assertions" rel="nofollow ugc">lookahead and lookbehind</a> can be useful, because you want to <em>check whether the next line has some text, without moving forward to that line.</em></p>
<p dir="auto">I came up with the regular expression <code>(?-s)^(TEXT_TO_MATCH)(.*)(\R|\z)(?=\1)</code> to solve your problem.  It matches each line that starts with <code>TEXT_TO_MATCH</code> and <em>is followed by another line that starts with the same text</em>, so removing the matches keeps only the lines that are <em><strong>not</strong> followed by another line that starts with <code>TEXT_TO_MATCH</code></em>.</p>
<p dir="auto">This regular expression does the following:</p>
<ul>
<li><code>^(TEXT_TO_MATCH)</code> attempts to find <code>TEXT_TO_MATCH</code> at the beginning of a line, then stores it as the <strong>first</strong> capture group</li>
<li><code>(.*)</code> consumes the rest of the line, stopping before the line ending because <code>(?-s)</code> at the start of the regex keeps <code>.</code> from matching line-ending characters, and stores it as the <strong>second</strong> capture group</li>
<li><code>(\R|\z)(?=\1)</code> stores the line ending (<code>CRLF</code>, <code>CR</code>, or <code>LF</code>) as the <strong>third</strong> capture group, but then <strong>fails the match</strong> if it sees that the next line <em>does not start</em> with the <strong>first</strong> capture group.</li>
</ul>
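<p dir="auto">The three pieces above can also be tried outside Notepad++ with a minimal Python sketch.  This is only an approximation of what Notepad++ runs: Python’s <code>re</code> module has no <code>\R</code>, so the line ending is spelled out as <code>\r\n|\r|\n</code>, and <code>[^\r\n]*</code> stands in for <code>(?-s).*</code>.  The <code>\z</code> alternative is left out, which should not change the result because nothing can follow the end of the text.  The prefix (AAA plus a space) and the three-line sample text are made up for illustration.</p>

```python
import re

# Assumption: Python's re module stands in for the Boost engine used by
# Notepad++. It has no \R, so the line ending is written out explicitly,
# and [^\r\n]* replaces (?-s).* to stop at the end of the line.
PATTERN = re.compile(r"^(AAA )([^\r\n]*)(\r\n|\r|\n)(?=\1)", re.MULTILINE)

# Hypothetical sample: two header lines followed by one data line.
text = "AAA first\nAAA second\nBBB data\n"

for match in PATTERN.finditer(text):
    prefix, rest, line_ending = match.groups()
    # Only "AAA first" matches: the line after it starts with the same prefix.
    # "AAA second" is skipped because the lookahead sees "BBB" instead.
    print(repr(prefix), repr(rest), repr(line_ending))
```

<p dir="auto">Running this prints the three capture groups for the single match, which shows that the lookahead checks the next line without becoming part of the match.</p>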
<p dir="auto">For example, let’s say you wanted to clear lines (remove their text but leave them empty) if they start with <em>AAA or BBB followed by a normal space character</em> and the next line has the same beginning.</p>
<p dir="auto">Then you would replace <code>TEXT_TO_MATCH</code> in our original regex with <code>(?:AAA|BBB)\x20</code>, since that matches <em>AAA or BBB followed by a normal space character</em>, and we get the regex <code>(?-s)^((?:AAA|BBB)\x20)(.*)(\R|\z)(?=\1)</code></p>
<p dir="auto">We can test this out on this example:</p>
<pre><code>AAA A   [Header row 1]
BBB B   [Data row 1]
AAA A   [Header row 2]
AAA A   [Header row 3]
AAA A   [Header row 4]
AAA A   [Header row 5]
BBB B   [Data row 2]
AAA A   [Header row 6]
BBB B   [Data row 3]
AAA A   [Header row 7]
BBB B   [Data row 4]
BBB B   [Data row 5]
AAA A   [Header row 8]
</code></pre>
<p dir="auto">If we replace <code>(?-s)^((?:AAA|BBB)\x20)(.*)(\R|\z)(?=\1)</code> with <code>${3}</code>, we clear everything except the line ending from each matched line, and get:</p>
<pre><code>AAA A   [Header row 1]
BBB B   [Data row 1]



AAA A   [Header row 5]
BBB B   [Data row 2]
AAA A   [Header row 6]
BBB B   [Data row 3]
AAA A   [Header row 7]

BBB B   [Data row 5]
AAA A   [Header row 8]
</code></pre>
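<p dir="auto">The replacement above can be reproduced with a short Python sketch.  It is an approximation rather than what Notepad++ runs: Python’s <code>re</code> module has no <code>\R</code>, so the line ending is written as <code>\r\n|\r|\n</code>, and <code>[^\r\n]*</code> replaces <code>(?-s).*</code>.  The function names <code>clear_matched_lines</code> and <code>delete_matched_lines</code> are hypothetical.  The second one replaces each match with nothing, which removes the matched lines entirely instead of leaving them empty.</p>

```python
import re

# Assumption: Python's re approximates the Notepad++ (Boost) regex here.
# \R becomes (\r\n|\r|\n) and (?-s).* becomes [^\r\n]*.
PATTERN = re.compile(
    r"^((?:AAA|BBB)\x20)([^\r\n]*)(\r\n|\r|\n)(?=\1)", re.MULTILINE
)

SAMPLE = "\n".join([
    "AAA A   [Header row 1]",
    "BBB B   [Data row 1]",
    "AAA A   [Header row 2]",
    "AAA A   [Header row 3]",
    "AAA A   [Header row 4]",
    "AAA A   [Header row 5]",
    "BBB B   [Data row 2]",
    "AAA A   [Header row 6]",
    "BBB B   [Data row 3]",
    "AAA A   [Header row 7]",
    "BBB B   [Data row 4]",
    "BBB B   [Data row 5]",
    "AAA A   [Header row 8]",
]) + "\n"


def clear_matched_lines(text):
    """Keep only the line ending of each match (like replacing with ${3})."""
    return PATTERN.sub(r"\3", text)


def delete_matched_lines(text):
    """Replace each match with nothing, removing the line and its ending."""
    return PATTERN.sub("", text)


print(clear_matched_lines(SAMPLE))
print(delete_matched_lines(SAMPLE))
```

<p dir="auto">The first call leaves the same four empty lines as the result above; the second drops those lines so that nine lines remain.</p>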
<p dir="auto"><strong>I hope that helped!</strong></p>
]]></description><link>https://community.notepad-plus-plus.org/post/95766</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/95766</guid><dc:creator><![CDATA[Mark Olson]]></dc:creator><pubDate>Mon, 15 Jul 2024 16:01:09 GMT</pubDate></item></channel></rss>