<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[About single and duplicate lines...]]></title><description><![CDATA[<p dir="auto">Hello, <strong>All</strong>,</p>
<p dir="auto">Reading that <a href="https://community.notepad-plus-plus.org/topic/27467">post</a> made me realize that searching for <strong>single</strong> or <strong>duplicate</strong> lines is a very <strong>common</strong> task. Some time ago, for my <strong>personal</strong> workflow, I wrote a method to solve the <strong>main</strong> cases ! So, in this post, I’m going to show you how to keep, from an <strong>original</strong> file :</p>
<ul>
<li>
<p dir="auto">All <strong>single</strong> lines, <em>ONLY</em></p>
</li>
<li>
<p dir="auto">All <strong>duplicate</strong> lines, <em>ONLY</em></p>
</li>
<li>
<p dir="auto">All <strong>single</strong> lines and the <strong>first</strong> copy of all <strong>duplicate</strong> lines</p>
</li>
<li>
<p dir="auto">All <strong>single</strong> lines and the <strong>last</strong> copy of all <strong>duplicate</strong> lines</p>
</li>
<li>
<p dir="auto">The <strong>first</strong> copy of all <strong>duplicate</strong> lines, <em>ONLY</em></p>
</li>
<li>
<p dir="auto">The <strong>last</strong> copy of all <strong>duplicate</strong> lines, <em>ONLY</em></p>
</li>
</ul>
<p dir="auto">I’ll use a file, named <strong><code>Test_File.txt</code></strong>, that contains both <strong>single</strong> lines and <strong>duplicate</strong> lines that appear <strong><code>2, 3, 4</code></strong> or <strong><code>more</code></strong> times. It contains <strong><code>48</code></strong> color palettes, found on various sites and added one after another, giving a total of <strong><code>78,117</code></strong> records, of which <strong><code>39,532</code></strong> are <strong>single</strong> lines and <strong><code>38,585</code></strong> are <strong>duplicate</strong> lines. On the other hand, if we count only <strong><code>one</code></strong> copy of each <strong>duplicate</strong>, this file contains <strong><code>11,290</code></strong> different <strong>duplicate</strong> lines.</p>
<p dir="auto">To <strong>test</strong> my solutions, simply download this <strong>UTF-8</strong> file ( <strong><code>5,937,560</code></strong> bytes ) from my <strong><code>Google Drive</code></strong> account :</p>
<p dir="auto"><a href="https://drive.google.com/file/d/1aYOpKon4KYw_NXSdj4Tm4Ti_FrygC2ky/view?usp=sharing" rel="nofollow ugc">https://drive.google.com/file/d/1aYOpKon4KYw_NXSdj4Tm4Ti_FrygC2ky/view?usp=sharing</a></p>
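<p dir="auto">The record counts quoted above are tied together by simple arithmetic. As a quick sanity check, here is a minimal Python sketch ( the variable names are purely illustrative ) that verifies them and derives the number of records left when only <strong>one</strong> copy of each duplicate is kept :</p>
<pre><code class="language-python"># Record counts quoted in the post.
total_records = 78_117
single_lines = 39_532
duplicate_records = 38_585    # every copy of every duplicated line
distinct_duplicates = 11_290  # one copy per duplicated line

# Singles plus all the duplicate copies account for the whole file.
assert single_lines + duplicate_records == total_records

# Singles plus one copy of each duplicate.
kept_with_one_copy = single_lines + distinct_duplicates
print(kept_with_one_copy)  # 50822
</code></pre>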
<hr />
<p dir="auto"><strong>Remarks</strong> :</p>
<p dir="auto">Note the definition of <strong>single</strong> lines : these are lines that differ in <strong>characters</strong> and/or <strong>case</strong> from <strong>all</strong> the other lines of the <strong>current</strong> file. For example, in this small file of <strong><code>14</code></strong> lines, below :</p>
<pre><code class="language-diff">    ABC
    xyz
    123
    789
    HIJ
    HIJ
    123
    AbC
    123
    HIJ
    abc
    HIJ
    456
    xyz
</code></pre>
<ul>
<li>
<p dir="auto">The <strong>5</strong> lines <strong><code>ABC</code></strong>, <strong><code>AbC</code></strong>, <strong><code>abc</code></strong>, <strong><code>789</code></strong> and <strong><code>456</code></strong> are considered to be <strong>single</strong> lines, as each differs in <strong>characters</strong> and/or <strong>case</strong> from <strong>all</strong> the other lines.</p>
</li>
<li>
<p dir="auto">The <strong>3</strong> <strong><code>123</code></strong> lines are considered to be a <strong>duplicate</strong> line with <strong><code>3</code></strong> copies ( <strong>Multiple</strong> occurrences )</p>
</li>
<li>
<p dir="auto">The <strong>2</strong> <strong><code>xyz</code></strong> lines are considered to be a <strong>duplicate</strong> line with <strong><code>2</code></strong> copies ( <strong>Multiple</strong> occurrences )</p>
</li>
<li>
<p dir="auto">The <strong>4</strong> <strong><code>HIJ</code></strong> lines are considered to be a <strong>duplicate</strong> line with <strong><code>4</code></strong> copies ( <strong>Multiple</strong> occurrences )</p>
</li>
</ul>
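<p dir="auto">To cross-check this definition outside Notepad++, here is a minimal Python sketch that classifies the <strong><code>14</code></strong> lines above with an exact, <strong>case-sensitive</strong> comparison. Python and <code>collections.Counter</code> are only an illustration here, not part of the method itself :</p>
<pre><code class="language-python">from collections import Counter

lines = "ABC xyz 123 789 HIJ HIJ 123 AbC 123 HIJ abc HIJ 456 xyz".split()

# Count exact, case-sensitive occurrences of each line.
counts = Counter(lines)

# A single line occurs exactly once; anything else is a duplicate line.
singles = [line for line in lines if counts[line] == 1]
duplicates = {line: n for line, n in counts.items() if n != 1}

print(singles)     # ['ABC', '789', 'AbC', 'abc', '456']
print(duplicates)  # {'xyz': 2, '123': 3, 'HIJ': 4}
</code></pre>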
<hr />
<p dir="auto"><strong>IMPORTANT</strong> :</p>
<p dir="auto">I’ve done some of the work for you, by adding a final <strong>column</strong> that <strong>numbers</strong> all lines in this file. Thus, it will be easy to restore the <strong>original</strong> order of the <strong>remaining</strong> records, once each process is <strong>complete</strong>. So, in case you need this <strong>initial</strong> order :</p>
<ul>
<li>
<p dir="auto">Put the <strong>caret</strong> right <strong>before</strong> the <strong>present</strong> number, at the end of the <strong>first</strong> line</p>
</li>
<li>
<p dir="auto">Run the <strong><code>Edit &gt; Begin/End Select in Column Mode</code></strong> option ( or use the <strong><code>Alt + Shift + B</code></strong> shortcut )</p>
</li>
<li>
<p dir="auto">Move to the <strong>last</strong> line of the file</p>
</li>
<li>
<p dir="auto">Put the <strong>caret</strong> right <strong>before</strong> the <strong>present</strong> number, at the end of the <strong>last</strong> line</p>
</li>
<li>
<p dir="auto">Run again the <strong><code>Edit &gt; Begin/End Select in Column Mode</code></strong> option ( or use the <strong><code>Alt + Shift + B</code></strong> shortcut )</p>
</li>
</ul>
<p dir="auto">=&gt; A <em>ZERO-LINE</em> column mode selection should appear throughout <strong>all</strong> the lines</p>
<ul>
<li>Then, run the <strong><code>Edit &gt; Line Operations &gt; Sort Lines Lexicographically Ascending</code></strong> option</li>
</ul>
<p dir="auto">=&gt; The <strong>original</strong> order of the <strong>remaining</strong> records, <em>AFTER</em> completion of one of the <strong><code>6</code></strong> methods below, should be back !</p>
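<p dir="auto">The idea behind this step is simply a sort keyed on the numbering column. A minimal Python sketch, <em>assuming</em> that the numbering field is the last <strong><code>7</code></strong> characters of each record ( here, hypothetically, a space followed by a zero-padded <strong><code>6</code></strong>-digit number ) :</p>
<pre><code class="language-python"># Hypothetical records left after one of the methods, still in sorted-text order.
# Assumption: the last 7 characters of each record are a fixed-width line number.
remaining = ["456 000013", "789 000004", "ABC 000001", "AbC 000008", "abc 000011"]

# Fixed-width, zero-padded numbers sort lexicographically in numeric order,
# so sorting on that field alone restores the original line order.
restored = sorted(remaining, key=lambda record: record[-7:])
print(restored)
</code></pre>
<p dir="auto">With these sample records, the result is <code>ABC</code>, <code>789</code>, <code>AbC</code>, <code>abc</code>, <code>456</code>, i.e. the order of lines <strong><code>1, 4, 8, 11, 13</code></strong>.</p>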
<hr />
<p dir="auto">In each procedure, below, <strong><code>1</code></strong> or <strong><code>2</code></strong> <em>S/R</em> are used. To process them :</p>
<ul>
<li>
<p dir="auto">First, <strong>cancel</strong> any existing selection to ensure that any <strong>line-end</strong> character will be taken into account during the <strong>S/R</strong> phase</p>
</li>
<li>
<p dir="auto">Open the <strong>Replace</strong> dialog ( <strong><code>Ctrl + H</code></strong> )</p>
</li>
<li>
<p dir="auto"><strong>Uncheck</strong> all <strong>checkbox</strong> options</p>
</li>
<li>
<p dir="auto"><strong>Check</strong> the <strong><code>Wrap around</code></strong> option</p>
</li>
<li>
<p dir="auto">Select the <strong><code>Regular expression</code></strong> search mode</p>
</li>
<li>
<p dir="auto">Click on the <strong><code>Replace All</code></strong> button</p>
</li>
</ul>
<hr />
<h4>(1) To keep all the SINGLE lines ONLY ( <strong><code>39,532</code></strong> records ) :</h4>
<ul>
<li>
<p dir="auto">Paste the <strong><code>Test_File.txt</code></strong> contents into a <strong>new</strong> tab</p>
</li>
<li>
<p dir="auto">Switch to that <strong>new</strong> tab and select all text ( <strong><code>Ctrl + A</code></strong> )</p>
</li>
<li>
<p dir="auto">Run the <strong><code>Edit &gt; Line Operations &gt; Sort Lines Lexicographically Ascending</code></strong> option</p>
</li>
<li>
<p dir="auto">Click anywhere, in the <strong>new</strong> tab, to <strong>cancel</strong> the <strong>entire</strong> selection</p>
</li>
<li>
<p dir="auto">SEARCH <strong><code>(?x-is) ^ ( .+ ) .{7} \R (?: \1 .{7} \R )+</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE <strong><code>Leave EMPTY</code></strong></p>
</li>
<li>
<p dir="auto">Perform the <strong>IMPORTANT</strong> section, above</p>
</li>
</ul>
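<p dir="auto">For readers who want to see the logic of method <strong>(1)</strong> outside Notepad++, here is a rough Python sketch on the small <strong><code>14</code></strong>-line sample. It is a transposition, not the Notepad++ regex itself : Python’s <code>re</code> module has no <code>\R</code>, so <code>\n</code> stands in for it, and the numbering layout ( a space plus <strong><code>6</code></strong> digits, <strong><code>7</code></strong> characters in all ) is an <em>assumption</em> made for the example :</p>
<pre><code class="language-python">import re

words = "ABC xyz 123 789 HIJ HIJ 123 AbC 123 HIJ abc HIJ 456 xyz".split()
# Assumption: each record ends with a 7-character numbering field.
records = [f"{w} {i:06d}" for i, w in enumerate(words, start=1)]

# Step 1: sort, so that all copies of a same line become adjacent.
text = "".join(r + "\n" for r in sorted(records))

# Step 2: delete every run of two or more adjacent records sharing the same text.
run_of_copies = re.compile(r"^(.+).{7}\n(?:\1.{7}\n)+", re.MULTILINE)
singles = run_of_copies.sub("", text).splitlines()

# Step 3: restore the original order thanks to the trailing number.
singles.sort(key=lambda r: r[-7:])
print(singles)
</code></pre>
<p dir="auto">On this sample, only the <strong><code>5</code></strong> records <code>ABC 000001</code>, <code>789 000004</code>, <code>AbC 000008</code>, <code>abc 000011</code> and <code>456 000013</code> remain.</p>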
<hr />
<h4>(2) To keep all the DUPLICATE lines ONLY ( <strong><code>38,585 records = 78,117 - 39,532</code></strong> ) :</h4>
<ul>
<li>
<p dir="auto">Paste the <strong><code>Test_File.txt</code></strong> contents into a <strong>new</strong> tab</p>
</li>
<li>
<p dir="auto">Switch to that <strong>new</strong> tab and select all text ( <strong><code>Ctrl + A</code></strong> )</p>
</li>
<li>
<p dir="auto">Run the <strong><code>Edit &gt; Line Operations &gt; Sort Lines Lexicographically Ascending</code></strong> option</p>
</li>
<li>
<p dir="auto">Click anywhere, in the <strong>new</strong> tab, to <strong>cancel</strong> the <strong>entire</strong> selection</p>
</li>
<li>
<p dir="auto">SEARCH <strong><code>(?x-is) ^ ( .+ ) .{7} \R (?: \1 .{7} \R )+ (*SKIP) (*F) | ^ .+ \R</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE <strong><code>Leave EMPTY</code></strong></p>
</li>
<li>
<p dir="auto">Perform the <strong>IMPORTANT</strong> section, above</p>
</li>
</ul>
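<p dir="auto">The <code>(*SKIP) (*F)</code> part tells the regex engine to step over each run of duplicates, so that only the other lines are matched and deleted. Python’s <code>re</code> module does not support these verbs, so the sketch below <em>emulates</em> the same effect with a replacement function that puts the runs back unchanged. The sample data and the <strong><code>7</code></strong>-character numbering layout are assumptions for illustration :</p>
<pre><code class="language-python">import re

words = "ABC xyz 123 789 HIJ HIJ 123 AbC 123 HIJ abc HIJ 456 xyz".split()
# Assumption: each record ends with a 7-character numbering field.
records = [f"{w} {i:06d}" for i, w in enumerate(words, start=1)]
text = "".join(r + "\n" for r in sorted(records))

# First alternative : a run of 2+ adjacent records with the same text (kept).
# Second alternative : any other complete line (deleted).
run_or_line = re.compile(r"^(.+).{7}\n(?:\1.{7}\n)+|^.+\n", re.MULTILINE)

def keep_runs(match):
    # Group 1 is set only when the first alternative matched.
    return match.group(0) if match.group(1) is not None else ""

duplicates = run_or_line.sub(keep_runs, text).splitlines()
duplicates.sort(key=lambda r: r[-7:])  # back to the original order
print(len(duplicates), duplicates)
</code></pre>
<p dir="auto">On this sample, <strong><code>9</code></strong> records remain ( <strong><code>14 - 5</code></strong> ), the same relation as <strong><code>78,117 - 39,532</code></strong> for the real file.</p>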
<hr />
<h4>(3) To keep all the SINGLE lines and the FIRST copy of ALL the DUPLICATE lines, found AFTER the sort ( <strong><code>50,822</code></strong> records ) :</h4>
<ul>
<li>
<p dir="auto">Paste the <strong><code>Test_File.txt</code></strong> contents into a <strong>new</strong> tab</p>
</li>
<li>
<p dir="auto">Switch to that <strong>new</strong> tab and select all text ( <strong><code>Ctrl + A</code></strong> )</p>
</li>
<li>
<p dir="auto">Run the <strong><code>Edit &gt; Line Operations &gt; Sort Lines Lexicographically Ascending</code></strong> option</p>
</li>
<li>
<p dir="auto">Click anywhere, in the <strong>new</strong> tab, to <strong>cancel</strong> the <strong>entire</strong> selection</p>
</li>
<li>
<p dir="auto">SEARCH <strong><code>(?x-is) ^ ( ( .+ ) .{7} \R ) (?: \2 .{7} \R )+</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE <strong><code>\1</code></strong></p>
</li>
<li>
<p dir="auto">Perform the <strong>IMPORTANT</strong> section, above</p>
</li>
</ul>
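<p dir="auto">Here, group <code>\1</code> captures the whole <strong>first</strong> record of each run, and the replacement writes back that record only. A hedged Python transposition of this idea ( <code>\n</code> instead of <code>\R</code>, and an <em>assumed</em> numbering field made of a space plus <strong><code>6</code></strong> digits ) :</p>
<pre><code class="language-python">import re

words = "ABC xyz 123 789 HIJ HIJ 123 AbC 123 HIJ abc HIJ 456 xyz".split()
# Assumption: each record ends with a 7-character numbering field.
records = [f"{w} {i:06d}" for i, w in enumerate(words, start=1)]
text = "".join(r + "\n" for r in sorted(records))

# Group 1 = the complete first record of a run, group 2 = its text only.
first_of_run = re.compile(r"^((.+).{7}\n)(?:\2.{7}\n)+", re.MULTILINE)
kept = first_of_run.sub(r"\1", text).splitlines()

kept.sort(key=lambda r: r[-7:])  # back to the original order
print(kept)
</code></pre>
<p dir="auto">On the sample, <strong><code>8</code></strong> records remain : the <strong><code>5</code></strong> single lines, plus <code>xyz 000002</code>, <code>123 000003</code> and <code>HIJ 000005</code>.</p>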
<hr />
<h4>(4) To keep all the SINGLE lines and the LAST copy of all the DUPLICATE lines, found AFTER the sort ( <strong><code>50,822</code></strong> records ) :</h4>
<ul>
<li>
<p dir="auto">Paste the <strong><code>Test_File.txt</code></strong> contents into a <strong>new</strong> tab</p>
</li>
<li>
<p dir="auto">Switch to that <strong>new</strong> tab and select all text ( <strong><code>Ctrl + A</code></strong> )</p>
</li>
<li>
<p dir="auto">Run the <strong><code>Edit &gt; Line Operations &gt; Sort Lines Lexicographically Ascending</code></strong> option</p>
</li>
<li>
<p dir="auto">Click anywhere, in the <strong>new</strong> tab, to <strong>cancel</strong> the <strong>entire</strong> selection</p>
</li>
<li>
<p dir="auto">SEARCH <strong><code>(?x-is) ^ ( .+ ) .{7} \R (?: \1 .{7} \R )* ( \1 .{7} \R )</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE <strong><code>\2</code></strong></p>
</li>
<li>
<p dir="auto">Perform the <strong>IMPORTANT</strong> section, above</p>
</li>
</ul>
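<p dir="auto">This time, the greedy <code>*</code> quantifier swallows as many copies as possible, then gives back the final one, which is captured as group <code>\2</code>. A small Python sketch of that behaviour, under the same <em>assumptions</em> as for the sample data ( <code>\n</code> line-breaks and a <strong><code>7</code></strong>-character numbering field ) :</p>
<pre><code class="language-python">import re

words = "ABC xyz 123 789 HIJ HIJ 123 AbC 123 HIJ abc HIJ 456 xyz".split()
# Assumption: each record ends with a 7-character numbering field.
records = [f"{w} {i:06d}" for i, w in enumerate(words, start=1)]
text = "".join(r + "\n" for r in sorted(records))

# Group 1 = text of the run, group 2 = the complete last record of the run.
last_of_run = re.compile(r"^(.+).{7}\n(?:\1.{7}\n)*(\1.{7}\n)", re.MULTILINE)
kept = last_of_run.sub(r"\2", text).splitlines()

kept.sort(key=lambda r: r[-7:])  # back to the original order
print(kept)
</code></pre>
<p dir="auto">On the sample, the <strong><code>5</code></strong> single lines remain, together with <code>123 000009</code>, <code>HIJ 000012</code> and <code>xyz 000014</code>.</p>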
<hr />
<h4>(5) To keep the FIRST copy of all the DUPLICATE lines ONLY, found AFTER the sort ( <strong><code>11,290 = 50,822 - 39,532</code></strong> ) :</h4>
<ul>
<li>
<p dir="auto">Paste the <strong><code>Test_File.txt</code></strong> contents into a <strong>new</strong> tab</p>
</li>
<li>
<p dir="auto">Switch to that <strong>new</strong> tab and select all text ( <strong><code>Ctrl + A</code></strong> )</p>
</li>
<li>
<p dir="auto">Run the <strong><code>Edit &gt; Line Operations &gt; Sort Lines Lexicographically Ascending</code></strong> option</p>
</li>
<li>
<p dir="auto">Click anywhere, in the <strong>new</strong> tab, to <strong>cancel</strong> the <strong>entire</strong> selection</p>
</li>
<li>
<p dir="auto">SEARCH <strong><code>(?x-is) ^ ( .+ ) .{7} \R (?: \1 .{7} \R )+ (*SKIP) (*F) | ^ .+ \R</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE <strong><code>Leave EMPTY</code></strong></p>
</li>
</ul>
<p dir="auto">Then :</p>
<ul>
<li>
<p dir="auto">SEARCH <strong><code>(?x-is) ^ ( ( .+ ) .{7} \R ) (?: \2 .{7} \R )+</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE <strong><code>\1</code></strong></p>
</li>
<li>
<p dir="auto">Perform the <strong>IMPORTANT</strong> section, above</p>
</li>
</ul>
<hr />
<h4>(6) To keep the LAST copy of all the DUPLICATE lines ONLY, found AFTER the sort ( <strong><code>11,290 = 50,822 - 39,532</code></strong> ) :</h4>
<ul>
<li>
<p dir="auto">Paste the <strong><code>Test_File.txt</code></strong> contents into a <strong>new</strong> tab</p>
</li>
<li>
<p dir="auto">Switch to that <strong>new</strong> tab and select all text ( <strong><code>Ctrl + A</code></strong> )</p>
</li>
<li>
<p dir="auto">Run the <strong><code>Edit &gt; Line Operations &gt; Sort Lines Lexicographically Ascending</code></strong> option</p>
</li>
<li>
<p dir="auto">Click anywhere, in the <strong>new</strong> tab, to <strong>cancel</strong> the <strong>entire</strong> selection</p>
</li>
<li>
<p dir="auto">SEARCH <strong><code>(?x-is) ^ ( .+ ) .{7} \R (?: \1 .{7} \R )+ (*SKIP) (*F) | ^ .+ \R</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE <strong><code>Leave EMPTY</code></strong></p>
</li>
</ul>
<p dir="auto">Then :</p>
<ul>
<li>
<p dir="auto">SEARCH <strong><code>(?x-is) ^ ( .+ ) .{7} \R (?: \1 .{7} \R )* ( \1 .{7} \R )</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE <strong><code>\2</code></strong></p>
</li>
<li>
<p dir="auto">Perform the <strong>IMPORTANT</strong> section, above</p>
</li>
</ul>
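<p dir="auto">Methods <strong>(5)</strong> and <strong>(6)</strong> chain two passes : first drop the single lines, then reduce each remaining run to one copy. The sketch below shows this chaining in Python. The function name <code>one_copy_of_duplicates</code> and its <code>keep</code> parameter are hypothetical, the <code>(*SKIP) (*F)</code> verbs are emulated with a replacement function, and the <strong><code>7</code></strong>-character numbering layout is an assumption :</p>
<pre><code class="language-python">import re

RUN_OR_LINE = re.compile(r"^(.+).{7}\n(?:\1.{7}\n)+|^.+\n", re.MULTILINE)
FIRST_OF_RUN = re.compile(r"^((.+).{7}\n)(?:\2.{7}\n)+", re.MULTILINE)
LAST_OF_RUN = re.compile(r"^(.+).{7}\n(?:\1.{7}\n)*(\1.{7}\n)", re.MULTILINE)

def one_copy_of_duplicates(records, keep="first"):
    """Keep one copy (first or last) of each duplicated line, singles dropped."""
    text = "".join(r + "\n" for r in sorted(records))
    # Pass 1 : delete the single lines, leave the runs of duplicates untouched.
    text = RUN_OR_LINE.sub(
        lambda m: m.group(0) if m.group(1) is not None else "", text
    )
    # Pass 2 : reduce each run to its first or to its last record.
    if keep == "first":
        text = FIRST_OF_RUN.sub(r"\1", text)
    else:
        text = LAST_OF_RUN.sub(r"\2", text)
    # Restore the original order from the trailing numbering field.
    return sorted(text.splitlines(), key=lambda r: r[-7:])

words = "ABC xyz 123 789 HIJ HIJ 123 AbC 123 HIJ abc HIJ 456 xyz".split()
# Assumption: each record ends with a 7-character numbering field.
records = [f"{w} {i:06d}" for i, w in enumerate(words, start=1)]

print(one_copy_of_duplicates(records, keep="first"))
print(one_copy_of_duplicates(records, keep="last"))
</code></pre>
<p dir="auto">On the sample, the first call gives <code>xyz 000002</code>, <code>123 000003</code>, <code>HIJ 000005</code> and the second one gives <code>123 000009</code>, <code>HIJ 000012</code>, <code>xyz 000014</code> : <strong><code>3</code></strong> records in both cases, one per duplicated line.</p>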
<hr />
<p dir="auto">At the <strong>very end</strong> of <strong>any</strong> of these choices, you may <strong>delete</strong> the extra <strong>numbering</strong> :</p>
<ul>
<li>
<p dir="auto">SEARCH <strong><code>(?x-s) .{7} $</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE <strong><code>Leave EMPTY</code></strong></p>
</li>
<li>
<p dir="auto">Then run the <strong><code>Edit &gt; Blank Operations &gt; Trim Trailing Space</code></strong></p>
</li>
</ul>
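<p dir="auto">The same clean-up can be sketched in a couple of Python lines : remove the last <strong><code>7</code></strong> characters of each record, then trim whatever trailing blanks are left. The sample records, including the extra padding spaces of the second one, are hypothetical :</p>
<pre><code class="language-python">import re

# Hypothetical records; the second one has padding before its numbering field.
records = ["ABC 000001", "789   000004", "456 000013"]

# Drop the 7-character numbering field, then trim the trailing spaces.
cleaned = [re.sub(r".{7}$", "", record).rstrip() for record in records]
print(cleaned)  # ['ABC', '789', '456']
</code></pre>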
<hr />
<p dir="auto">Best Regards,</p>
<p dir="auto">guy038</p>
<p dir="auto"><strong>P.S.</strong> :</p>
<p dir="auto">Note that there is also a <strong>native</strong> way to get all the <strong>single</strong> lines and the <strong>first</strong> copy of all the <strong>duplicate</strong> lines, in their <strong>present</strong> order ( <strong><code>50,822</code></strong> records ) :</p>
<ul>
<li>
<p dir="auto">Paste the <strong><code>Test_File.txt</code></strong> contents into a <strong>new</strong> tab</p>
</li>
<li>
<p dir="auto">Switch to that <strong>new</strong> tab</p>
</li>
<li>
<p dir="auto"><strong>Delete</strong> the numbering, at the <strong>end</strong> of each line :</p>
<ul>
<li>
<p dir="auto">SEARCH <strong><code>(?x-s) .{7} $</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE <strong><code>Leave EMPTY</code></strong></p>
</li>
</ul>
</li>
<li>
<p dir="auto">Then, use the <strong><code>Edit &gt; Line Operations &gt; Remove Duplicate Lines</code></strong> option</p>
</li>
</ul>
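<p dir="auto">For comparison, an order-preserving removal of duplicates, keeping the <strong>first</strong> copy of each line, can be sketched in Python with <code>dict.fromkeys</code>. This only illustrates the expected result on the <strong><code>14</code></strong>-line sample; it says nothing about how Notepad++ implements its own command :</p>
<pre><code class="language-python">lines = "ABC xyz 123 789 HIJ HIJ 123 AbC 123 HIJ abc HIJ 456 xyz".split()

# dict keys are unique and keep their insertion order, so only the first
# copy of each line survives, in the present order of the file.
deduplicated = list(dict.fromkeys(lines))
print(deduplicated)  # ['ABC', 'xyz', '123', '789', 'HIJ', 'AbC', 'abc', '456']
</code></pre>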
]]></description><link>https://community.notepad-plus-plus.org/topic/27470/about-single-and-duplicate-lines</link><generator>RSS for Node</generator><lastBuildDate>Tue, 21 Apr 2026 20:42:43 GMT</lastBuildDate><atom:link href="https://community.notepad-plus-plus.org/topic/27470.rss" rel="self" type="application/rss+xml"/><pubDate>Tue, 24 Mar 2026 15:45:19 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to About single and duplicate lines... on Tue, 07 Apr 2026 03:57:18 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/195">@guy038</a> said in <a href="/post/105031">About single and duplicate lines...</a>:</p>
<p dir="auto">That’s a pretty solid breakdown 👍</p>
<p dir="auto">For most cases though, I’d honestly just go with the built-in “Remove Duplicate Lines” unless you specifically need first/last occurrences. Way simpler and less error-prone.</p>
<p dir="auto">The regex approach is powerful, but yeah… a bit overkill unless you’re dealing with very specific cases or large datasets.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/105187</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/105187</guid><dc:creator><![CDATA[Evelyn Walker]]></dc:creator><pubDate>Tue, 07 Apr 2026 03:57:18 GMT</pubDate></item></channel></rss>