<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Delete line with duplicate Number]]></title><description><![CDATA[<p dir="auto"><strong>Sorry for the repost, going to try and simplify my question.</strong></p>
<p dir="auto">I’ve spent the first half of my day trying to figure out how to do this and googling for an exact answer, without success.</p>
<p dir="auto">…Random words, letters, and numbers are on each line…</p>
<p dir="auto"><strong>Objective:</strong> find lines that have exact duplicate numbers (not letters or words).</p>
<p dir="auto">Before example:</p>
<p dir="auto">A dog went to the mall - #11364<br />
The dog went to the store - #11364<br />
A dog is at the mall - #14369<br />
Dog to the store random - #14369<br />
Sentence a random - #13677<br />
The went dog to store - #11159</p>
<p dir="auto">After example:</p>
<p dir="auto">A random sentence - #11364<br />
A sentence random - #14369<br />
Sentence a random - #13677<br />
The went dog to store - #11159</p>
<ul>
<li>The formula needs to at least:  match lines that have identical numbers.</li>
<li>The formula does NOT need to: delete one of the lines</li>
</ul>
<p dir="auto">I’m fine with manually deleting the lines that have an identical number match.</p>
<p dir="auto">Any help is appreciated, thank you</p>
]]></description><link>https://community.notepad-plus-plus.org/topic/19641/delete-line-with-duplicate-number</link><generator>RSS for Node</generator><lastBuildDate>Sat, 06 Jun 2026 01:37:59 GMT</lastBuildDate><atom:link href="https://community.notepad-plus-plus.org/topic/19641.rss" rel="self" type="application/rss+xml"/><pubDate>Thu, 02 Jul 2020 18:23:09 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Delete line with duplicate Number on Sat, 04 Jul 2020 23:10:58 GMT]]></title><description><![CDATA[<p dir="auto">Hello <a class="plugin-mentions-user plugin-mentions-a" href="/user/jim-erlich" aria-label="Profile: jim-erlich">@<bdi>jim-erlich</bdi></a> and <strong>All</strong>,</p>
<p dir="auto">Sorry for being <strong>late</strong> ! So, here are, below, some explanations about my <strong>regex</strong> S/R :</p>
<p dir="auto">SEARCH  <strong><code>(?-s)^.+#\x20?(\d+)\R(?=.+#\x20?\1)</code></strong></p>
<p dir="auto">REPLACE <strong><code>Leave EMPTY</code></strong></p>
<ul>
<li>
<p dir="auto">First, the <strong><code>(?-s)</code></strong> <strong>in-line</strong> modifier ensures that any further <strong><code>.</code></strong> regex symbol corresponds to a <strong>single standard</strong> character only, and <strong>not</strong> to a <strong>line-break</strong> char !</p>
</li>
<li>
<p dir="auto">So, the <strong>next</strong> part <strong><code>^.+#\x20?</code></strong> searches, from <strong>beginning</strong> of line ( <strong><code>^</code></strong> ), any <strong>non-null</strong> range of characters ( <strong><code>.+</code></strong> ), followed by the <strong><code>#</code></strong> symbol and an <strong>optional</strong> <strong><code>space</code></strong> char (<strong><code>\x20?</code></strong>)</p>
</li>
<li>
<p dir="auto">Then, it looks for a <strong>non-null</strong> range of <strong>digits</strong> ( <strong><code>\d+</code></strong> ), followed by <strong>line-break</strong> character(s)</p>
</li>
<li>
<p dir="auto">So, the regex engine looks for an <strong>entire</strong> line ( digits after the <strong><code># </code></strong> are stored as <strong>group <code>1</code></strong> as embedded in <strong>parentheses</strong> ) but <em>ONLY IF</em> the <strong>next</strong> line <strong>ends</strong> with the <strong>same</strong> number !</p>
</li>
<li>
<p dir="auto">This <strong>condition</strong> can be expressed with a <strong>look-ahead</strong> structure <strong><code>(?=......)</code></strong>, which is a <strong>user-defined</strong> assertion, in the <strong>same</strong> way that, for instance, the <strong><code>$</code></strong> symbol is a <strong>system</strong> assertion, standing for the <strong>zero</strong>-length assertion <strong>“end of line”</strong> !</p>
</li>
<li>
<p dir="auto">So the <strong>current</strong> line <strong>must</strong> be followed by a match of the regex <strong><code>.+#\x20?\1</code></strong>, which represents, again, a <strong>non-null</strong> range of <strong>standard</strong> characters followed by a <strong><code>#</code></strong> and possibly a <strong><code>space</code></strong> char and finally the <strong>group <code>1</code></strong> ( <strong><code>\1</code></strong> ) which is the <strong>ending</strong> number of the <strong>current</strong> line</p>
</li>
<li>
<p dir="auto">Note that a <strong><code>^</code></strong> assertion for the <strong>second</strong> line, in the <strong>look-ahead</strong> structure, would be <strong>useless</strong> as the range <strong><code>.+</code></strong> comes right after the <strong>line-break</strong> char(s) <strong><code>\R</code></strong>, anyway !</p>
</li>
<li>
<p dir="auto">As the <strong>replacement</strong> zone is <strong><code>empty</code></strong>, the <strong>current</strong> line, with its <strong>line-break</strong>, is just <strong>deleted</strong></p>
</li>
</ul>
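<p dir="auto">To see the <strong>same</strong> logic outside of N++, here is a rough Python sketch of this S/R. It rests on some <strong>assumptions</strong> : Python's <code>re</code> module has no <code>\R</code>, so <code>\r?\n</code> stands in for it, the <code>(?-s)</code> modifier is dropped because <code>.</code> already stops at <code>\n</code> by default, and the names <code>PATTERN</code> and <code>drop_first_duplicate</code> are hypothetical :</p>

```python
import re

# Python port of the N++ search regex.
# Assumptions: \R is approximated by \r?\n, and re.MULTILINE lets ^ match
# at the start of every line, as it does in the N++ Replace dialog.
PATTERN = re.compile(r"^.+#\x20?(\d+)\r?\n(?=.+#\x20?\1)", re.MULTILINE)


def drop_first_duplicate(text: str) -> str:
    """Delete each line whose ending #number shows up again on the next line."""
    return PATTERN.sub("", text)


sample = (
    "The dog went to the park - #4599\n"
    "You went to the zoo - # 4640\n"
    "He went to the park - # 4640\n"
    "The cat went to the park - #4657\n"
)
print(drop_first_duplicate(sample))
```

<p dir="auto">With this sample, only the <strong>first</strong> of the two <strong><code># 4640</code></strong> lines should be removed, as the empty replacement deletes the matched line together with its line-break.</p>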
<hr />
<p dir="auto">For a <strong>quick</strong> overview of <strong>regular</strong> expressions, see the N++ <strong>documentation</strong>, below :</p>
<p dir="auto"><a href="https://npp-user-manual.org/docs/searching/#regular-expressions" rel="nofollow ugc">https://npp-user-manual.org/docs/searching/#regular-expressions</a></p>
<p dir="auto">See also the <strong>main</strong> links regarding the <strong><code>Boost regex</code></strong> library, used by the <strong>N++</strong> regex engine :</p>
<p dir="auto"><a href="https://www.boost.org/doc/libs/1_70_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html" rel="nofollow ugc">https://www.boost.org/doc/libs/1_70_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html</a></p>
<p dir="auto"><a href="https://www.boost.org/doc/libs/1_70_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html" rel="nofollow ugc">https://www.boost.org/doc/libs/1_70_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html</a></p>
<p dir="auto">Finally, see this <strong>FAQ</strong> topic about <strong>regular</strong> expressions :</p>
<p dir="auto"><a href="https://community.notepad-plus-plus.org/topic/15765/faq-desk-where-to-find-regex-documentation">https://community.notepad-plus-plus.org/topic/15765/faq-desk-where-to-find-regex-documentation</a></p>
<p dir="auto">Best Regards,</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/55554</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/55554</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sat, 04 Jul 2020 23:10:58 GMT</pubDate></item><item><title><![CDATA[Reply to Delete line with duplicate Number on Thu, 02 Jul 2020 22:36:50 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/guy038" aria-label="Profile: guy038">@<bdi>guy038</bdi></a> said in <a href="/post/55520">Delete line with duplicate Number</a>:</p>
<blockquote>
<p dir="auto">Hi, <a class="plugin-mentions-user plugin-mentions-a" href="/user/jim-erlich" aria-label="Profile: jim-erlich">@<bdi>jim-erlich</bdi></a> and <strong>All</strong>,</p>
<p dir="auto">Many <strong>thanks</strong> for <strong>all</strong> your information ! It should be <strong>very easy</strong> to get the <strong>right</strong> solution !</p>
<p dir="auto">The <strong>most important</strong> points are :</p>
<ul>
<li>
<p dir="auto">Your file is already <strong>sorted</strong> by ascending <strong><code>#number</code></strong></p>
</li>
<li>
<p dir="auto">And, in case of <strong><code>1</code></strong> <strong>duplicate</strong> line, it is located <strong>right after</strong> the <strong>original</strong> line !</p>
</li>
</ul>
<hr />
<p dir="auto">Now, I don’t know how your <strong>preliminary sort</strong> behaved with <strong>numbers</strong>, separated with a <strong><code>space</code></strong> char from the <strong><code>#</code></strong> symbol ?</p>
<p dir="auto">For instance :</p>
<p dir="auto">Are lines sorted, as below ( case <strong><code>A</code></strong> ) :</p>
<pre><code class="language-z">The dog went to the park - #4599
The cat went to the park - # 4657
The kid went to the park - #4797
The lizard went to the zoo - # 5100
The cat went to the zoo - #5120
</code></pre>
<p dir="auto">OR</p>
<p dir="auto">Are lines sorted, like ( case <strong><code>B</code></strong> ) :</p>
<pre><code class="language-z">The cat went to the park - # 4657
The lizard went to the zoo - # 5100
The dog went to the park - #4599
The kid went to the park - #4797
The cat went to the zoo - #5120
</code></pre>
<p dir="auto">Indeed, the N++ sort would place the line <strong><code># 10000 </code></strong> <strong>before</strong> the line <strong><code>#5000</code></strong>, as the code-point of the <strong><code>space</code></strong> char is <strong>smaller</strong> than the code-point of any <strong><code>digit</code></strong> !</p>
<hr />
<p dir="auto">Anyway, assuming a <strong>sort</strong>, like in case <strong><code>A</code></strong> and the <strong>initial</strong> text :</p>
<pre><code class="language-diff">The dog went to the park - #4599
You went to the zoo - # 4640
He went to the park - # 4640
The cat went to the park - #4657
The kid went to the park - # 4657
The girl went to the park - #4900
The lizard went to the zoo - # 5100
The cat went to the zoo - #5100
I went to the park - #7500
We went to the zoo - #7500
They went to the park - #14000
</code></pre>
<p dir="auto">Here is the <strong>road</strong> map :</p>
<ul>
<li>
<p dir="auto">Open your file in <strong>N++</strong></p>
</li>
<li>
<p dir="auto">Open the <strong>Replace</strong> dialog ( <strong><code>Ctrl + H</code></strong> )</p>
</li>
<li>
<p dir="auto">SEARCH <strong><code>(?-s)^.+#\x20?(\d+)\R(?=.+#\x20?\1)</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE <strong><code>Leave EMPTY</code></strong></p>
</li>
<li>
<p dir="auto">Tick the <strong><code>Wrap around</code></strong> option</p>
</li>
<li>
<p dir="auto">Select the <strong><code>Regular expression</code></strong> search mode</p>
</li>
<li>
<p dir="auto">Click <strong>once</strong> on the <strong><code>Replace All</code></strong> button or <strong>several</strong> times on the <strong><code>Replace</code></strong> button</p>
</li>
</ul>
<p dir="auto">Voila !</p>
<p dir="auto">You should get your <strong>expected</strong> list :</p>
<pre><code class="language-diff">The dog went to the park - #4599
He went to the park - # 4640
The kid went to the park - # 4657
The girl went to the park - #4900
The cat went to the zoo - #5100
We went to the zoo - #7500
They went to the park - #14000
</code></pre>
<hr />
<p dir="auto">Note that if your <strong>sort</strong> is rather like in case <strong><code>B</code></strong>, it shouldn’t be difficult to get, again, the case <strong><code>A</code></strong> <strong>order</strong> and run the <strong>regex</strong> S/R, afterwards ;-))</p>
<p dir="auto">Next time, if everything was <strong>OK</strong>, I’ll <strong>explain</strong> how this <strong>regex</strong> S/R works !</p>
<p dir="auto">Best Regards,</p>
<p dir="auto">guy038</p>
</blockquote>
<p dir="auto">Flawless… absolutely amazing. You don’t understand how thankful I am for your detailed and correct response. This is a whole new language for me and I am amazed. Thank you again.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/55522</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/55522</guid><dc:creator><![CDATA[Jim Erlich]]></dc:creator><pubDate>Thu, 02 Jul 2020 22:36:50 GMT</pubDate></item><item><title><![CDATA[Reply to Delete line with duplicate Number on Thu, 02 Jul 2020 21:04:37 GMT]]></title><description><![CDATA[<p dir="auto">Hi, <a class="plugin-mentions-user plugin-mentions-a" href="/user/jim-erlich" aria-label="Profile: jim-erlich">@<bdi>jim-erlich</bdi></a> and <strong>All</strong>,</p>
<p dir="auto">Many <strong>thanks</strong> for <strong>all</strong> your information ! It should be <strong>very easy</strong> to get the <strong>right</strong> solution !</p>
<p dir="auto">The <strong>most important</strong> points are :</p>
<ul>
<li>
<p dir="auto">Your file is already <strong>sorted</strong> by ascending <strong><code>#number</code></strong></p>
</li>
<li>
<p dir="auto">And, in case of <strong><code>1</code></strong> <strong>duplicate</strong> line, it is located <strong>right after</strong> the <strong>original</strong> line !</p>
</li>
</ul>
<hr />
<p dir="auto">Now, I don’t know how your <strong>preliminary sort</strong> behaved with <strong>numbers</strong>, separated with a <strong><code>space</code></strong> char from the <strong><code>#</code></strong> symbol ?</p>
<p dir="auto">For instance :</p>
<p dir="auto">Are lines sorted, as below ( case <strong><code>A</code></strong> ) :</p>
<pre><code class="language-z">The dog went to the park - #4599
The cat went to the park - # 4657
The kid went to the park - #4797
The lizard went to the zoo - # 5100
The cat went to the zoo - #5120
</code></pre>
<p dir="auto">OR</p>
<p dir="auto">Are lines sorted, like ( case <strong><code>B</code></strong> ) :</p>
<pre><code class="language-z">The cat went to the park - # 4657
The lizard went to the zoo - # 5100
The dog went to the park - #4599
The kid went to the park - #4797
The cat went to the zoo - #5120
</code></pre>
<p dir="auto">Indeed, the N++ sort would place the line <strong><code># 10000 </code></strong> <strong>before</strong> the line <strong><code>#5000</code></strong>, as the code-point of the <strong><code>space</code></strong> char is <strong>smaller</strong> than the code-point of any <strong><code>digit</code></strong> !</p>
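<p dir="auto">The code-point argument can be checked with a tiny Python sketch. It is only an <strong>illustration</strong> : it compares the part of each line from the <code>#</code> symbol onward as plain text, which is an assumption about how the lines were sorted, and the names <code>tails</code> and <code>by_text</code> are hypothetical :</p>

```python
# Endings of the five example lines, from the '#' symbol onward.
tails = ["#4599", "# 4657", "#4797", "# 5100", "#5120"]

# Plain text comparison goes character by character: after the '#',
# a space (code-point 32) sorts before any digit (code-points 48 to 57).
by_text = sorted(tails)

print(ord(" "), ord("0"), ord("9"))
print(by_text)
```

<p dir="auto">The two endings with a <strong><code>space</code></strong> char come out first, which is the case <strong><code>B</code></strong> order.</p>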
<hr />
<p dir="auto">Anyway, assuming a <strong>sort</strong>, like in case <strong><code>A</code></strong> and the <strong>initial</strong> text :</p>
<pre><code class="language-diff">The dog went to the park - #4599
You went to the zoo - # 4640
He went to the park - # 4640
The cat went to the park - #4657
The kid went to the park - # 4657
The girl went to the park - #4900
The lizard went to the zoo - # 5100
The cat went to the zoo - #5100
I went to the park - #7500
We went to the zoo - #7500
They went to the park - #14000
</code></pre>
<p dir="auto">Here is the <strong>road</strong> map :</p>
<ul>
<li>
<p dir="auto">Open your file in <strong>N++</strong></p>
</li>
<li>
<p dir="auto">Open the <strong>Replace</strong> dialog ( <strong><code>Ctrl + H</code></strong> )</p>
</li>
<li>
<p dir="auto">SEARCH <strong><code>(?-s)^.+#\x20?(\d+)\R(?=.+#\x20?\1)</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE <strong><code>Leave EMPTY</code></strong></p>
</li>
<li>
<p dir="auto">Tick the <strong><code>Wrap around</code></strong> option</p>
</li>
<li>
<p dir="auto">Select the <strong><code>Regular expression</code></strong> search mode</p>
</li>
<li>
<p dir="auto">Click <strong>once</strong> on the <strong><code>Replace All</code></strong> button or <strong>several</strong> times on the <strong><code>Replace</code></strong> button</p>
</li>
</ul>
<p dir="auto">Voila !</p>
<p dir="auto">You should get your <strong>expected</strong> list :</p>
<pre><code class="language-diff">The dog went to the park - #4599
He went to the park - # 4640
The kid went to the park - # 4657
The girl went to the park - #4900
The cat went to the zoo - #5100
We went to the zoo - #7500
They went to the park - #14000
</code></pre>
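<p dir="auto">If you would rather just <strong>locate</strong> the pairs and delete one line by hand, here is a small Python sketch, outside of N++, which only <strong>reports</strong> neighbouring lines sharing the same number. The names <code>NUMBER</code> and <code>adjacent_duplicates</code> are hypothetical, and it assumes that the number <strong>ends</strong> the line :</p>

```python
import re

# Ending "#123" or "# 123" (one optional space after the '#').
NUMBER = re.compile(r"#\x20?(\d+)\s*$")


def adjacent_duplicates(lines):
    """Yield (line_no, next_line_no, number) for neighbouring lines sharing a number."""
    previous = None
    for line_no, line in enumerate(lines, start=1):
        match = NUMBER.search(line)
        current = match.group(1) if match else None
        if current is not None and current == previous:
            yield line_no - 1, line_no, current
        previous = current


sample = [
    "The dog went to the park - #4599",
    "The dog went to the park - #12554",
    "The cat went to the park - #12554",
]
for first, second, number in adjacent_duplicates(sample):
    print(f"Lines {first} and {second} both end with #{number}")
```

<p dir="auto">Nothing is deleted here : the sketch just lists the line numbers, so you can remove the <strong>first</strong> line of each pair yourself.</p>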
<hr />
<p dir="auto">Note that if your <strong>sort</strong> is rather like in case <strong><code>B</code></strong>, it shouldn’t be difficult to get, again, the case <strong><code>A</code></strong> <strong>order</strong> and run the <strong>regex</strong> S/R, afterwards ;-))</p>
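<p dir="auto">One possible way to get back the case <strong><code>A</code></strong> order, sketched in Python rather than in N++, is to sort on the <strong>numeric</strong> value after the <code>#</code> symbol. The helper <code>number_key</code> is hypothetical, and the sketch assumes that each line <strong>ends</strong> with its number :</p>

```python
import re


def number_key(line: str) -> int:
    """Hypothetical helper: integer value of the ending #number.

    Lines without such a number get -1, so they are placed first.
    """
    match = re.search(r"#\x20?(\d+)\s*$", line)
    return int(match.group(1)) if match else -1


case_b = [
    "The cat went to the park - # 4657",
    "The lizard went to the zoo - # 5100",
    "The dog went to the park - #4599",
    "The kid went to the park - #4797",
    "The cat went to the zoo - #5120",
]

# sorted() is stable, so lines with equal numbers keep their relative order.
case_a = sorted(case_b, key=number_key)
print("\n".join(case_a))
```

<p dir="auto">As the sort is <strong>stable</strong>, two lines with the <strong>same</strong> number stay next to each other in their original order, which is what the regex S/R relies on.</p>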
<p dir="auto">Next time, if everything was <strong>OK</strong>, I’ll <strong>explain</strong> how this <strong>regex</strong> S/R works !</p>
<p dir="auto">Best Regards,</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/55520</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/55520</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Thu, 02 Jul 2020 21:04:37 GMT</pubDate></item><item><title><![CDATA[Reply to Delete line with duplicate Number on Thu, 02 Jul 2020 19:26:51 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/guy038" aria-label="Profile: guy038">@<bdi>guy038</bdi></a> said in <a href="/post/55518">Delete line with duplicate Number</a>:</p>
<blockquote>
<p dir="auto">Hello, <a class="plugin-mentions-user plugin-mentions-a" href="/user/jim-erlich" aria-label="Profile: jim-erlich">@<bdi>jim-erlich</bdi></a>,</p>
<p dir="auto"><strong>Before</strong> finding a way to <strong>solve</strong> your problem, we need <strong>additional</strong> information :</p>
<ul>
<li>
<p dir="auto">In your <strong>previous</strong> post, which is deleted by now, you spoke about a <strong>giant</strong> file : what is its <strong>approximate size</strong> and how <strong>many lines</strong> does this file contain ?</p>
</li>
<li>
<p dir="auto">Do you <strong>mind</strong> if your file is modified by a <strong>preliminary</strong> sort ? If you don’t mind, this could <strong>significantly</strong> simplify the <strong>whole</strong> process !</p>
</li>
<li>
<p dir="auto">How many <strong>digits</strong> can exist after the <strong><code>#</code></strong> symbol ? Is <strong><code>5</code></strong> the maximum or <strong><code>8</code></strong> or <strong><code>10</code></strong> digits or … ?</p>
</li>
<li>
<p dir="auto">What is the <strong>maximum</strong> number of lines between two <strong>“duplicate”</strong> lines ( for instance <strong><code>xxxxxxxx#12345</code></strong> and <strong><code>yyyyyyyy#12345</code></strong> ) ?</p>
</li>
<li>
<p dir="auto">Are there more than <strong><code>2</code></strong> <strong>“duplicate”</strong> lines ( I mean, for instance, <strong><code>6</code></strong> lines <strong>ending</strong> with <strong><code>#12345</code></strong> ) ?</p>
</li>
<li>
<p dir="auto">In case of multiple <strong>“duplicate”</strong> lines, which one do you want to keep : the <strong>first</strong> duplicate line or the <strong>last</strong> duplicate ?</p>
</li>
</ul>
<p dir="auto">Note that we can cope with a <strong>possible</strong> <strong><code>space</code></strong> character between the <strong><code>#</code></strong> symbol and the <strong>number</strong>. Not a problem !</p>
<p dir="auto">See you later !</p>
<p dir="auto">Best regards,</p>
<p dir="auto">guy038</p>
</blockquote>
<ul>
<li>12,000 lines broken up into different pages. Each page can have 300 to 1,100 lines.</li>
<li>Right now the lines are in Ascending order based upon the number.</li>
</ul>
<p dir="auto">For example:<br />
The dog went to the park - #4599<br />
The cat went to the park - #4657<br />
The kid went to the park - #4797<br />
The lizard went to the zoo - #5100<br />
The cat went to the zoo - #5120<br />
etc…</p>
<p dir="auto">Ideally, I would like to keep the numbers in Ascending order like this, and locate the lines that have duplicate numbers</p>
<ul>
<li>The number of digits after the # symbol is 1 to 5… the highest number being about 14000</li>
<li>Oh this is good… the duplicate line that has the duplicate number SHOULD BE on the next line. So it will be like this</li>
</ul>
<p dir="auto">For example:<br />
The dog went to the park - #12554<br />
The cat went to the park - #12554</p>
<p dir="auto">^^^ The formula should say “Hey!! These two numbers are identical” and then I will delete one of them manually.</p>
<ul>
<li>There will be only 2 ‘duplicate’ lines. There will <strong>not</strong> be 6 lines ending with #12345</li>
<li>Keep the <strong>last</strong> duplicate line, delete the first</li>
</ul>
<p dir="auto">I hope this is clear, let me know if there is anything else.</p>
<p dir="auto">Thank you</p>
]]></description><link>https://community.notepad-plus-plus.org/post/55519</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/55519</guid><dc:creator><![CDATA[Jim Erlich]]></dc:creator><pubDate>Thu, 02 Jul 2020 19:26:51 GMT</pubDate></item><item><title><![CDATA[Reply to Delete line with duplicate Number on Thu, 02 Jul 2020 18:53:49 GMT]]></title><description><![CDATA[<p dir="auto">Hello, <a class="plugin-mentions-user plugin-mentions-a" href="/user/jim-erlich" aria-label="Profile: jim-erlich">@<bdi>jim-erlich</bdi></a>,</p>
<p dir="auto"><strong>Before</strong> finding a way to <strong>solve</strong> your problem, we need <strong>additional</strong> information :</p>
<ul>
<li>
<p dir="auto">In your <strong>previous</strong> post, which is deleted by now, you spoke about a <strong>giant</strong> file : what is its <strong>approximate size</strong> and how <strong>many lines</strong> does this file contain ?</p>
</li>
<li>
<p dir="auto">Do you <strong>mind</strong> if your file is modified by a <strong>preliminary</strong> sort ? If you don’t mind, this could <strong>significantly</strong> simplify the <strong>whole</strong> process !</p>
</li>
<li>
<p dir="auto">How many <strong>digits</strong> can exist after the <strong><code>#</code></strong> symbol ? Is <strong><code>5</code></strong> the maximum or <strong><code>8</code></strong> or <strong><code>10</code></strong> digits or … ?</p>
</li>
<li>
<p dir="auto">What is the <strong>maximum</strong> number of lines between two <strong>“duplicate”</strong> lines ( for instance <strong><code>xxxxxxxx#12345</code></strong> and <strong><code>yyyyyyyy#12345</code></strong> ) ?</p>
</li>
<li>
<p dir="auto">Are there more than <strong><code>2</code></strong> <strong>“duplicate”</strong> lines ( I mean, for instance, <strong><code>6</code></strong> lines <strong>ending</strong> with <strong><code>#12345</code></strong> ) ?</p>
</li>
<li>
<p dir="auto">In case of multiple <strong>“duplicate”</strong> lines, which one do you want to keep : the <strong>first</strong> duplicate line or the <strong>last</strong> duplicate ?</p>
</li>
</ul>
<p dir="auto">Note that we can cope with a <strong>possible</strong> <strong><code>space</code></strong> character between the <strong><code>#</code></strong> symbol and the <strong>number</strong>. Not a problem !</p>
<p dir="auto">See you later !</p>
<p dir="auto">Best regards,</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/55518</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/55518</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Thu, 02 Jul 2020 18:53:49 GMT</pubDate></item></channel></rss>