<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[RegEx: Split each number of a string inside curly brackets into a separate line, add a prefix to it &amp; remove all unnecessary data]]></title><description><![CDATA[<p dir="auto">Hello everyone!</p>
<p dir="auto">I need help with an annoyingly tricky RegEx that I just can’t figure out! Maybe there is <strong>no</strong> solution after all? The nearly complete lack of any pattern is a huge problem for this RegEx. I have been racking my brain for hours on this, testing countless approaches, but my skills simply aren’t enough for this… so I would really appreciate any help you can provide! Many thanks in advance!</p>
<p dir="auto">Anyway, I have a huge database of <strong>dozens</strong> of files, each containing typically <strong>millions of lines</strong>, and <strong>dozens of parameters with varying types of headers and values</strong>. Therefore, manual fixing is not viable. There is a <strong>specific set of parameters</strong> with either <em><strong>whole</strong></em> or <em><strong>decimal</strong></em> <strong>numbers</strong> <strong>enclosed in curly brackets</strong>. Only these enclosed numbers, together with their parameter headers, need attention.</p>
<p dir="auto">The following statements - mostly bad news - are true for these strings:</p>
<ul>
<li>May be found <strong>anywhere</strong> under the root</li>
<li>The pattern is <em>always</em> <strong>header={ N N }</strong>, originally indented but this RegEx can be done after removing the tabs</li>
<li>There are several <em>headers</em> and they are typically in lower case; a few exceptions exist, but case sensitivity is <strong>not</strong> necessary – RegEx <strong>\w+</strong> covers all of them</li>
<li>There may be <strong>any number</strong> of these values: there is <strong>always</strong> at least <strong>one</strong>, but there may be <strong>up to several dozen</strong></li>
<li>Each <em>number</em> may be of <strong>varying length</strong>, typically <strong>anywhere between 1 and 1000000000 (1E+09)</strong></li>
<li>All numbers are <strong>positive</strong></li>
<li>Almost all of them are <strong>whole numbers</strong> but there are a couple of cases where <strong>decimal numbers</strong> appear, typically with a precision of 3 or 4 decimals… There is <strong>no</strong> pattern to when and where they appear, so, ideally, our numbers should carry the structure of <strong>(\d+(?:\.\d+)?)</strong> in any case</li>
<li>Numbers are <strong>always</strong> separated with a <strong>regular space</strong></li>
<li>The <em>equal sign</em> <strong>=</strong> and <em>curly brackets</em> <strong>{ }</strong> should account for possible spacing errors with <strong>\h*</strong></li>
<li><strong>Many other parameters have numbers</strong> but they are <strong>not</strong> enclosed with any brackets</li>
</ul>
<p dir="auto"><em>The exact matter is classified so I have randomized a dummy example for you. All parameters with curly brackets must be fixed.</em><br />
<strong>Please, note that in reality all these parameters are mixed and in any order, so they are <em>not</em> in separate sections like below. I just wanted to highlight them here so that it’s easier to see what needs to be done!</strong></p>
<p dir="auto">82={<br />
<em>— This section has parameters which need our attention to be fixed with RegEx —</em><br />
xx={ 16835961 }<br />
yyyy={ 16847062 67151971 74997 50388451 72836 83934207 50362874 16845543 81456 81771 67136455 33623075 16849442 100696613 82574 83286 83577 16852101 84199 33607712 }<br />
zzz={ 79199 16848761 83893799 70029 76217 16854401 16839 16853836 50370644 145057 79338 81773 16849133 83891875 }<br />
www={ 100693891 72513 16844226 33606062 16854968 16858108 33608429 16845608 67128408 33611952 50382602 67148972 67149505 50368894 78657 134238974 67119739 50362812 16833431 16852778 50353593 50378671 50383395 50386109 67120625 67126402 67136958 67145067 67145907 67151704 67158147 83897335 83898254 83921034 83921077 83927103 100681910 100691733 117474361 }<br />
pppp={ 50350929 168.36935 33589252 }<br />
rrrrr={ 322 482.865 }<br />
<em>— Other stuff in the file looks like this —</em><br />
info_about_this=blah<br />
header=85095<br />
Header=words_with_underlines<br />
date=1938.08.22<br />
that=2437<br />
dummy=funny<br />
}</p>
<p dir="auto"><em>Because of all these irregularities, combined with certain similarities with other parameters, all RegEx should be done preferably in one go… unless there is a foolproof solution with multiple steps that will not alter other parameters.</em></p>
<p dir="auto"><strong>TO DO</strong><br />
These strings of numbers must be parsed so that I can further process them. The following list explains the end result I need.</p>
<ul>
<li><strong>Separate line</strong> for <em>each number</em></li>
<li><strong>Header captured</strong> to be <em>included before each number</em>, i.e. <strong>01234</strong> → <strong>header=01234</strong></li>
<li><strong>Any</strong> in-line <strong>(white)spaces</strong> should be <em>removed</em>, including the ones before &amp; after numbers and brackets</li>
<li>The <strong>curly brackets</strong> are <strong>redundant</strong> so, ideally, they <em>should be removed</em> - I only need the headers and numbers</li>
<li><em>Everything beyond these strings should be kept intact - any changes <strong>will</strong> cause errors!</em></li>
</ul>
<p dir="auto"><em>The final product of the above dummy should look like the one below. Please, ignore the lines with “etc” - there are so many values that it’s best to abbreviate.</em></p>
<p dir="auto">82={<br />
xx=16835961<br />
yyyy=16847062<br />
yyyy=67151971<br />
yyyy=74997<br />
yyyy=50388451<br />
yyyy=72836<br />
yyyy=83934207<br />
<em>…etc…</em><br />
zzz=79199<br />
zzz=16848761<br />
zzz=83893799<br />
<em>…etc…</em><br />
pppp=50350929<br />
pppp=168.36935<br />
pppp=33589252<br />
<em>…etc…</em><br />
info_about_this=blah<br />
header=85095<br />
Header=words_with_underlines<br />
date=1938.08.22<br />
that=2437<br />
dummy=funny<br />
}</p>
<p dir="auto">Thank you for your time and patience! Any help would be greatly appreciated!<br />
Have a nice day!</p>
<p dir="auto"><em>P.S. I am really tired at the moment so I may have forgotten or mis-worded something so I may edit this post accordingly if anything peculiar is spotted…</em></p>
]]></description><link>https://community.notepad-plus-plus.org/topic/21484/regex-split-each-number-of-a-string-inside-curly-brackets-into-a-separate-line-add-a-prefix-to-it-remove-all-unnecessary-data</link><generator>RSS for Node</generator><lastBuildDate>Fri, 10 Apr 2026 13:05:46 GMT</lastBuildDate><atom:link href="https://community.notepad-plus-plus.org/topic/21484.rss" rel="self" type="application/rss+xml"/><pubDate>Wed, 14 Jul 2021 12:51:44 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to RegEx: Split each number of a string inside curly brackets into a separate line, add a prefix to it &amp; remove all unnecessary data on Fri, 16 Jul 2021 16:19:54 GMT]]></title><description><![CDATA[<p dir="auto">Hi, <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/22541">@grimaldas-grydas</a> and <strong>All</strong>,</p>
<p dir="auto">To begin with, let me explain the <strong>general</strong> method used. We’re going to use a <strong>short</strong> line from your <em>INPUT</em> text, which must be processed :</p>
<pre><code class="language-diff">pppp={ 50350929 168.36935 33589252 }
</code></pre>
<p dir="auto">The goal is to write the <strong>three</strong> numbers <strong><code>50350929</code></strong>, <strong><code>168.36935</code></strong> and <strong><code>33589252</code></strong> , <strong>each</strong> one on a <strong>different</strong> line, and <strong>prefixed</strong> with the string <strong><code>pppp</code></strong>, located <strong>before</strong> the <strong><code>=</code></strong> sign, in order to get :</p>
<pre><code class="language-diff">pppp=50350929
pppp=168.36935
pppp=33589252
</code></pre>
<p dir="auto">The <strong>problem</strong> is that when the regex engine catches, <strong>successively</strong>, each <strong>number</strong>, it <strong>no longer</strong> knows the <strong><code>pppp</code></strong> string, located at the <strong>beginning</strong> of the <strong>current</strong> line !</p>
<p dir="auto">So my idea was to <strong>swap</strong> the list of <strong>numbers</strong> and the string <strong><code>pppp</code></strong> before the <strong>equal</strong> sign and separate these <strong>two</strong> ranges with a <strong>temporary</strong> char, <strong>not</strong> present in your data !</p>
<p dir="auto">So, after a <strong>first</strong> regex S/R, we get the <strong>temporary</strong> text, below :</p>
<pre><code class="language-diff"> 50350929 168.36935 33589252¤pppp
</code></pre>
<p dir="auto">With this <strong>new</strong> layout, when the regex engine matches a number ( <strong>integer</strong> / <strong>decimal</strong> ) it is fairly easy, with a <strong>look-ahead</strong> structure, to <strong>store</strong>, each time, the string <strong>after</strong> the <strong>temporary</strong> <strong><code>¤</code></strong> char, ending the <strong>current</strong> line !</p>
<p dir="auto">Then, with a <strong>second</strong> regex S/R, we finally get our <strong>expected</strong> text :</p>
<pre><code class="language-diff">pppp=50350929
pppp=168.36935
pppp=33589252
</code></pre>
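<p dir="auto">As a rough illustration of this swap-then-look-ahead idea outside Notepad++, here is a minimal Python sketch. It is not part of the S/R itself and rests on a few assumptions: Python’s <code>re</code> module has no <code>\h</code>, so <code>[ \t]</code> stands in for it, the braces are escaped, and the names <code>STEP_A</code>, <code>NUMBER</code> and <code>split_line</code> are made up for the example.</p>

```python
import re

# Hypothetical names; "[ \t]" replaces "\h", which Python's re does not support.
STEP_A = re.compile(r"^[ \t]*(\w+)=\{(.+)[ \t]+\}$")
NUMBER = re.compile(r"[ \t]+(\d+(?:\.\d+)?)(?=.*¤(\w+))")


def split_line(line):
    """Return one 'header=number' string per number in 'header={ N N }'."""
    # First step: numbers in front, then the temporary char, then the header.
    intermediate = STEP_A.sub(r"\2¤\1", line)
    # Second step: each number is matched on its own, and the look-ahead
    # captures the header that now sits after the temporary char.
    pairs = NUMBER.findall(intermediate)
    return [f"{header}={number}" for number, header in pairs]


for item in split_line("pppp={ 50350929 168.36935 33589252 }"):
    print(item)
```

<p dir="auto">A line that does not have the <code>header={ N N }</code> shape, such as <code>date=1938.08.22</code>, yields an empty list in this sketch, which mirrors the requirement that other parameters stay untouched.</p>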
<hr />
<p dir="auto">Before we get into the <strong>details</strong>, it is <em>IMPORTANT</em> to point out that I found a case where my <strong>previous</strong> regex S/R did <strong>not</strong> work ! So, you’ll have to use the <strong>second</strong> version, below !</p>
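<p dir="auto">The failing case is not spelled out here. The earlier search used <code>.+</code> in its look-ahead where the version below uses <code>.*</code>, so one plausible candidate, offered purely as an assumption, is a list whose last number has a single digit: <code>.+</code> then demands at least one character between that number and the <code>¤</code> char. The Python sketch below compares only the number-matching alternative ( <code>[ \t]</code> replaces <code>\h</code>, and all names are hypothetical ) :</p>

```python
import re

# Only the number-matching alternative is compared; names are hypothetical.
EARLIER = re.compile(r"[ \t]+(\d+(?:\.\d+)?)(?=.+¤(\w+))")  # look-ahead with .+
FIXED = re.compile(r"[ \t]+(\d+(?:\.\d+)?)(?=.*¤(\w+))")    # look-ahead with .*


def captured_numbers(pattern, intermediate):
    """List what group 1 captures in an intermediate 'numbers¤header' line."""
    return [m.group(1) for m in pattern.finditer(intermediate)]


sample = " 12 7¤xx"  # assumed intermediate text ending with a one-digit number
print(captured_numbers(EARLIER, sample))
print(captured_numbers(FIXED, sample))
```

<p dir="auto">In this sketch the <code>.+</code> form never matches the final <code>7</code>, so that number would be left behind once <code>¤xx</code> is deleted, whereas the <code>.*</code> form captures both numbers.</p>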
<p dir="auto">The <strong>complete</strong> regex S/R, where I added the <strong><code>\h*</code></strong> part that you mentioned and where I <strong>fixed</strong> the bug, is :</p>
<ul>
<li>
<p dir="auto">SEARCH <strong><code>(?-s)^\h*(\w+)={(.+)\h+}$|(^)?\h+(\d+(?:\.\d+)?)(?=.*¤(\w+))|¤.+</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE <strong><code>(?2\2¤\1)?4(?3:\r\n)\5=\4</code></strong></p>
</li>
</ul>
<p dir="auto">It can be <strong>split</strong> into <strong><code>2</code></strong> <strong>consecutive</strong> regex S/R, which are completely <strong>independent</strong> :</p>
<ul>
<li>
<p dir="auto">The Search/Replacement <strong><code>A</code></strong>, which creates the <strong>intermediate</strong> text :</p>
<ul>
<li>
<p dir="auto">SEARCH <strong><code>(?-s)^\h*(\w+)={(.+)\h+}$</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE <strong><code>?2\2¤\1</code></strong></p>
</li>
</ul>
</li>
<li>
<p dir="auto">The Search/Replacement <strong><code>B</code></strong>, which gets the <strong>expected</strong> and final text</p>
<ul>
<li>
<p dir="auto">SEARCH <strong><code>(?-s)(^)?\h+(\d+(?:\.\d+)?)(?=.*¤(\w+))|¤.+</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE <strong><code>?4(?3:\r\n)\5=\4</code></strong></p>
</li>
</ul>
</li>
</ul>
<p dir="auto">The groups, defined by the <strong><code>A</code></strong> and <strong><code>B</code></strong> <strong>search</strong> regexes are :</p>
<pre><code class="language-z">
(?x-s) ^ \h* (\w+) = { (.+) \h+ } $
              ¯¯¯       ¯¯
              Gr 1     Gr 2


(?x-s) (^)? \h+ ( \d+(?: \. \d+ )? ) (?= .* ¤ (\w+) ) | ¤ .+
        ¯         ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯             ¯¯¯ 
      Gr 3              Gr 4                  Gr 5	
</code></pre>
<p dir="auto">Note that I use the <strong>free-spacing</strong> mode <strong><code>(?x)</code></strong> for <strong>better</strong> readability, and that each regex contains the <strong><code>(?-s)</code></strong> in-line <strong>modifier</strong>, which means that any regex <strong><code>.</code></strong> char will match a <strong>single standard</strong> character ( not <strong>EOL</strong> ones )</p>
<ul>
<li>
<p dir="auto">In <strong>search</strong> regex <strong><code>A</code></strong> :</p>
<ul>
<li>
<p dir="auto">The part <strong><code>^\h*(\w+)=</code></strong> matches the <strong>word</strong> string, stored as <strong>group <code>1</code></strong>, after <strong>possible</strong> leading <strong>blank</strong> chars, till an <strong><code>=</code></strong> character</p>
</li>
<li>
<p dir="auto">The part <strong><code>{(.+)\h+}$</code></strong> matches a literal <strong><code>{</code></strong> char, then any <strong>non-null</strong> range of chars, each <strong>number</strong> preceded with space(s), which is stored as <strong>group <code>2</code></strong>, till <strong>space</strong> char(s) and a closing <strong><code>}</code></strong> char, ending the <strong>current</strong> line</p>
</li>
</ul>
</li>
<li>
<p dir="auto">In replacement regex <strong><code>A</code></strong> :</p>
<ul>
<li><strong><code>?2\2¤\1</code></strong>, which, strictly, should be written as <strong><code>(?2\2¤\1)</code></strong>, is a <strong>conditional</strong> replacement syntax, which means that <em>IF</em> <strong>group <code>2</code></strong> exists, it must rewrite the <strong>group <code>2</code></strong> first, <strong><code>\2</code></strong> ( i.e. the <strong>numbers</strong> only ), then the <strong>literal</strong> char <strong><code>¤</code></strong> and finally <strong>group <code>1</code></strong> ( the string <strong><code>pppp</code></strong> )</li>
</ul>
</li>
<li>
<p dir="auto">Now, the <strong>search</strong> regex <strong><code>B</code></strong> contains <strong>two</strong> alternatives :</p>
<ul>
<li>
<p dir="auto">The <strong>first</strong> alternative <strong><code>(?-s)(^)?\h+(\d+(?:\.\d+)?)(?=.*¤(\w+))</code></strong></p>
<ul>
<li>
<p dir="auto">The <strong>middle</strong> part <strong><code>(\d+(?:\.\d+)?)</code></strong> matches any <strong>integer</strong> or <strong>decimal</strong> number, which is stored as <strong>group <code>4</code></strong>. Note the <strong>optional non-capturing</strong> group <strong><code>(?:\.\d+)?</code></strong> in the case of a <strong>decimal</strong> number</p>
</li>
<li>
<p dir="auto">The <strong>first</strong> part <strong><code>(^)?\h+</code></strong> matches the <strong>blank</strong> char(s) <strong>preceding</strong> a number. Note that, if the <strong>leading blank</strong> char(s) begin the <strong>current</strong> line, the <strong>optional</strong> group <strong><code>3</code></strong>, <strong><code>(^)?</code></strong>, is then <strong>defined</strong></p>
</li>
<li>
<p dir="auto">The <strong>final</strong> part <strong><code>(?=.*¤(\w+))</code></strong>, is a <strong>look-ahead</strong> structure, <strong>not</strong> included in the final match, but which must be <strong>true</strong> in order to get an <strong>effective</strong> match. So <strong>current</strong> matched <strong>number</strong> must be followed by a range, possibly <strong>null</strong>, of characters till the <strong>temporary</strong> char <strong><code>¤</code></strong> and the ending string <strong><code>pppp</code></strong></p>
</li>
</ul>
</li>
<li>
<p dir="auto">The <strong>second</strong> alternative <strong><code>¤.+</code></strong>, which is used when <strong>current</strong> parsing position of the regex engine is at the <strong><code>¤</code></strong> location, <strong>after</strong> the processed numbers. This <strong>second</strong> alternative, <strong>without</strong> any group, simply matches the temporary <strong><code>¤</code></strong> char and <strong>all subsequent</strong> chars of <strong>current</strong> line, and should be <strong>deleted</strong> in replacement !</p>
</li>
</ul>
</li>
<li>
<p dir="auto">In replacement regex <strong><code>B</code></strong> :</p>
<ul>
<li>
<p dir="auto"><strong><code>?4(?3:\r\n)\5=\4</code></strong>, which, strictly, should be written as <strong><code>(?4(?3:\r\n)\5=\4)</code></strong>, means that, <em>IF</em> <strong>group <code>4</code></strong> exists ( the current <strong>number</strong> ), it must :</p>
<ul>
<li>
<p dir="auto">Execute, first, the <strong><code>(?3:\r\n)</code></strong> <strong>conditional</strong> replacement. This replacement does not include a <em>THEN</em> part and, <strong>only</strong>, the regex <strong><code>\r\n</code></strong> as an <em>ELSE</em> part, after the <strong><code>:</code></strong> char. So, this means that if <strong>group <code>3</code></strong> does <strong>not</strong> exist ( number <strong>not</strong> at <strong>beginning</strong> of current line ) , it must insert a leading <strong>line-break</strong> !</p>
</li>
<li>
<p dir="auto">Write the <strong>group <code>5</code></strong>, <strong><code>\5</code></strong>, followed with a <strong>literal</strong> <strong><code>=</code></strong> sign</p>
</li>
<li>
<p dir="auto">Finally, write the <strong>group <code>4</code></strong> ( <strong>current</strong> number matched by the <strong>first</strong> alternative of <strong>search</strong> regex <strong><code>B</code></strong> )</p>
</li>
</ul>
</li>
<li>
<p dir="auto">Note that, when matching the <strong>second</strong> alternative <strong><code>¤.+</code></strong> of the <strong>search</strong> regex <strong><code>B</code></strong>, at end of <strong>current</strong> line, <strong>group <code>4</code></strong> is <strong>not</strong> defined. So, <strong>no</strong> action occurs in replacement. Thus, concretely, this means that the string <strong><code>¤pppp</code></strong> is <strong>deleted</strong> !</p>
</li>
</ul>
</li>
</ul>
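<p dir="auto">To see which groups are defined for each kind of match, the walkthrough above can be mimicked in Python. This is only a sketch under stated assumptions: Python’s <code>re</code> module has no <code>(?n…)</code> conditional replacement, so a callback ( the hypothetical <code>replacement</code> function ) plays that role; <code>[ \t]</code> replaces <code>\h</code>, the braces are escaped, and <code>\n</code> is used instead of <code>\r\n</code>.</p>

```python
import re

# Merged search regex, adapted for Python: "[ \t]" instead of "\h".
MERGED = re.compile(
    r"^[ \t]*(\w+)=\{(.+)[ \t]+\}$"             # groups 1 and 2 (regex A)
    r"|(^)?[ \t]+(\d+(?:\.\d+)?)(?=.*¤(\w+))"   # groups 3, 4 and 5 (regex B)
    r"|¤.+",                                    # no group at all
    re.MULTILINE,
)


def replacement(m):
    """Callback standing in for the replacement (?2\\2¤\\1)?4(?3:\\r\\n)\\5=\\4."""
    if m.group(2) is not None:        # IF group 2 exists: numbers, ¤, header
        return m.group(2) + "¤" + m.group(1)
    if m.group(4) is not None:        # IF group 4 exists: a single number
        # ELSE part of (?3:...): a line break unless the number began the line
        line_break = "" if m.group(3) is not None else "\n"
        return line_break + m.group(5) + "=" + m.group(4)
    return ""                         # "¤.+" matched: the text is deleted


intermediate = " 322 482.865¤rrrrr"
for m in MERGED.finditer(intermediate):
    print(repr(m.group(0)), m.groups())
print(MERGED.sub(replacement, intermediate))
```

<p dir="auto">The loop prints the five groups for every match, so the undefined ones show up as <code>None</code>; the last line prints the text produced by the callback.</p>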
<hr />
<p dir="auto"><strong>Remarks</strong> :</p>
<ul>
<li>
<p dir="auto">The S/R <strong><code>A</code></strong> and <strong><code>B</code></strong> are <strong>independent</strong>. As a demonstration :</p>
<ul>
<li>
<p dir="auto">When executing, <strong>first</strong>, the <strong>search</strong> regex <strong><code>A</code></strong>, as <strong>no</strong> <strong><code>¤</code></strong> character exists <strong>yet</strong>, <strong>no</strong> alternative of the search regex <strong><code>B</code></strong> can match</p>
</li>
<li>
<p dir="auto">When executing, <strong>second</strong>, the search regex <strong><code>B</code></strong>, as the lines converted by <strong><code>A</code></strong> in the <strong>intermediate</strong> text <strong>no longer</strong> contain any <strong><code>{</code></strong> or <strong><code>}</code></strong> character, the search regex <strong><code>A</code></strong> <strong>cannot</strong> match either !</p>
</li>
</ul>
</li>
</ul>
<p dir="auto">Thus, we can <strong>merge</strong> these <strong>two successive</strong> S/R into <strong>one</strong> regex S/R only ! You’ll note that :</p>
<ul>
<li>
<p dir="auto">The <strong>redundant</strong> part <strong><code>(?-s)</code></strong>, at <strong>beginning</strong> of regex S/R <strong><code>B</code></strong>, is <strong>omitted</strong></p>
</li>
<li>
<p dir="auto">The replacement of S/R <strong><code>A</code></strong>, <strong><code>?2\2¤\1</code></strong>, must be <strong>enclosed</strong> between <strong>parentheses</strong>, <strong><code>(?2\2¤\1)</code></strong>, in order to <strong>not</strong> include the <strong>replacement</strong> section of S/R <strong><code>B</code></strong></p>
</li>
</ul>
<p dir="auto">As a conclusion, the <strong>complete</strong> regex S/R, with the <strong>free-spacing</strong> mode in the <strong>search</strong> part, is :</p>
<ul>
<li>
<p dir="auto">SEARCH <strong><code>(?x-s) ^ \h* ( \w+ ) = { ( .+ ) \h+ } $ | (^)? \h+ ( \d+ (?:\.\d+)? ) (?= .* ¤ ( \w+ ) ) | ¤ .+</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE <strong><code>(?2\2¤\1)?4(?3:\r\n)\5=\4</code></strong></p>
</li>
</ul>
<p dir="auto">It outputs the <strong>expected</strong> text, after <strong>two consecutive</strong> clicks on the <strong><code>Replace All</code></strong> button !</p>
<hr />
<p dir="auto">As mentioned in my <strong>last</strong> post, if we try to click a <strong>third</strong> time on the <strong><code>Replace All</code></strong> button, <strong>luckily</strong>, nothing else occurs ! Why ? Easy : as the processed lines of our <strong>final</strong> text contain neither <strong>brace</strong> <strong><code>{</code></strong> or <strong><code>}</code></strong> characters nor the <strong><code>¤</code></strong> character, <strong>no alternative</strong> of the overall regex can match. Logical  ;-))</p>
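<p dir="auto">The “two clicks, then nothing more” behaviour can be checked with a small Python sketch that repeats the substitution until the text stops changing. As before, this is an approximation under assumptions: <code>[ \t]</code> replaces <code>\h</code>, <code>\n</code> replaces <code>\r\n</code>, a callback emulates the conditional replacement, and <code>replace_all</code> and <code>clicks_until_stable</code> are hypothetical helper names.</p>

```python
import re

MERGED = re.compile(
    r"^[ \t]*(\w+)=\{(.+)[ \t]+\}$"
    r"|(^)?[ \t]+(\d+(?:\.\d+)?)(?=.*¤(\w+))"
    r"|¤.+",
    re.MULTILINE,
)


def replace_all(text):
    """One simulated click on 'Replace All' (hypothetical helper)."""
    def repl(m):
        if m.group(2) is not None:
            return f"{m.group(2)}¤{m.group(1)}"
        if m.group(4) is not None:
            prefix = "" if m.group(3) is not None else "\n"
            return f"{prefix}{m.group(5)}={m.group(4)}"
        return ""
    return MERGED.sub(repl, text)


def clicks_until_stable(text, limit=10):
    """Apply replace_all until nothing changes; return (text, useful clicks)."""
    clicks = 0
    for _ in range(limit):
        updated = replace_all(text)
        if updated == text:
            break
        text, clicks = updated, clicks + 1
    return text, clicks


sample = "82={\nxx={ 16835961 }\nrrrrr={ 322 482.865 }\ndate=1938.08.22\n}"
final_text, clicks = clicks_until_stable(sample)
print(clicks)
print(final_text)
```

<p dir="auto">On this shortened sample the loop reports two passes that change the text; the following pass leaves it as it is, and the root line, the <code>date</code> line and the closing brace are never touched.</p>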
<p dir="auto">I just hope, <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/22541">@grimaldas-grydas</a>, that these <strong>explanations</strong> help you a bit !</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/67989</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/67989</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Fri, 16 Jul 2021 16:19:54 GMT</pubDate></item><item><title><![CDATA[Reply to RegEx: Split each number of a string inside curly brackets into a separate line, add a prefix to it &amp; remove all unnecessary data on Fri, 16 Jul 2021 05:11:23 GMT]]></title><description><![CDATA[<p dir="auto">Also, there’s no rush, take your time, everyone! I’m sorry if I sounded rushing. I was just trying to write down all while I remembered.</p>
<p dir="auto">Thank you for your help, everyone!</p>
]]></description><link>https://community.notepad-plus-plus.org/post/67963</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/67963</guid><dc:creator><![CDATA[Grimaldas Grydas]]></dc:creator><pubDate>Fri, 16 Jul 2021 05:11:23 GMT</pubDate></item><item><title><![CDATA[Reply to RegEx: Split each number of a string inside curly brackets into a separate line, add a prefix to it &amp; remove all unnecessary data on Fri, 16 Jul 2021 03:05:57 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/195">@guy038</a><br />
Sorry for multiple replies (again)! I forgot to ask: if it is not too much trouble, could you please explain your RegEx? These kinds of cases are beyond my current understanding and I’m really interested in learning and improving my skills! Also, this is an unusual and complex case, so someone else could find this useful as well!</p>
<p dir="auto">Thank you again, for your help, time and patience! :-)</p>
]]></description><link>https://community.notepad-plus-plus.org/post/67960</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/67960</guid><dc:creator><![CDATA[Grimaldas Grydas]]></dc:creator><pubDate>Fri, 16 Jul 2021 03:05:57 GMT</pubDate></item><item><title><![CDATA[Reply to RegEx: Split each number of a string inside curly brackets into a separate line, add a prefix to it &amp; remove all unnecessary data on Fri, 16 Jul 2021 02:49:17 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/195"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/195">@guy038</a></a><br />
Thank you so, so much for this! Your RegEx is doing exactly what I needed! I only did a small modification to it to permit matches with indents. I have yet to check how foolproof it is in the long run and whether it would be suitable for other, similar cases in other files I’m working on, but so far it is working perfectly!</p>
<p dir="auto">To be exact, it works perfectly when it is done at a specific stage among a couple dozen other RegEx steps needed for this file, at the point when all other, less problematic cases of “xxx={yyyy}” strings have been fixed, leaving only those behind which need this specific step. However, that is not a problem at all - RegEx works in a way which requires specific order of steps sometimes, and even more so when there is higher complexity involved. In my projects it happens frequently, so I have to do a lot of trial and error to figure out the correct order of replaces. Moreover, these sorts of projects are incredibly interesting for me!</p>
<p dir="auto">In case anyone needs the version I used with indent included - I simply added <code>\h*</code> after <code>^</code>:<br />
<code>(?-s)^\h*(\w+)={(.+)\h+}$|(^)?\h+(\d+(?:\.\d+)?)(?=.+¤(\w+))|¤.+</code></p>
<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/12335">@Terry-R</a><br />
Thank you so much for your version as well! It seems to be working as well, though it is less stable and higher maintenance than the one by <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/195">@guy038</a>. However, it is still very useful as it has given me ideas and solutions for several other RegEx I am using for these files, so thank you!</p>
<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/7377">@Alan-Kilborn</a><br />
I think there is no need to dwell on that matter. There was no harm done whatsoever. How we perceive things is highly individual and biased, depending on the culture, personality and so on. In this case ‘rude’ is a bit extreme wording, hence I added “-ish” there. That comment of mine referred chiefly to the last phrase “I wouldn’t want to even take a stab at an answer yet.”, and the clearly annoyed ‘tone’ because of merely forgetting to add specific markup. Although incredibly helpful for readers, one could phrase such issues more politely.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/67959</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/67959</guid><dc:creator><![CDATA[Grimaldas Grydas]]></dc:creator><pubDate>Fri, 16 Jul 2021 02:49:17 GMT</pubDate></item><item><title><![CDATA[Reply to RegEx: Split each number of a string inside curly brackets into a separate line, add a prefix to it &amp; remove all unnecessary data on Thu, 15 Jul 2021 12:17:55 GMT]]></title><description><![CDATA[<p dir="auto">Hello, <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/22541">@grimaldas-grydas</a>, <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/3841">@peterjones</a>, <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/12335">@terry-r</a>, <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/7377">@alan-kilborn</a> and <strong>All</strong>,</p>
<p dir="auto">Here is my solution : A <strong>single</strong> regex S/R will be <strong>enough</strong>, but you’ll need to click <strong>twice</strong> on the <strong><code>Replace All</code></strong> button !</p>
<p dir="auto">I also had to use a <strong>temporary</strong> character, <strong>absent</strong> in <strong>all</strong> your data ! I chose the <strong><code>¤</code></strong> character, present on my <strong>French</strong> keyboard. But you may adopt any <strong>simple</strong> character which is <strong>not present</strong> in your current file, as for instance, <strong><code>@</code></strong>, <strong><code>&amp;</code></strong>, <strong><code>%</code></strong>, <strong><code>§</code></strong>, …  :-)</p>
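<p dir="auto">If you are unsure whether a given character is really absent from a file, a quick check before running the S/R may help. Here is a minimal Python sketch of such a check; the function name, the candidate list and the sample text are only assumptions for the example.</p>

```python
def pick_temporary_char(text, candidates="¤@%§"):
    """Return the first candidate that does not occur anywhere in text."""
    for char in candidates:
        if char not in text:
            return char
    raise ValueError("all candidate characters already occur in the text")


# Hypothetical sample data, just to exercise the function.
data = "xx={ 16835961 }\nmail=someone@example.org\n"
print(pick_temporary_char(data))        # ¤ is absent, so it is chosen
print(pick_temporary_char(data + "¤"))  # ¤ and @ are taken, so % is chosen
```

<p dir="auto">Whatever character is picked must then replace <strong><code>¤</code></strong> everywhere in both the search and the replacement fields.</p>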
<p dir="auto">So if we consider your <em>INPUT</em> text :</p>
<pre><code class="language-diff">82={ # This is the root, used for each main entry. All parameters are placed under it. In this case, these are safe to ignore.
### The section below has parameters which need our attention to be fixed with RegEx ###
xx={ 16835961 }
yyyy={ 16847062 67151971 74997 50388451 72836 83934207 50362874 16845543 81456 81771 67136455 33623075 16849442 100696613 82574 83286 83577 16852101 84199 33607712 }
zzz={ 79199 16848761 83893799 70029 76217 16854401 16839 16853836 50370644 145057 79338 81773 16849133 83891875 }
www={ 100693891 72513 16844226 33606062 16854968 16858108 33608429 16845608 67128408 33611952 50382602 67148972 67149505 50368894 78657 134238974 67119739 50362812 16833431 16852778 50353593 50378671 50383395 50386109 67120625 67126402 67136958 67145067 67145907 67151704 67158147 83897335 83898254 83921034 83921077 83927103 100681910 100691733 117474361 }
pppp={ 50350929 168.36935 33589252 }
rrrrr={ 322 482.865 }
### Other stuff in the file looks like this ###
info_about_this=blah
header=85095
Header=words_with_underlines
date=1938.08.22
that=2437
dummy=funny
}
</code></pre>
<ul>
<li>
<p dir="auto">Now, open the <strong><code>Replace</code></strong> dialog ( <strong><code>Ctrl + H</code></strong> )</p>
<ul>
<li>
<p dir="auto">SEARCH <strong><code>(?-s)^(\w+)={(.+)\h+}$|(^)?\h+(\d+(?:\.\d+)?)(?=.+¤(\w+))|¤.+</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE <strong><code>(?2\2¤\1)?4(?3:\r\n)\5=\4</code></strong></p>
</li>
<li>
<p dir="auto"><strong>Tick</strong> the <strong><code>Wrap around</code></strong> option</p>
</li>
<li>
<p dir="auto"><strong>Un</strong>-tick <strong>all</strong> other options</p>
</li>
<li>
<p dir="auto">Click <em>ONCE</em>, only, on the <strong><code>Replace All</code></strong> button</p>
</li>
</ul>
</li>
</ul>
<p dir="auto">=&gt; You should get this <strong>intermediate</strong> text:</p>
<pre><code class="language-diff">82={ # This is the root, used for each main entry. All parameters are placed under it. In this case, these are safe to ignore.
### The section below has parameters which need our attention to be fixed with RegEx ###
 16835961¤xx
 16847062 67151971 74997 50388451 72836 83934207 50362874 16845543 81456 81771 67136455 33623075 16849442 100696613 82574 83286 83577 16852101 84199 33607712¤yyyy
 79199 16848761 83893799 70029 76217 16854401 16839 16853836 50370644 145057 79338 81773 16849133 83891875¤zzz
 100693891 72513 16844226 33606062 16854968 16858108 33608429 16845608 67128408 33611952 50382602 67148972 67149505 50368894 78657 134238974 67119739 50362812 16833431 16852778 50353593 50378671 50383395 50386109 67120625 67126402 67136958 67145067 67145907 67151704 67158147 83897335 83898254 83921034 83921077 83927103 100681910 100691733 117474361¤www
 50350929 168.36935 33589252¤pppp
 322 482.865¤rrrrr
### Other stuff in the file looks like this ###
info_about_this=blah
header=85095
Header=words_with_underlines
date=1938.08.22
that=2437
dummy=funny
}
</code></pre>
<p dir="auto">Now, click a <em>SECOND</em> time on the <strong><code>Replace All</code></strong> button</p>
<p dir="auto">=&gt; And here is your <strong>expected</strong> <em>OUTPUT</em> text :</p>
<pre><code class="language-diff">82={ # This is the root, used for each main entry. All parameters are placed under it. In this case, these are safe to ignore.
### The section below has parameters which need our attention to be fixed with RegEx ###
xx=16835961
yyyy=16847062
yyyy=67151971
yyyy=74997
yyyy=50388451
yyyy=72836
yyyy=83934207
yyyy=50362874
yyyy=16845543
yyyy=81456
yyyy=81771
yyyy=67136455
yyyy=33623075
yyyy=16849442
yyyy=100696613
yyyy=82574
yyyy=83286
yyyy=83577
yyyy=16852101
yyyy=84199
yyyy=33607712
zzz=79199
zzz=16848761
zzz=83893799
zzz=70029
zzz=76217
zzz=16854401
zzz=16839
zzz=16853836
zzz=50370644
zzz=145057
zzz=79338
zzz=81773
zzz=16849133
zzz=83891875
www=100693891
www=72513
www=16844226
www=33606062
www=16854968
www=16858108
www=33608429
www=16845608
www=67128408
www=33611952
www=50382602
www=67148972
www=67149505
www=50368894
www=78657
www=134238974
www=67119739
www=50362812
www=16833431
www=16852778
www=50353593
www=50378671
www=50383395
www=50386109
www=67120625
www=67126402
www=67136958
www=67145067
www=67145907
www=67151704
www=67158147
www=83897335
www=83898254
www=83921034
www=83921077
www=83927103
www=100681910
www=100691733
www=117474361
pppp=50350929
pppp=168.36935
pppp=33589252
rrrrr=322
rrrrr=482.865
### Other stuff in the file looks like this ###
info_about_this=blah
header=85095
Header=words_with_underlines
date=1938.08.22
that=2437
dummy=funny
}
</code></pre>
<p dir="auto">The <strong>nice</strong> thing is that if you click a <em>THIRD</em> time on the <strong>Replace All</strong> button, <strong>nothing</strong> else occurs ;-))</p>
<p dir="auto">I must be out for a <strong>couple</strong> of hours! See you later for <strong>possible</strong> modifications and <strong>explanations</strong> on this regex S/R!</p>
<p dir="auto">Best regards,</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/67928</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/67928</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Thu, 15 Jul 2021 12:17:55 GMT</pubDate></item><item><title><![CDATA[Reply to RegEx: Split each number of a string inside curly brackets into a separate line, add a prefix to it &amp; remove all unnecessary data on Thu, 15 Jul 2021 11:14:08 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/22541">@Grimaldas-Grydas</a> said in <a href="/post/67894">RegEx: Split each number of a string inside curly brackets into a separate line, add a prefix to it &amp; remove all unnecessary data</a>:</p>
<blockquote>
<p dir="auto">I think you didn’t have to be rude-ish to me,</p>
</blockquote>
<p dir="auto">Hmm, I read it over and I didn’t see even a hint of rude-ish-ness in what Peter said.  Maybe he was “direct” but certainly in no way “rude”.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/67921</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/67921</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Thu, 15 Jul 2021 11:14:08 GMT</pubDate></item><item><title><![CDATA[Reply to RegEx: Split each number of a string inside curly brackets into a separate line, add a prefix to it &amp; remove all unnecessary data on Wed, 14 Jul 2021 21:20:19 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/22541">@Grimaldas-Grydas</a> said in <a href="/post/67880">RegEx: Split each number of a string inside curly brackets into a separate line, add a prefix to it &amp; remove all unnecessary data</a>:</p>
<blockquote>
<p dir="auto">Because of all these irregularities, combined with certain similarities with other parameters, all RegEx should be done preferably in one go… unless there is a foolproof solution with multiple steps that will not alter other parameters.</p>
</blockquote>
<p dir="auto">I can’t see how it’s possible to do it with 1 regex. The primary issue is when processing the numbers inside the <code>{}</code> you cannot look behind with a variable length to find the <code>xxx=</code> to copy ahead for the next number found. So instead I think 2 regex will suffice. The first moves the <code>xxx=</code> to the end of the line as the look ahead can be of variable length. The second regex then completes the transformation.</p>
<p dir="auto">So the first regex to move the header to end of line (and remove indentation?) is:<br />
Find What:<code>(?-s)^\h*(\w+=)(\{.+\})</code><br />
Replace With:<code>\2\1</code></p>
<p dir="auto">The second regex will now copy the header by looking forward and capturing it for each number it encounters and rewrites that as a separate line. When it cannot find any more numbers on the line it will instead find the <code>}xxx=</code> sequence which it promptly deletes along with the line break. So we have<br />
Find What:<code>(?-s)(?:\{?\h*)?(\d+(?:\.\d+)?)(?=[^}\r\n]+}(\w+=))|\h*\}\w+=\R</code><br />
Replace With:<code>(?1\2\1\r\n)</code><br />
Please note that although I included <code>(?-s)</code> in the second regex it is in fact redundant as there are no <code>.</code> references made. It is something I strive to do when starting to compile a solution and sometimes I just leave it there even if not needed.</p>
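<p dir="auto">For anyone who wants to try the same two-step idea outside Notepad++, here is a rough Python sketch. It is an adaptation, not the regexes above verbatim: Python's <code>re</code> module has no <code>\h</code>, <code>\R</code> or conditional replacement <code>(?1...)</code>, so <code>[ \t]</code>, <code>\r?\n</code> and a callback stand in for them, and output lines end with <code>\n</code> rather than <code>\r\n</code>. The names <code>two_step</code> and <code>_rewrite</code> are made up for illustration. Like the original, it assumes there is a space before the closing brace, as in the sample data.</p>

```python
import re

# Step 1: move "header=" behind the braces and drop leading indentation.
STEP1 = re.compile(r"(?m)^[ \t]*(\w+=)(\{.+\})")

# Step 2: either a number that is followed later on the same line by "}header=",
# or the leftover "}header=" tail together with its line break.
STEP2 = re.compile(
    r"(?:\{?[ \t]*)?(\d+(?:\.\d+)?)(?=[^}\r\n]+\}(\w+=))"
    r"|[ \t]*\}\w+=\r?\n"
)

def _rewrite(match):
    """Emulate the conditional replacement: emit header+number, or delete the tail."""
    if match.group(1) is not None:
        return match.group(2) + match.group(1) + "\n"
    return ""

def two_step(text):
    """Apply both passes in order and return the rewritten text."""
    moved = STEP1.sub(r"\2\1", text)
    return STEP2.sub(_rewrite, moved)

sample = "xx={ 16835961 }\nrrrrr={ 322 482.865 }\nheader=85095\n"
print(two_step(sample))
```

<p dir="auto">Lines without braces, such as <code>header=85095</code>, are left alone because the lookahead in the second pattern requires a closing brace later on the same line.</p>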
<p dir="auto">Now this definitely works (tested) with the small (non-indented) sample you provided in your 3rd posting, however since you made reference to possible indentation it is likely you may still need to change my regex. Note my first regex does attempt to remove the indentation, but I will leave it up to you if that’s successful before applying the second regex.</p>
<p dir="auto">Given the complexity of your data and issues around other lines that <code>look similar</code> this may not be the final solution, but rather a work in progress. Please do come back to us with the <code>edge cases</code> as <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/3841">@PeterJones</a> mentions. His <em><strong>italicised</strong></em> text at the bottom of his first post here contains very important information.</p>
<p dir="auto">Terry</p>
]]></description><link>https://community.notepad-plus-plus.org/post/67907</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/67907</guid><dc:creator><![CDATA[Terry R]]></dc:creator><pubDate>Wed, 14 Jul 2021 21:20:19 GMT</pubDate></item><item><title><![CDATA[Reply to RegEx: Split each number of a string inside curly brackets into a separate line, add a prefix to it &amp; remove all unnecessary data on Wed, 14 Jul 2021 16:49:57 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/22541">@Grimaldas-Grydas</a> ,</p>
<p dir="auto">You may have to be patient.  I’m pretty busy right now, so cannot look into it.  But there are other regex experts who usually visit at least once per day.  Hopefully, one of them will be able to look into it.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/67902</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/67902</guid><dc:creator><![CDATA[PeterJones]]></dc:creator><pubDate>Wed, 14 Jul 2021 16:49:57 GMT</pubDate></item><item><title><![CDATA[Reply to RegEx: Split each number of a string inside curly brackets into a separate line, add a prefix to it &amp; remove all unnecessary data on Wed, 14 Jul 2021 16:46:56 GMT]]></title><description><![CDATA[<h2>There are a couple more things worth mentioning</h2>
<p dir="auto">I have tested numerous ways, back and forth, including with a tester like regex101. The problem appears to be that I’m terrible with lookarounds and conditionals, so I just cannot wrap my mind around a possible solution. I also have a feeling that the entire process may be impossible with Notepad++ RegEx but some parts could be done. This is why I sought help in the first place - many of you are far more skilled and may actually know of a solution!</p>
<p dir="auto">The original post is complex - I wanted to include all possible information there. However, what needs to be done, is really simple:</p>
<ol>
<li>We have a string, like <strong>header={ 01 0345 0647889 0887 }</strong></li>
<li>We need to capture the part <strong>header=</strong></li>
<li>Each number inside brackets (in this case: <strong>01</strong>, <strong>0345</strong>, <strong>0647889</strong> and <strong>0887</strong>), needs to be split into a separate line</li>
<li>End result should omit all brackets and spaces</li>
<li>One of many lines of the end result should look like <strong>header=0345</strong></li>
</ol>
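<p dir="auto">The five steps above can be sketched in a few lines of Python. This is only an illustration of the intended transformation on a single line, not a Notepad++ solution; the helper name <code>split_braced_line</code> and the exact pattern are assumptions based on the description (a <code>\w+</code> header, positive whole or decimal numbers, spaces or tabs as separators).</p>

```python
import re

# Assumed shape of a line that needs fixing: header={ N N ... }
# Group 1 is the header, group 2 is the run of numbers between the braces.
BRACED = re.compile(
    r"^[ \t]*(\w+)[ \t]*=[ \t]*\{[ \t]*"
    r"(\d+(?:\.\d+)?(?:[ \t]+\d+(?:\.\d+)?)*)"
    r"[ \t]*\}[ \t]*$"
)

def split_braced_line(line):
    """Return one 'header=number' string per number, or [line] if it is another kind of line."""
    match = BRACED.match(line)
    if match is None:
        return [line]
    header = match.group(1)
    numbers = match.group(2).split()
    return [f"{header}={number}" for number in numbers]

for out in split_braced_line("header={ 01 0345 0647889 0887 }"):
    print(out)
```

<p dir="auto">Because the whole line has to match, entries such as <code>header=85095</code> or <code>date=1938.08.22</code> come back unchanged.</p>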
<p dir="auto">So far, I have a partial solution. I can easily split numbers into separate lines with SEARCH <code>\h+(\d+\.*\d*)</code> &amp; REPLACE <code>\r\n$1</code>. However, this solution is vulnerable to errors because many other parameters have similar numbers as well. The original matter has spaces as well but I have done this at a later stage so that only these have spaces. There are literally thousands of these parameters, and exponentially more with the numbers split, so doing this manually is prone to errors and practically not viable.</p>
<p dir="auto">However, this still ignores the brackets, which are the most crucial “identifier” of this case, and the numbers lack a header, which would be imperative to include as there are several parameters using similar numbers. This is why I am asking for help. I would like to know if it would be possible to do the whole process described. English is not my first language (it’s Finnish), so it’s a little difficult to explain this properly…</p>
<p dir="auto">It would be a piece of cake if the numbers were of regular length, there was a fixed amount of numbers or there was a clear pattern overall. The real difficulty is in this irregularity, as numerous variations must be taken into account… I tried solutions like SEARCH <code>\h+(?:(\w+)\h*=\h*{)\(?:\h+(\d+\.*\d*))</code> → REPLACE <code>$1=$2</code> but this only matches the first number out of many, instead of matching all numbers until the closing curly bracket.</p>
<p dir="auto">In any case, I am sorry for bothering you all.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/67901</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/67901</guid><dc:creator><![CDATA[Grimaldas Grydas]]></dc:creator><pubDate>Wed, 14 Jul 2021 16:46:56 GMT</pubDate></item><item><title><![CDATA[Reply to RegEx: Split each number of a string inside curly brackets into a separate line, add a prefix to it &amp; remove all unnecessary data on Wed, 14 Jul 2021 16:02:40 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/3841">@PeterJones</a></p>
<p dir="auto">Thank you for your help this far, anyway! Any responses and comments are always welcome and valuable!</p>
<p dir="auto">Concerning indentation, I had no idea it was of any importance in this case! So far, all RegEx I needed for these files could either safely ignore it entirely or simply match it with <code>\h*</code> or <code>\h+</code>, occasionally anchored with <code>^</code> when the start of the line mattered.</p>
<p dir="auto">The indentation here is really simple, however. The root entry (82={} in this case) is at level 0, on the margin. Most entries for this case are one TAB in, and a couple rare ones are two TABs in. I thought I could simply add these into the code myself after figuring out how to solve the main issue.</p>
<p dir="auto">The end result should have no indentation, so most steps for these files involve methodical removal of any differing “layout”. The end result should be basically as plain as .csv, or .json or similar data typically is.</p>
<p dir="auto">I feel it’s not necessary knowledge for this matter, and due to classified information, I cannot tell anything specific… but as mentioned before, I am further processing this into something human-readable. My methods vary from one case to another, but most can be done with a program like Excel, where information can be easily databased, modified with formulae etc.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/67898</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/67898</guid><dc:creator><![CDATA[Grimaldas Grydas]]></dc:creator><pubDate>Wed, 14 Jul 2021 16:02:40 GMT</pubDate></item><item><title><![CDATA[Reply to RegEx: Split each number of a string inside curly brackets into a separate line, add a prefix to it &amp; remove all unnecessary data on Wed, 14 Jul 2021 15:52:20 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/22541">@Grimaldas-Grydas</a> said in <a href="/post/67894">RegEx: Split each number of a string inside curly brackets into a separate line, add a prefix to it &amp; remove all unnecessary data</a>:</p>
<blockquote>
<p dir="auto">However, I think you didn’t have to be rude-ish to me, especially because judging by your reply, it is merely because of overlooking one (though important) markup.</p>
</blockquote>
<p dir="auto">Sorry, I was not intending to be rude.  I cannot hazard a guess at a regex that might work, because your examples don’t show indenting, and your explanation implies there is indentation; if we cannot see where it is, our solutions will likely <em>not</em> work if our guess at indentation doesn’t match yours.</p>
<p dir="auto">I see that while I was typing up this reply, you re-posted the data.  Thank you.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/67897</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/67897</guid><dc:creator><![CDATA[PeterJones]]></dc:creator><pubDate>Wed, 14 Jul 2021 15:52:20 GMT</pubDate></item><item><title><![CDATA[Reply to RegEx: Split each number of a string inside curly brackets into a separate line, add a prefix to it &amp; remove all unnecessary data on Wed, 14 Jul 2021 15:49:48 GMT]]></title><description><![CDATA[<h2>Apparently, I am only allowed to edit for three minutes after posting, so I cannot edit the original post… Let me know if I should start a new topic/thread and discard this one entirely. For now, below is the original post with fixed markup and some edits. Sorry for messing up the markup!</h2>
<p dir="auto">Hello everyone!</p>
<p dir="auto">I need help with annoyingly tricky RegEx that I just can’t figure out! Maybe there is <strong>no</strong> solution after all? The nearly complete lack of any pattern is a huge problem for this RegEx. I have been racking my brain for hours on this, testing countless approaches but my skills simply aren’t enough for this… so I would really appreciate any help you can provide! In advance, many, many thanks for any help!</p>
<p dir="auto">Anyway, I have a huge database of <strong>dozens</strong> of files, each typically containing <strong>millions of lines</strong>, and <strong>dozens of parameters with varying types of headers and values</strong>. Therefore, manual fixing is not viable. There is a <strong>specific set of parameters</strong> with either <em><strong>whole</strong></em> or <strong><em>decimal</em> numbers enclosed in curly brackets</strong>. Only these enclosed numbers, together with the parameter headers, need attention.</p>
<p dir="auto">The following statements - mostly bad news - are true for these strings:</p>
<ul>
<li>May be found <strong>anywhere</strong> under the root</li>
<li>The pattern is always <strong>header={ N N }</strong>, originally indented but this RegEx can be done after removing the tabs</li>
<li>There are several headers, typically in lower case; a few exceptions exist, but case sensitivity is not necessary – RegEx <code>\w+</code> covers all of them</li>
<li>There may be <strong>any amount</strong> of these numbers: there is <strong>always</strong> at least <strong>one</strong> but may be <strong>up to several dozens</strong></li>
<li>Each number may be of <strong>varying length</strong>, typically <strong>anywhere between 1 and 1000000000 (1E+09)</strong></li>
<li>All numbers are <strong>positive</strong></li>
<li>Almost all of them are <strong>whole numbers</strong> but there are a couple of cases where <strong>decimal numbers</strong> appear, typically with a precision of 3 or 4 decimals… There is no pattern to when and where they appear, so, ideally, our numbers should carry the structure of <code>(\d+(?:\.\d+)?)</code> in any case</li>
<li>Numbers are <strong>always</strong> separated with a <strong>regular space</strong></li>
<li>The equal sign = and curly brackets { } should account for possible spacing errors with <code>\h*</code></li>
<li><strong>Many other parameters have numbers</strong> but they are <strong>not</strong> enclosed with any brackets</li>
</ul>
<p dir="auto"><strong>The exact matter is classified so I have randomized a dummy example for you. All parameters with curly brackets must be fixed.</strong><br />
<em>Please, note that in reality all these parameters are mixed and in any order, so they are not in separate sections like below. I just wanted to highlight them here so that it’s easier to see what needs to be done!</em></p>
<pre><code>82={ # This is the root, used for each main entry. All parameters are placed under it. In this case, these are safe to ignore.
### The section below has parameters which need our attention to be fixed with RegEx ###
xx={ 16835961 }
yyyy={ 16847062 67151971 74997 50388451 72836 83934207 50362874 16845543 81456 81771 67136455 33623075 16849442 100696613 82574 83286 83577 16852101 84199 33607712 }
zzz={ 79199 16848761 83893799 70029 76217 16854401 16839 16853836 50370644 145057 79338 81773 16849133 83891875 }
www={ 100693891 72513 16844226 33606062 16854968 16858108 33608429 16845608 67128408 33611952 50382602 67148972 67149505 50368894 78657 134238974 67119739 50362812 16833431 16852778 50353593 50378671 50383395 50386109 67120625 67126402 67136958 67145067 67145907 67151704 67158147 83897335 83898254 83921034 83921077 83927103 100681910 100691733 117474361 }
pppp={ 50350929 168.36935 33589252 }
rrrrr={ 322 482.865 }
### Other stuff in the file looks like this ###
info_about_this=blah
header=85095
Header=words_with_underlines
date=1938.08.22
that=2437
dummy=funny
}
</code></pre>
<p dir="auto"><em>Because of these irregularities, combined with certain similarities with other parameters, all RegEx should ideally be done in one go… unless there is a foolproof solution with multiple steps that will not alter other parameters.</em></p>
<p dir="auto"><strong>TO DO</strong><br />
These strings of numbers must be parsed so that I can further process them. The following list explains the end result I need.</p>
<ul>
<li><strong>Separate line</strong> for <em>each number</em></li>
<li><strong>Header captured</strong> to be <em>included before each number</em>, i.e. <strong>01234</strong> → <strong>header=01234</strong></li>
<li><strong>Any</strong> in-line <strong>(white)spaces</strong> should be <em>removed</em>, including the ones before &amp; after numbers and brackets</li>
<li>The <strong>curly brackets</strong> are <strong>redundant</strong> so, ideally, they <em>should be removed</em> - I only need the headers and numbers</li>
<li><em>Everything beyond these strings should be kept intact - any changes <strong>will</strong> cause errors!</em></li>
</ul>
<p dir="auto"><em>The final product of the above dummy should look like the one below. Please, ignore the lines with “etc” - there are so many values that it’s best to abbreviate.</em></p>
<pre><code>82={
xx=16835961
yyyy=16847062
yyyy=67151971
yyyy=74997
yyyy=50388451
yyyy=72836
yyyy=83934207
# etc...
zzz=79199
zzz=16848761
zzz=83893799
# etc...
pppp=50350929
pppp=168.36935
pppp=33589252
# etc...
info_about_this=blah
header=85095
Header=words_with_underlines
date=1938.08.22
that=2437
dummy=funny
}
</code></pre>
<p dir="auto">Thank you for your time and patience! Any help would be greatly appreciated!<br />
Have a nice day!</p>
]]></description><link>https://community.notepad-plus-plus.org/post/67896</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/67896</guid><dc:creator><![CDATA[Grimaldas Grydas]]></dc:creator><pubDate>Wed, 14 Jul 2021 15:49:48 GMT</pubDate></item><item><title><![CDATA[Reply to RegEx: Split each number of a string inside curly brackets into a separate line, add a prefix to it &amp; remove all unnecessary data on Wed, 14 Jul 2021 15:25:36 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/3841">@PeterJones</a></p>
<p dir="auto">Thank you for the reply, Sir! I am sorry for forgetting about the markup, will fix it soon after this!</p>
<p dir="auto">However, I think you didn’t have to be rude-ish to me, especially because judging by your reply, it is merely because of overlooking one (though important) markup. This is the first time I am posting here, and this forum is new to me. Despite the effort I put into this, I couldn’t possibly remember everything at once, especially the specific markup preferred here and with my current exhaustion. This is why editing exists - so I can fix such blunders :)</p>
<blockquote>
<p dir="auto">You used a lot of markdown formatting in your post… but where it really mattered (using the <code>&lt;/&gt;</code> toolbar button or the raw ``` markdown code to indicate the start and end of your example data), you didn’t use markdown, so we cannot be sure of the data you actually have.  That makes it really hard to help you.</p>
</blockquote>
<p dir="auto">I’m sorry for missing / forgetting about the markup - thank you for noting! It would indeed stand out better with that. I will fix it right away!<br />
Please, let me know if there will still be something weird in the layout!</p>
<blockquote>
<p dir="auto">As much as I like Notepad++, if this is database data, wouldn’t it be best to fix the data using the database routines which are optimized for editing the database data?  Or fixing the report generator template to make the text output report the way you want it?</p>
</blockquote>
<p dir="auto">I am well aware of that route. However, for certain reasons those cannot be altered or edited in any way. It is that way for its own specific purposes, while my part is to convert all that data into a human-readable database. Technically, one could create a program in Python or similar to do the parsing. However, I am not a programmer, so I don’t know how to do that… and I think it probably still requires RegEx.</p>
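<p dir="auto">For readers curious about that Python route, a minimal sketch might look like the following. It reads one line at a time, which matters for files with millions of lines, and writes every non-matching line back unchanged. Everything here is an assumption drawn from the dummy data: the line shape <code>header={ N N }</code>, and the hypothetical names <code>expand_lines</code> and <code>convert</code>. It does still rely on a regular expression.</p>

```python
import io
import re

# Assumed line shape (taken from the dummy data): header={ N N ... }
# Group 3 keeps the original line ending so it can be reused on output.
BRACED_LINE = re.compile(
    r"^[ \t]*(\w+)[ \t]*=[ \t]*\{[ \t]*"
    r"(\d+(?:\.\d+)?(?:[ \t]+\d+(?:\.\d+)?)*)"
    r"[ \t]*\}[ \t]*(\r?\n)?$"
)

def expand_lines(lines):
    """Yield output lines one at a time so a huge file never sits in memory."""
    for line in lines:
        match = BRACED_LINE.match(line)
        if match is None:
            yield line  # everything else passes through untouched
            continue
        header = match.group(1)
        newline = match.group(3) or ""
        for number in match.group(2).split():
            yield f"{header}={number}{newline}"

def convert(source, target):
    """Copy text from one open file object to another, expanding braced lists."""
    for out_line in expand_lines(source):
        target.write(out_line)

# Demo on in-memory text; real use would pass files opened with open(..., newline="").
sample = io.StringIO("82={\n\tpppp={ 50350929 168.36935 33589252 }\ndate=1938.08.22\n}\n")
result = io.StringIO()
convert(sample, result)
print(result.getvalue())
```

<p dir="auto">The root line <code>82={</code> and the lone closing <code>}</code> do not match the pattern, so they are copied through as they are.</p>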
<p dir="auto">Also, so far, Notepad++ and its powerful RegEx has served the purpose perfectly well.</p>
<p dir="auto">Also, among dozens of RegEx edits on these files, as far as I know, this is the only step that I couldn’t overcome with RegEx on my own. Therefore, I thought it would be a waste not to try to solve it this way, or at least find out if it would be possible in the first place. Perhaps, I may have simply overlooked something simple that a RegEx guru would do right away? I don’t know. That’s why I thought I should ask here, where there are many who know RegEx much better than I do.</p>
<blockquote>
<p dir="auto">Anyway, until you can mark your data better, and follow the advice that I will put below the horizontal line, I wouldn’t want to even take a stab at an answer yet.</p>
</blockquote>
<p dir="auto">…</p>
]]></description><link>https://community.notepad-plus-plus.org/post/67894</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/67894</guid><dc:creator><![CDATA[Grimaldas Grydas]]></dc:creator><pubDate>Wed, 14 Jul 2021 15:25:36 GMT</pubDate></item><item><title><![CDATA[Reply to RegEx: Split each number of a string inside curly brackets into a separate line, add a prefix to it &amp; remove all unnecessary data on Wed, 14 Jul 2021 14:03:48 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/22541">@Grimaldas-Grydas</a> ,</p>
<blockquote>
<p dir="auto">I am really tired at the moment</p>
</blockquote>
<p dir="auto">You may want to come back and clarify things when you’ve had some rest, then.</p>
<p dir="auto">You used a lot of markdown formatting in your post… but where it really mattered (using the <code>&lt;/&gt;</code> toolbar button or the raw ``` markdown code to indicate the start and end of your example data), you didn’t use markdown, so we cannot be sure of the data you actually have.  That makes it really hard to help you.</p>
<blockquote>
<p dir="auto">I have a huge database of dozens of files, each containing typically millions of lines,</p>
</blockquote>
<p dir="auto">As much as I like Notepad++, if this is database data, wouldn’t it be best to fix the data using the database routines which are optimized for editing the database data?  Or fixing the report generator template to make the text output report the way you want it?</p>
<p dir="auto">Anyway, until you can mark your data better, and follow the advice that I will put below the horizontal line, I wouldn’t want to even take a stab at an answer yet.</p>
<p dir="auto">----</p>
<p dir="auto"><em>Do you want regex search/replace help?  Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you.  All example text should be marked as literal text using the <code>&lt;/&gt;</code> toolbar button or manual <a href="https://community.notepad-plus-plus.org/topic/14262/how-to-markdown-code-on-this-forum/4">Markdown syntax</a>. To make <code>regex in red</code> (and so they keep their special characters like *), use backticks, like <code>`^.*?blah.*?\z`</code>. Screenshots can be pasted from the clipboard to your post using <code>Ctrl+V</code> to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have <strong>and</strong> the text you want to get from that data; include examples of things that <strong>should match</strong> and be transformed, <strong>and</strong> things that <strong>don’t match</strong> and should be left alone; show <strong>edge cases</strong> and make sure your examples are as <strong>varied</strong> as your real data.  Show the regex you already tried, <strong>and why</strong> you thought it should work; tell us what’s wrong with what you <strong>do</strong> get. Read the official <a href="https://npp-user-manual.org/docs/searching/#regular-expressions" rel="nofollow ugc">NPP Searching / Regex docs</a> and the forum’s <a href="https://community.notepad-plus-plus.org/topic/15765/faq-desk-where-to-find-regex-documentation">Regular Expression FAQ</a>. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.</em></p>
]]></description><link>https://community.notepad-plus-plus.org/post/67889</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/67889</guid><dc:creator><![CDATA[PeterJones]]></dc:creator><pubDate>Wed, 14 Jul 2021 14:03:48 GMT</pubDate></item></channel></rss>