Delete number strings in the middle of lines of data



  • line of data:
    3.2215E+02 3.2215E+02 3.2215E+02 3.2215E+02 3.2215E+02 3.2215E+02 3.2215E+02 3.2215E+02 3.2215E+02 1.00000E+00 5.55663E-01

    There are 1000-2000 lines of data, and I only need the first and last two numbers of each line. I’m using Notepad++ v7.8.5. I found a question & response that seemed to fit, but for a smaller line of data:

    Find: ^(.{12}).{107}
    Replace: (\1)0000000000

    It seemed straightforward, but I can’t seem to adapt it to mine. (BTW, I use a couple of regexps, but with little understanding.)

    thanks
    pmw7070



  • @Paul-Whaley said in Delete number strings in the middle of lines of data:

    I only need the first and last two numbers of each line,

    Try this:

    Open the Replace dialog by pressing Ctrl+h and then set up the following search parameters:

    Find what box: (?-s)^(\S+).*?(\S+)$
    Replace with box: \1 \2
    Search mode radiobutton: Regular expression
    Wrap around checkbox: ticked
    . matches newline checkbox: doesn’t matter (because the (?-s) leading off the Find what box contains an s variant)

    Then press the Replace All button.
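
    For readers who want to try the same idea outside Notepad++, here is a minimal Python sketch. It assumes Python's `re` module as a stand-in for the Boost engine used by Notepad++; the variable names are only illustrative.

```python
import re

# Sample line from the original question.
line = ("3.2215E+02 3.2215E+02 3.2215E+02 3.2215E+02 3.2215E+02 3.2215E+02 "
        "3.2215E+02 3.2215E+02 3.2215E+02 1.00000E+00 5.55663E-01")

# In Python, "." does not match a newline unless re.DOTALL is set, so the
# (?-s) prefix of the Notepad++ version is not needed here.
first_and_last = re.compile(r"^(\S+).*?(\S+)$", re.MULTILINE)

# Keep the first and the last whitespace-delimited field of each line.
result = first_and_last.sub(r"\1 \2", line)
print(result)
```

    The lazy `.*?` stops at the first position from which only non-whitespace remains up to the end of the line, so the second group always holds the last field.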



  • Hi @Alan-Kilborn

    I’m afraid your regex missed one number, as OP needs last two numbers.

    PS: At first sight, I misread it too.



  • @Paul-Whaley said in Delete number strings in the middle of lines of data:

    Find: ^(.{12}).{107}
    Replace: (\1)0000000000
    It seemed straightforward, but can’t seem to adapt it to mine.

    The regex you supplied does the following:
    ^ find the start of a line
    (.{12}) capture 12 characters of any sort; the ( and ) allow them to be returned, see \1 in the replace field. The . generally means any character and can also include the end of line if the . matches newline box is ticked.
    .{107} means match 107 more characters; however, without ( and ) around it you cannot return these in the replace field. Its only possible use seems to be to move the pointer forward to another position, possibly the start of the next line, so the regex can perform the same function again (using the Replace All button).
    The replace field returns capture group 1 within brackets followed by 10 zeros.

    This regex will require a regimented set of data, that is, the data must be of same length in each line as it will not cater for any differences. Rather it blindly captures the requisite number of characters.

    If your data is also regimented (fixed length) you can do similar, however I won’t supply such a regex. I would say that this type of regex is very dangerous (on the wrong data) as it provides no exceptions or “quality” control.
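
    To illustrate the danger, here is a small Python sketch that applies the fixed-width regex to the sample line from the question. Python's `re` module is used as an assumed stand-in for Notepad++, and `short_line` is a made-up second example.

```python
import re

fixed_width = re.compile(r"^(.{12}).{107}", re.MULTILINE)
replacement = r"(\1)0000000000"

# Sample line from the question: nine 10-character fields, two
# 11-character fields and ten separating spaces.
line = ("3.2215E+02 3.2215E+02 3.2215E+02 3.2215E+02 3.2215E+02 3.2215E+02 "
        "3.2215E+02 3.2215E+02 3.2215E+02 1.00000E+00 5.55663E-01")
short_line = "3.2215E+02 1.00000E+00"  # hypothetical shorter record

# The regex blindly consumes 12 + 107 characters, cutting fields in half.
mangled = fixed_width.sub(replacement, line)
# A line shorter than 119 characters is silently left unchanged.
untouched = fixed_width.sub(replacement, short_line)

print(len(line), repr(mangled))
print(repr(untouched))
```

    Neither outcome is reported as an error, which is the missing “quality” control mentioned above.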

    @Alan-Kilborn’s regex provides such quality control. He did make a mistake in not getting the last 2 fields; I’ll let him provide an updated answer. I was just replying to alert you to the danger of taking another regex (possibly out of context), without the knowledge to interpret it, and hoping you can edit it to suit your needs.

    Terry



  • @Paul-Whaley said:

    first and last two numbers of each line

    I took this to mean “first and last…two numbers on each line”
    I guess it is a language issue. I’m English-speaking all the way, but not for 100% of my life, maybe 85%?
    :-)

    The simplest change to what I provided earlier is:

    Find what: (?-s)^(\S+).*?(\S+) (\S+)$
    Replace with: \1 \2 \3

    But in a nod to @Terry-R 's cautionary statements, perhaps this is better:

    Find what: (?-s)^([0-9.E+-]+).*?([0-9.E+-]+) ([0-9.E+-]+)$
    Replace with: same

    This gives an opportunity to show a form that avoids duplication and the possibility of copy-and-paste errors. This also works:

    Find what: (?-s)^([0-9.E+-]+).*?((?1)) ((?1))$
    Replace with: same

    Note that instead of retyping [0-9.E+-]+, since that is the first capture group, I just referenced it as (?1) when I wanted to use it (two places) later in the expression.
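
    Python's `re` module has no `(?1)` subroutine call, so the sketch below gets the same "type it once" benefit by keeping the sub-pattern in a variable (`NUM`, an illustrative name) and interpolating it. This is an assumed equivalent, not what Notepad++ does internally.

```python
import re

NUM = r"[0-9.E+-]+"  # typed once, reused three times below

first_and_last_two = re.compile(rf"^({NUM}).*?({NUM}) ({NUM})$", re.MULTILINE)

line = ("3.2215E+02 3.2215E+02 3.2215E+02 3.2215E+02 3.2215E+02 3.2215E+02 "
        "3.2215E+02 3.2215E+02 3.2215E+02 1.00000E+00 5.55663E-01")

# Keep the first field and the last two fields.
result = first_and_last_two.sub(r"\1 \2 \3", line)
print(result)
```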

    I saw this technique mentioned recently in the Notepad++ documentation issues list on github, but I can’t recall just where, to cite it now. :-(



  • @Paul-Whaley said first,

    I only need the first and last two numbers of each line

    @Alan-Kilborn said,

    I took this to mean “first and last…two numbers on each line”

    So, @Alan-Kilborn interpreted it as “first and last, for a total of two”; @astrosofista interpreted it as “first (one) and last two, for a total of three”; I interpreted it as “first (two) and last two, for a total of four”.

    This is why giving before and after data is so critical when requesting help with any search-and-replace question, no matter how clear your statement is in your own head.

    I guess it is a language issue. I’m English-speaking all the way, but not for 100% of my life, maybe 85%?

    Hmm… I wouldn’t have guessed that.

    I saw this technique mentioned recently in the Notepad++ documentation issues list on github, but I can’t recall just where, to cite it now. :-(

    It’s because it was in comments on the PR#79, not on an issue.

    [0-9.E+-]+

    The problem with this subexpression is that it would match EEEEEEEEEEEEEEEEEEEEE or E+E-E+E-

    Assuming that no number starts with just a decimal point (so 0.5 or 0.5E5 are okay, but .5 or .5E5 are not), then \b([0-9]+\.?[0-9]*(?:E[+-]?\d+)?)\b will work for each “number”, with the full expression of

    • FIND = ^(\b\d+\.?\d*(?:E[+-]?\d+)?\b) ((?1)).*?((?1)) ((?1))$
    • REPLACE = $1 $2 $3 $4
    • MODE = regular expression

    will take the first two and last two from lines that have at least four valid numbers

    3.2215E+02 3.2215E+02 3.2215E+02 3.2215E+02 3.2215E+02 3.2215E+02 3.2215E+02 3.2215E+02 3.2215E+02 1.00000E+00 5.55663E-01
    # these next two lines won't match
    .1 .2 .3 .4 .5 .6
    .1E+15 .2E0 .3E-4 .4E-05 .5E5 .6E123
    # beyond here should match again
    0.1E+15 0.2E0 0.3E-4 0.4E-05 0.5E5 0.6E123
    1E23 4E+56 7E-89 1E234 100.000
    

    to become

    
    3.2215E+02 3.2215E+02 1.00000E+00 5.55663E-01
    # these next two lines won't match
    .1 .2 .3 .4 .5 .6
    .1E+15 .2E0 .3E-4 .4E-05 .5E5 .6E123
    # beyond here should match again
    0.1E+15 0.2E0 0.5E5 0.6E123
    1E23 4E+56 1E234 100.000
    

    Note that this example shows both things that should match and things that shouldn’t match. This is again good practice when asking for search/replace help.
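
    The same check can be scripted. The following Python sketch is an approximation: Python's `re` has no `(?1)`, so the number pattern is interpolated from a variable (`NUM`, an illustrative name), and only three of the sample lines are used.

```python
import re

# Peter's "number" sub-pattern, repeated by interpolation instead of (?1).
NUM = r"\b\d+\.?\d*(?:E[+-]?\d+)?\b"
four_ends = re.compile(rf"^({NUM}) ({NUM}).*?({NUM}) ({NUM})$", re.MULTILINE)

sample = "\n".join([
    ".1 .2 .3 .4 .5 .6",                           # should not match
    "0.1E+15 0.2E0 0.3E-4 0.4E-05 0.5E5 0.6E123",  # should match
    "1E23 4E+56 7E-89 1E234 100.000",              # should match
])

# Keep the first two and last two numbers of every matching line.
result = four_ends.sub(r"\1 \2 \3 \4", sample)
print(result)
```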

    ----

    Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as plain text using the </> toolbar button or manual Markdown syntax; screenshots can be pasted from the clipboard to your post using Ctrl+V. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure your examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get… Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.



  • @PeterJones said in Delete number strings in the middle of lines of data:

    [0-9.E+-]+ The problem with this subexpression is that it would match EEEEEEEEEEEEEEEEEEEEE or E+E-E+E-

    Sure, but there are limits as to how far I’m willing to go, especially when it isn’t demonstrated (with good data sample by the OP) that it is needed. :-)

    My goal was just to give the nod to @Terry-R when he said regex is very dangerous (on the wrong data) as it provides no exceptions or “quality” control.

    I guess it is a language issue. I’m English-speaking all the way, but not for 100% of my life, maybe 85%?
    Hmm… I wouldn’t have guessed that.

    I did a better calculation, it’s actually 93% of my life.
    I double-checked with my mother, who at the moment is living with me. Sometimes this fact is :-) and sometimes it is :-(



  • @Paul-Whaley said in Delete number strings in the middle of lines of data:

    I only need the first and last two numbers of each line,

    Oh what a tangled web we weave. So 4 people got 3 different answers. Now I’m confused, all 3 have a glimmer of truth from the statement the OP made. I guess we need the OP to provide more info and especially the before/after “shot” of the data.

    Sorry @Alan-Kilborn, I awkwardly referred to your solution, which I see you thought I meant wasn’t “up to grade”. Actually I was trying to applaud it as a great use of the “negative” whitespace class. My beef was with the OP providing a regex from some unknown location and stating they had little knowledge of how to “edit” it to suit. I was providing a cautionary tale to them.

    The 1 example line, whilst seemingly regimented data was actually not “in my mind”. Most fields were 10 characters long, the last 2 were 11 characters long. Use of the provided regex (albeit edited) would have been an issue as possibly any of the fields may have changed between the 10 and 11 characters making the use of a regex without ability to detect these changes “very dangerous”.

    I understand where @PeterJones comes from with a more defined regex; however, I think the one thing we CAN see in the example is that the data ONLY contains scientific numbers in a “clearly” defined format. I was happy with @Alan-Kilborn’s use of the (negative) whitespace class to define the boundaries.

    Cheers
    my 2c worth
    Terry



  • @Terry-R said in Delete number strings in the middle of lines of data:

    which I see you thought I meant wasn’t “up to grade”

    Not at all !

    And the “cautionary” stuff is well-noted as well. We’re all in charge of our own data.
    And I think the OP was truly okay anyway unless he was going to do a backupless Replace in Files … there is always UNDO!

    And BTW, \S --that’s capital S-- is getting more workout in my own regex solutions these days!



  • Hello, @paul-whaley, @astrosofista, @alan-kilborn, @peterjones and All,

    Truly original, these 3 possible interpretations ;-)) Hence the necessity for OPs to always describe their needs, in a rigorous way !


    @peterjones, I think that your regex, to catch a number, could even be improved !

    First, I added the possibility to match a possible sign, in front of the number ( obvious )

    Secondly, if we test your regex \b\d+\.?\d*(?:E[+-]?\d+)?\b against the malformed numbers below, it wrongly matches something ! For instance, when it matches 03 in the third line, the \b assertion, which is a location between a non-word char and a word char, considers it OK because the minus sign before 03 is, indeed, a non-word char !

    .5          Wrongly matches the string 5
    
    .5E+02      Wrongly matches the string 5E+02
    
    .E-03       Wrongly matches the string 03
    
    .E+07       Wrongly matches the string 07
    
    23.         Wrongly matches the string 23
    

    So, here is my solution, which does not use the \b assertion at all :

    (?<![.\w+-])[+-]?\d+(?:\.\d+)?(?:E[+-]?\d+)?(?![.\w]), meaning that any number :

    • Cannot be preceded by, either, a word character, the decimal dot ., the + sign or the - sign

    • Cannot be followed by, either, a word character or the decimal dot .
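
    The difference between the two number patterns can be checked with a short Python sketch. Python's `re` module is assumed here as a stand-in for Boost; `LOOSE` and `STRICT` are illustrative names for the `\b` version and the look-around version.

```python
import re

LOOSE = re.compile(r"\b\d+\.?\d*(?:E[+-]?\d+)?\b")
STRICT = re.compile(r"(?<![.\w+-])[+-]?\d+(?:\.\d+)?(?:E[+-]?\d+)?(?![.\w])")

malformed = [".5", ".5E+02", ".E-03", ".E+07", "23."]

# The \b version finds a partial "number" inside every malformed string.
loose_hits = [LOOSE.search(text).group(0) for text in malformed]
# The look-around version finds nothing in any of them.
strict_hits = [STRICT.search(text) for text in malformed]

# Well-formed numbers, with or without a sign, are still matched in full.
well_formed = ["3.2215E+02", "-9.9999E-09", "+1.1111E-01"]
strict_ok = [STRICT.search(text).group(0) for text in well_formed]

print(loose_hits)
print(strict_hits)
print(strict_ok)
```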


    Now, I’m going to tell you about a little-known structure : a conditional regex syntax, (?(Condition)........), where the condition is the reserved word DEFINE, in upper case. The condition DEFINE is always FALSE so the contents of this conditional structure will never be part of the regex.

    It seems pointless ! Make no mistake : it allows you to define a kind of library of regular expressions, which you can use, at your leisure, to build your effective regular expression ;-))

    For instance, let’s suppose we want to match an IPV4 address : each of the 4 parts is a byte, with a value from 0 to 255. To match this integer, we must build the regex 25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d

    Now, the general form of an IPV4 address is \bByte(\.Byte){3}\b. Thus, in order to match an IPV4 address, we may use the conditional DEFINE syntax, below :

    (?(DEFINE)(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d))\b(?1)(\.(?1)){3}\b


    You could say: why don’t you prefer the simple regex, below ?

    \b(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(?1)){3}\b

    Well, let’s suppose that there are 3 different ways to match an IPV4 address :

    Then, your regex would have been changed into :

    \b((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(?1)){3}|...2nd alternative...|...3rd alternative...)\b

    But, when the 2nd or 3rd alternative matches, group 1 is not defined any more and you cannot use it in the other alternatives ! With the conditional DEFINE syntax, this is still possible :

    (?(DEFINE)(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d))\b((?1)(\.(?1)){3}|...2nd alternative...|...3rd alternative)\b


    BTW, even using the simple syntax, it’s important to see the difference between these two regexes :

    • \b(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.\1){3}\b

    • \b(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(?1)){3}\b

    Well :

    • The former regex would only match IPV4 addresses like 45.45.45.45, 0.0.0.0, 127.127.127.127, as the back-reference \1 refers to the value of group 1

    • Whereas the latter would find any valid IPV4 address, because the (?1) syntax is similar to a subroutine, in a programming language, and refers to the regex itself, 25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d
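
    The contrast can be reproduced in Python, with one caveat: Python's `re` module supports the back-reference `\1` but not the `(?1)` subroutine call, so the sketch below re-inserts the sub-pattern text from a variable (`BYTE`, an illustrative name) as an assumed equivalent.

```python
import re

BYTE = r"25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d"

# \1 repeats the TEXT captured by group 1.
with_backref = re.compile(rf"\b({BYTE})(\.\1){{3}}\b")
# Re-inserting the sub-pattern repeats the PATTERN, like (?1) does.
with_reuse = re.compile(rf"\b({BYTE})(\.(?:{BYTE})){{3}}\b")

same_octets = "45.45.45.45"
mixed_octets = "192.168.0.1"
out_of_range = "256.1.1.1"

for address in (same_octets, mixed_octets, out_of_range):
    print(address,
          bool(with_backref.fullmatch(address)),
          bool(with_reuse.fullmatch(address)))
```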


    Let’s go back to the DEFINE condition. The nice thing is that it allows you to define more than one regex/group :

    For instance, let’s imagine that you analyze a DNA genetic sequence and that you want to highlight certain combinations of the 3 elementary parts, below :

    • ATT.{12,}?ATT ( Zone A )

    • TAG.{9}TAG ( Zone B )

    • CCC.*?CCC ( Zone C )

    Now, from these elementary parts, let’s assume that the combinations we’re looking for, in this DNA sequence, are three :

    • ACCCTTTB

    • CAC

    • BGATCGATA

    Then, you could build the regex :

    (?(DEFINE)(ATT.{12,}?ATT)(TAG.{9}TAG)(CCC.*?CCC))(?1)CCCTTT(?2)|(?3)(?1)(?3)|(?2)GAT(?3)GAT(?1)

    or, with the Free-spacing mode :

    (?x) (?(DEFINE)   (ATT.{12,}?ATT)   (TAG.{9}TAG)   (CCC.*?CCC))   (?1)CCCTTT(?2)   |   (?3)(?1)(?3)   |   (?2)GAT(?3)GAT(?1)

    Any alternative, in the right part of the regex, is functional, because the elementary parts A, B and C are defined, once and for all, due to the conditional DEFINE structure in the left part of the regex

    Test these two regex syntaxes against the DNA sequence, below :

    GCTAATTCGGCTGATATCGATTCCCTTTTAGGGACTTACGTAGCCATGGATCCCATGCATGCCCATTCGGCTGATATCGATTCCCTGCCCGGATTTCTAGGGACTTACGTAGGATCCCATGCATGCCCGATATTCGGCTGATATCGATTTAAGGGCTA
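
    As a rough illustration of the "library of sub-patterns" idea, here is a Python sketch. Python's `re` has no `(?(DEFINE)...)`, so a dictionary of fragments (`ZONES`, an illustrative name) plays that role, and the test string is synthetic rather than the sequence above.

```python
import re

# The "library": each elementary zone is defined exactly once.
ZONES = {
    "A": r"ATT.{12,}?ATT",
    "B": r"TAG.{9}TAG",
    "C": r"CCC.*?CCC",
}

def zone(name):
    """Return the named zone wrapped in a non-capturing group."""
    return f"(?:{ZONES[name]})"

# The three combinations: A CCCTTT B | C A C | B GAT C GAT A
combined = re.compile("|".join([
    zone("A") + "CCCTTT" + zone("B"),
    zone("C") + zone("A") + zone("C"),
    zone("B") + "GAT" + zone("C") + "GAT" + zone("A"),
]))

# Synthetic test data: one "C A C" combination surrounded by filler.
zone_a = "ATT" + "G" * 12 + "ATT"
zone_c = "CCC" + "GG" + "CCC"
cac = zone_c + zone_a + zone_c

match = combined.search("TTTT" + cac + "TTTT")
print(match.group(0) if match else None)
```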
    

    To end, let’s apply all these notions to the OP’s problem :

    Let’s assume this single line of numbers :

    +1.1111E-01 -2.2222E+02 +3.3333E-03 -4.4444E+04 5.5555E-05 +6.6666E+06 -7.7777E-07 8.8888E+08 -9.9999E-09
    

    If we use the conditional DEFINE structure and the free-spacing mode (?x), we can write the 3 following regexes :

    SEARCH :
    
    Regex A  :  (?x-si) (?(DEFINE) ( (?<![.\w+-]) [+-]?\d+(?:\.\d+)?(?:E[+-]?\d+)? (?![.\w]) ) ) .*? ( ^ (?1)\h       | (?1)       $ )  #  @Alan         flavor
    Regex B  :  (?x-si) (?(DEFINE) ( (?<![.\w+-]) [+-]?\d+(?:\.\d+)?(?:E[+-]?\d+)? (?![.\w]) ) ) .*? ( ^ (?1)\h       | (?1)\h(?1) $ )  #  @Astrosofista flavor
    Regex C  :  (?x-si) (?(DEFINE) ( (?<![.\w+-]) [+-]?\d+(?:\.\d+)?(?:E[+-]?\d+)? (?![.\w]) ) ) .*? ( ^ (?1)\h(?1)\h | (?1)\h(?1) $ )  #  @PeterJones   flavor
    
    REPLACE  :  \2
    

    And we get the 3 results below :

    +1.1111E-01 -9.9999E-09                            @Alan         flavor , with Regex A
    +1.1111E-01 8.8888E+08 -9.9999E-09                 @Astrosofista flavor , with Regex B
    +1.1111E-01 -2.2222E+02 8.8888E+08 -9.9999E-09     @PeterJones   flavor , with Regex C
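
    A parameterised Python sketch of the same three flavors follows. It is an approximation under stated assumptions: Python's `re` lacks `(?(DEFINE)...)`, `(?1)` and `\h`, so the number pattern is interpolated from a variable and `[ \t]` stands in for `\h`; `keep_ends` is a hypothetical helper name.

```python
import re

NUM = r"(?<![.\w+-])[+-]?\d+(?:\.\d+)?(?:E[+-]?\d+)?(?![.\w])"
BLANK = r"[ \t]"  # stand-in for \h

def keep_ends(text, head, tail):
    """Keep the first `head` and the last `tail` numbers of each line."""
    head_pat = BLANK.join([NUM] * head) + BLANK
    tail_pat = BLANK.join([NUM] * tail)
    # Same shape as the regexes above: skip lazily, then capture either
    # the leading numbers (at line start) or the trailing ones (at line end).
    pattern = re.compile(rf".*?(^{head_pat}|{tail_pat}$)", re.MULTILINE)
    return pattern.sub(r"\1", text)

line = ("+1.1111E-01 -2.2222E+02 +3.3333E-03 -4.4444E+04 5.5555E-05 "
        "+6.6666E+06 -7.7777E-07 8.8888E+08 -9.9999E-09")

flavor_a = keep_ends(line, 1, 1)
flavor_b = keep_ends(line, 1, 2)
flavor_c = keep_ends(line, 2, 2)
print(flavor_a)
print(flavor_b)
print(flavor_c)
```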
    

    Best Regards,

    guy038



  • @guy038 said in Delete number strings in the middle of lines of data:

    (?(DEFINE)…)

    It’s a nice construct. It is documented here for those that don’t know:

    https://www.boost.org/doc/libs/1_70_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

    as

    (?(DEFINE)never-exectuted-pattern) Defines a block of code that is never executed and matches no characters: this is usually used to define one or more named sub-expressions which are referred to from elsewhere in the pattern.
    

    I don’t think it has a mention in the official Notepad++ docs, though.

    It doesn’t mean a lot if you simply read it, but a lot of value is added with a concrete example such as that provided by @guy038

    One thing that I don’t like about it is that it consumes a capture group number. Wouldn’t it be better to work with named and not numbered groups? Indeed the docs say “…define one or more named sub-expressions…” so this would be equivalent for “my” regex (regex A) above:

    (?x-si)    (    ?(DEFINE)    (?<ALAN>    (?<![.\w+-]) [+-]?\d+(?:\.\d+)?(?:E[+-]?\d+)? (?![.\w])    )    )        (^(?P>ALAN)\h|(?P>ALAN)$)
    

    But alas, even though I’ve used a group named ALAN above, it is equivalent to group #1, thus a possible equivalency use case could look like this:

    (?x-si)    (    ?(DEFINE)    (?<ALAN>    (?<![.\w+-]) [+-]?\d+(?:\.\d+)?(?:E[+-]?\d+)? (?![.\w])    )    )        (^(?1)\h|(?1)$)
    

    Note that the difference is, even though I’ve named the group ALAN at “define” time, I refer to it as 1 when actually used.

    So why is this a downside? Well, because it couples the left side (definition) with the right side (use). Maybe I have a library of definitions, that I want to largely ignore (except their names), and I’m wanting to write a regex I’m going to use to match some data–maybe in the regex I want to backrefer to my own capture group #1. Well, because of the coupling, group #1 would already be in use.

    Ok, so maybe it is a slight downside that wouldn’t come up often, but, I just happened to encounter that scenario recently… :-)

    Did this turn into a Boost regex forum accidentally, or what?!? So sorry…



  • Hi, @alan-kilborn and All,

    Yes, Alan, I agree with you that named groups should not be numbered by the regex engine and that, thus, the user should only refer to them, in backreferences, by their names !

    However, the .NET regex engine, has an intelligent way to have the best of both worlds ! Indeed, the regex engine scans all unnamed groups, first, numbering them from value 1, then re-scans the regex for all named groups, continuing to number, from after the greatest number used in unnamed groups ;-))

    In the old version, below, of the Regular-Expressions manual, of Jan Goyvaerts ( creator of the Regular-expressions.info site ),

    https://www.princeton.edu/~mlovett/reference/Regular-Expressions.pdf

    it is said, pages 36-37

    Names and Numbers for Capturing Groups :

    Here is where things get a bit ugly. Python and PCRE treat named capturing groups just like unnamed capturing groups, and number both kinds from left to right, starting with one. The regex (a)(?P<x>b)(c)(?P<y>d) matches abcd as expected. If you do a search-and-replace with this regex and the replacement \1\2\3\4, you will get abcd. All four groups were numbered from left to right, from one till four. Easy and logical.

    Things are quite a bit more complicated with the .NET framework. The regex (a)(?<x>b)(c)(?<y>d) again matches abcd. However, if you do a search-and-replace with $1$2$3$4 as the replacement, you will get acbd. Probably not what you expected.

    The .NET framework does number named capturing groups from left to right, but numbers them after all the unnamed groups have been numbered. So the unnamed groups (a) and (c) get numbered first, from left to right, starting at one. Then the named groups (?<x>b) and (?<y>d) get their numbers, continuing from the unnamed groups, in this case: three.

    To make things simple, when using .NET’s regex support, just assume that named groups do not get numbered at all, and reference them by name exclusively.

    But, with the Boost regex engine of Notepad++, we have to make do with the usual numbering of the groups, which just does one regex scan and numbers any group, named or not, one after the other !
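
    The left-to-right numbering described in the quoted manual for Python can be seen directly with Python's `re` module; this short sketch only illustrates that behaviour and says nothing about Boost beyond what is stated above.

```python
import re

pattern = re.compile(r"(a)(?P<x>b)(c)(?P<y>d)")

# Named and unnamed groups share one left-to-right numbering.
result = pattern.sub(r"\1\2\3\4", "abcd")
numbering = dict(pattern.groupindex)

print(result)
print(numbering)
```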

    Best Regards,

    guy038



  • @guy038

    Maybe getting really off-topic now, but with the “DEFINE” stuff it got me thinking about a similar “problem” I have. I say “problem” because it is nothing I can’t workaround, but I’m wondering if there is a better solution.

    Consider:

    search: (?-i)(Xxx)|(XXX)|(Yyy)
    replace: (?1Zzz)(?2ZZZ)(?3Www)

    This would convert this text: The quick Xxx Yyy jumped over the lazy XXX into The quick Zzz Www jumped over the lazy ZZZ

    So please don’t consider the wrong problem. What I have is a simplified example of something more complicated, and the above is just for illustration.

    What I’d like to do is to NOT have to specify the capitalized version of ZZZ in the replace, but rather use the Zzz text without respecifying it (important!) in combination with a \U option.

    So in pseudo-regex, because I know this won’t work, without even trying it:

    replace: (?1Zzz)(?2\U${1}\E)(?3Www)

    So I was just wondering if you had any thoughts on this. TIA. :-)
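
    For comparison only, this is how the wished-for behaviour looks when a scripting language does the replacement. It is not a Notepad++ regex solution; it is a Python sketch in which a callback (the hypothetical `replace` function) derives the upper-case variant from a single `BASE` string.

```python
import re

# Python's re is case-sensitive by default, so (?-i) is not needed.
pattern = re.compile(r"(Xxx)|(XXX)|(Yyy)")
BASE = "Zzz"  # the replacement text is typed only once

def replace(match):
    """Choose the replacement according to which group matched."""
    if match.group(1) is not None:
        return BASE
    if match.group(2) is not None:
        return BASE.upper()  # plays the role of \U...\E
    return "Www"

text = "The quick Xxx Yyy jumped over the lazy XXX"
result = pattern.sub(replace, text)
print(result)
```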



  • Hi, @alan-kilborn,

    Your replacement cannot work because, when the search regex matches the string XXX, group 2 is the only group defined, due to the different alternatives :-((

    In addition, seemingly, you’re not interested in group 1 itself, but only in the replacement string of this group, so you would like something like (?2\UREPLACEMENT of (\1)\E) !!


    Let’s imagine the text sample, below, which is used in all subsequent tests :

    Xxx
    XXX
    XXX---Xxx
    

    Then with the regex S/R :

    SEARCH    (?x-i)   ^(Xxx)$ | ^(XXX)$ | (\2---\1)
    Groups :            1         2        3
    
    REPLACE   \r\nGroup 1 >\1<\r\nGroup 2 >\2<\r\nGroup 3 >\3<\r\n
    

    We get :

    Group 1 >Xxx<
    Group 2 ><
    Group 3 ><
    
    
    Group 1 ><
    Group 2 >XXX<
    Group 3 ><
    
    XXX---Xxx
    

    As explained above, the search regex does match the Xxx and XXX strings but fails to find XXX---Xxx because, when trying the 3rd alternative, the groups \1 and \2 are not defined


    OK, let’s try another syntax, using sub-routine calls (?N), where N is a group number :

    SEARCH    (?x-i)   ^(Xxx)$ | ^(XXX)$ | ((?2)---(?1))
    Groups :            1         2        3
    
    REPLACE   \r\nGroup 1 >\1<\r\nGroup 2 >\2<\r\nGroup 3 >\3<\r\n
    

    Text turns into :

    Group 1 >Xxx<
    Group 2 ><
    Group 3 ><
    
    
    Group 1 ><
    Group 2 >XXX<
    Group 3 ><
    
    
    Group 1 ><
    Group 2 ><
    Group 3 >XXX---Xxx<
    

    This time, the result is better as, when matching the string XXX---Xxx with the alternative ((?2)---(?1)), it reuses the patterns of groups 1 and 2, outside the alternative matched, thanks to the sub-routine call syntax !

    However, we don’t get the groups 1 and 2, individually


    Let’s use, again, another syntax, where any sub-routine call (?N), N being a group number, is itself embedded in parentheses, so ((?N))

    SEARCH    (?x-i)   ^(Xxx)$ | ^(XXX)$ | ((?2))---((?1))
    Groups :            1         2        3        4 
    
    REPLACE   \r\nGroup 1 >\1<\r\nGroup 2 >\2<\r\nGroup 3 >\3<\r\nGroup 4 >\4<\r\n
    

    Just note that the 3rd alternative is not embedded, itself, between parentheses. After execution, we’re left with :

    Group 1 >Xxx<
    Group 2 ><
    Group 3 ><
    Group 4 ><
    
    
    Group 1 ><
    Group 2 >XXX<
    Group 3 ><
    Group 4 ><
    
    
    Group 1 ><
    Group 2 ><
    Group 3 >XXX<
    Group 4 >Xxx<
    

    Ah!.. Now, when the regex engine tries the 3rd alternative, it does match the string XXX---Xxx and, in replacement, we note that groups 3 and 4 ( which are identical to groups 2 and 1, respectively, not part of the present match ), are both defined :-))
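
    The first and last experiments can be approximated in Python. This is a sketch under assumptions: Python's `re` supports back-references but not `(?1)` or `(?2)`, so the sub-patterns are simply typed again in the third alternative.

```python
import re

sample = "Xxx\nXXX\nXXX---Xxx"

# Back-references to groups that did not take part in the match fail.
with_backrefs = re.compile(r"^(Xxx)$|^(XXX)$|(\2---\1)", re.MULTILINE)
# Repeating the sub-patterns (what ((?2)) and ((?1)) do) sets groups 3 and 4.
inlined = re.compile(r"^(Xxx)$|^(XXX)$|(XXX)---(Xxx)", re.MULTILINE)

backref_hits = [m.group(0) for m in with_backrefs.finditer(sample)]
inlined_hits = [m.groups() for m in inlined.finditer(sample)]

print(backref_hits)
print(inlined_hits)
```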

    So, using a more natural example, below :

    SEARCH    (?x-i)   ^(Xxx)$ | ^(XXX)$ | ((?2))---((?1))
    Groups :            1         2        3        4 
    
    REPLACE   (?1ABC)(?2DEF)(?3Group 1 = \4 and Group 2 = \3)
    

    The sample text :

    Xxx
    XXX
    XXX---Xxx
    

    is changed into :

    ABC
    DEF
    Group 1 = Xxx and Group 2 = XXX
    

    However, there’s still a problem as, in your example, you would like to refer to the replacement part of a group which does not participate in the overall match, anyway ! More complicated.

    We must find a way :

    • To match and capture the string XXX

    • To capture the string ZZZ, in the same alternative, although the string ZZZ would not be part of the overall match

    Still searching !

    Best Regards,

    guy038



  • @guy038 I was working on a paper when I noticed I had to replace everything after a (space).
    Example: 2020-04-10 21,25,25

    I found the PDF, but I’m just dumb.
    How do I remove, with a regular expression, the 21,25,25,
    so everything after the year-month-day?
    And sorry if I broke a few rules. Editor: Notepad++



  • @cracksoft **edit: I may be on the right track. I just found, from the PDF you provided in this post, that space = \s, if I’m not wrong?



  • Hello, @craksoft, and All,

    If I fully understood your needs, you would like to delete the part after a date, which, I suppose, is the hour part ?

    If so :

    • SEARCH (?-s)(?<=\d{4}-\d\d-\d\d)\h+.{8}

    • REPLACE Leave EMPTY

    • Select the Regular expression search mode

    Best Regards,

    guy038

    P.S. :

    For regex documentation, follow this link :

    https://community.notepad-plus-plus.org/topic/15765/faq-desk-where-to-find-regex-documentation



  • @guy038 said in Delete number strings in the middle of lines of data:

    (?-s)(?<=\d{4}-\d\d-\d\d)\h+.{8}

    So this long thing (?-s)(?<=\d{4}-\d\d-\d\d)\h+.{8} is 6 numbers?
    Still, thanks, it works. You saved me a few hours of selecting and deleting ^^



  • Hi, @craksoft, and All,

    You said :

    So this long thing (?-s)(?<=\d{4}-\d\d-\d\d)\h+.{8} is 6 number ?

    I don’t know what you mean, exactly !?

    The regular expression (?-s)(?<=\d{4}-\d\d-\d\d)\h+.{8} deletes blank characters and the next 8 characters, when preceded by a date in the YYYY-MM-DD format. No more, no less :-)
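
    A quick Python check of that behaviour, as a sketch: Python's `re` module has no `\h`, so `[ \t]` is assumed as a stand-in for horizontal blanks, and the second sample line is made up.

```python
import re

# Blanks plus the next 8 characters, only when preceded by YYYY-MM-DD.
after_date = re.compile(r"(?<=\d{4}-\d\d-\d\d)[ \t]+.{8}")

dated = "2020-04-10 21,25,25"
undated = "no date here 21,25,25"  # hypothetical line without a date

cleaned = after_date.sub("", dated)
unchanged = after_date.sub("", undated)
print(cleaned)
print(unchanged)
```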

    BR

    guy038

