Delete most of the text of the file



  • Hello,
    I have a .txt file that looks like this:
    1 Elevator de chantier 1L01522838425 110,30 € 110,30 €
    1 Oreillet bleu 1L01522839954 51.26 € 51.26 €
    1 Housse de couette Bibi 1L01522838521 86.20 € 86.20 €

    I need to keep only the numbers: 110,30, 51,26 and 86,20.
    Do you know how I can do that, please?

    Thank you :)



  • @guy038 will probably provide a more elegant solution, but the following regular-expression Find & Replace worked for me on your example:

    Find: ^.*[^\d](\d+[\.,]\d+)[^\d].*$
    Replace: \1

    I wasn’t sure if you use . or , as separator.
    It assumes you want to keep a single xx.yy number in each line.
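
As a rough cross-check outside Notepad++, the same Find/Replace can be sketched with Python's `re` module. The sample lines come from the question; the helper name `keep_one_number` is made up for this illustration. Each line is handled separately so that `[^\d]` can never match a line break.

```python
import re

# The pattern from the post above: a decimal number (dot or comma as
# separator) with one non-digit character on each side. Because the
# leading ^.* is greedy, the last such number of the line is captured.
PATTERN = re.compile(r"^.*[^\d](\d+[.,]\d+)[^\d].*$")


def keep_one_number(line: str) -> str:
    """Hypothetical helper: return the captured number, or the line unchanged."""
    return PATTERN.sub(r"\1", line)


sample = [
    "1 Elevator de chantier 1L01522838425 110,30 € 110,30 €",
    "1 Oreillet bleu 1L01522839954 51.26 € 51.26 €",
    "1 Housse de couette Bibi 1L01522838521 86.20 € 86.20 €",
]

for line in sample:
    print(keep_one_number(line))
```

A line without any xx.yy number does not match the pattern and is returned untouched.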



  • Hello Vincent, Gstavi and All

    Vincent, assuming that the price always ends each line in your file, another syntax for that S/R could be :

    SEARCH : (?-s)^.+ (.+) € with a space before the opening parenthesis and a second space character after the closing one !

    REPLACE : \1

    Like gstavi, I also suppose that you want just one copy of each price !


    Let’s see, what happens, with the first example line

    1 Elevator de chantier 1L01522838425 110,30 € 110,30 €
    

    Notes :

    • The modifier (?-s), written (?x-s) in the extended form further below :

      • The x part, present in the extended form only, begins an extended area, which allows the user to split a regex over several lines and to add some comments

      • The -s part ensures that the dot, ., special character matches a single standard character only and never an EOL one, even if you previously checked, by mistake, the . matches newline option !

    • Then, from beginning of line, ^, the part .+, with a space after the plus sign, matches the longest, non empty, range of standard characters, till a space character => It first matches the string "1 Elevator de chantier 1L01522838425 110,30 € 110,30 "

    • However, as the regex engine must also match, afterwards, a further range of characters followed by a space and the Euro sign, it needs to backtrack to the space character located just before the last number 110,30

    • Now, the part (.+) does match the longest, non empty, range of standard characters, till the string " €" ( a space and the euro sign ). As it’s surrounded by round brackets, the final number, 110,30, of each line, is stored as group 1

    • Finally, the ending part just matches the last two characters ( space + Euro ), literally

    • In Replacement, the complete line, without its EOL character(s), is replaced by the final number, only
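
To watch the backtracking described in these notes, here is a minimal sketch in Python's `re` module rather than in Notepad++. Python has no global `(?-s)` switch, so that prefix is dropped (an adaptation, not part of the original regex); the dot already excludes line breaks there by default.

```python
import re

# Same search regex without the (?-s) prefix: in Python the dot never
# matches a line break unless re.DOTALL is requested explicitly.
PATTERN = re.compile(r"^.+ (.+) €")

line = "1 Elevator de chantier 1L01522838425 110,30 € 110,30 €"

match = PATTERN.search(line)
if match:
    # Text stored as group 1.
    print(repr(match.group(1)))
    # Everything the greedy "^.+ " part kept after backtracking.
    print(repr(line[: match.start(1)]))

# The replacement \1 rewrites the whole match as group 1 only.
print(PATTERN.sub(r"\1", line))
```

The second printed value ends with the space that sits just before the final price, which is where the engine settles after backtracking.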


    BTW, you may rewrite this regex, using the PCRE EXTENDED mode, as below :

     (?x-s)  #  Enable the EXTENDED mode and disable the DOTALL behaviour
     ^.+[ ]  #  The LONGEST, NON empty, range, of STANDARD characters, ending with a SPACE
     (.+)    #  The number to keep, with digits 0-9, the comma and dot separator 
             #      = the LONGEST, NON empty, range of STANDARD characters, till the string ' €'
     [ ]€    #  A SPACE, followed by a single **EURO** character
    

    In this mode, any vertical or horizontal blank character found in the regex is NOT taken into account ! So, some characters will need to be escaped with a backslash character. For the space character, you can use the better syntax [ ] !

    Now, here’s the magic :

    • Select these five lines, above, that describe the search regex

    • Open the Replace dialog, by hitting on the CTRL + H shortcut

    • Set the Regular expression search mode

    • Type \1 ( or $1 ) in the Replace with field

    • Click, once, the Replace All Button, or several times on the Replace button

    Et voilà !
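
Other regex engines offer a comparable free-spacing mode. As a sketch only, assuming Python's `re` module instead of Notepad++, the `re.VERBOSE` flag plays the role of the `x` modifier; the `-s` part is omitted because Python's dot excludes line breaks by default.

```python
import re

# Free-spacing version of the search regex. Blanks outside character
# classes and everything after "#" are ignored by the engine.
PATTERN = re.compile(
    r"""
    ^.+[ ]   # longest non-empty run of characters ending with a space
    (.+)     # group 1: the value to keep
    [ ]€     # a space followed by a single euro character
    """,
    re.VERBOSE,
)

line = "1 Oreillet bleu 1L01522839954 51.26 € 51.26 €"
print(PATTERN.sub(r"\1", line))
```

As in the extended regex above, the space is written `[ ]` so that it survives the free-spacing mode.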


    BTW, for American/English people, a small hint about numbers syntax :

    The correct English syntax, 12,345.78, for instance, would be written, in French, as 12 345,78. Just note that some other European countries may use a dot, instead of the French space character, to separate each group of thousands !

    Best Regards,

    guy038



  • Hi,

    Thanks a lot ! It works very nicely :) !
    Sometimes there is no price at the end of the line, just a number like this : 1L01522838521 (it could be another number, it’s just an example).
    Those lines stay the same. Would it be possible to delete all of these lines ?
    Thanks again for your help :)



  • Hi, Vincent,

    Do you mean that your file may contain lines, as below, that you would like to delete ?

    1 Elevator de chantier 1L01522838425
    1 Housse de couette Bibi 1L01522838521
    

    If so, a possible regex syntax, for that S/R, would be :

    SEARCH (?-s)^.+ (.+) €.*|^(?!.+€).*\R , with a space before the first opening parenthesis and a second space character after the first closing one !

    REPLACE ?1\1

    Notes :

    • From beginning of the line, ^ :

      • The first alternative is almost identical to the search regex of my previous post. I just added the syntax .* at the end of the first branch, in order to match all subsequent characters, after the last Euro character

      • The first part of the second branch of the alternative, (?!.+€), is a condition, called a negative look-ahead, which means “Is there NO Euro sign further on, in the current line ?”

      • If this condition is TRUE, then, the regex engine matches the second, and main, part, of the second branch, .*\R. In other words, all the contents, even empty, of the current line, with its EOL characters

    • In replacement, IF group 1 ( the price ) exists, we just rewrite that group 1 ( the value), ELSE we do not rewrite anything => All the current line, with the EOL character(s), included, is deleted

    So, from the original text, below :

    A Elevator de chantier 1L01522838425 110,30 € 110,30 €
    
    C Elevator de chantier 1L01522838425
    D Oreillet bleu 1L01522839954 51.26 € 51.26 €
    
    F Housse de couette Bibi 1L01522838521
    G Housse de couette Bibi 1L01522838521 86.20 € 86.20 €
    
    I Elevator de chantier 1L01522838425 27,10 €
    J Oreillet bleu 1L01522839954 734.56 €
    K Housse de couette Bibi 1L01522838521 0,99 €
    
    M Elevator de chantier 1L01522838425 1.00 € Test
    N Oreillet bleu 1L01522839954 99,99 € small test
    O Housse de couette Bibi 1L01522838521 57.34 € A Test
    

    The regex S/R, above, would get the final text, below :

    110,30
    51.26
    86.20
    27,10
    734.56
    0,99
    1.00
    99,99
    57.34
    

    REMARKS :

    • If the price is present, only once, it doesn’t matter : The price will be displayed

    • If some text is present, after the last Euro character, it will be deleted, too
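
The behaviour above can be reproduced outside Notepad++ with a short Python sketch. Two adaptations are assumptions of this sketch, not part of the original S/R: Python's `re` has neither `\R` nor the conditional replacement `?1\1`, so the text is assumed to use plain `\n` line endings and a small function stands in for the condition. The name `keep_prices` is hypothetical.

```python
import re

# Branch 1 captures the last price of a line; branch 2 matches a whole
# line, with its "\n", when no euro sign follows on that line.
PATTERN = re.compile(r"^.+ (.+) €.*|^(?!.+€).*\n", re.MULTILINE)


def keep_prices(text: str) -> str:
    """Hypothetical helper: write group 1 if it took part in the match, else nothing."""
    return PATTERN.sub(lambda m: m.group(1) or "", text)


original = (
    "A Elevator de chantier 1L01522838425 110,30 € 110,30 €\n"
    "\n"
    "C Elevator de chantier 1L01522838425\n"
    "D Oreillet bleu 1L01522839954 51.26 € 51.26 €\n"
    "\n"
    "F Housse de couette Bibi 1L01522838521\n"
    "G Housse de couette Bibi 1L01522838521 86.20 € 86.20 €\n"
    "\n"
    "I Elevator de chantier 1L01522838425 27,10 €\n"
    "J Oreillet bleu 1L01522839954 734.56 €\n"
    "K Housse de couette Bibi 1L01522838521 0,99 €\n"
    "\n"
    "M Elevator de chantier 1L01522838425 1.00 € Test\n"
    "N Oreillet bleu 1L01522839954 99,99 € small test\n"
    "O Housse de couette Bibi 1L01522838521 57.34 € A Test\n"
)

print(keep_prices(original), end="")
```

With Windows line endings the pattern would need adjusting, since Python's dot also matches a carriage return.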


    If you prefer to keep the pure blank lines, use, instead :

    SEARCH (?-s)^.+ (.+) €.*|^(?!.+€).+\R

    REPLACE ?1\1

    And you’ll obtain the changed text, below :

    110,30
    
    51.26
    
    86.20
    
    27,10
    734.56
    0,99
    
    1.00
    99,99
    57.34
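
The only change in this variant is `.+` instead of `.*` in the second branch, so a line must contain at least one character to be deleted. A minimal Python sketch of that difference, under the same assumptions as before (plain `\n` line endings, a function in place of `?1\1`, and a made-up helper name):

```python
import re

# Second branch now uses .+ : an empty line cannot match it and is kept.
PATTERN = re.compile(r"^.+ (.+) €.*|^(?!.+€).+\n", re.MULTILINE)


def keep_prices_and_blanks(text: str) -> str:
    """Hypothetical helper: same replacement logic, blank lines preserved."""
    return PATTERN.sub(lambda m: m.group(1) or "", text)


excerpt = (
    "A Elevator de chantier 1L01522838425 110,30 € 110,30 €\n"
    "\n"
    "C Elevator de chantier 1L01522838425\n"
    "D Oreillet bleu 1L01522839954 51.26 € 51.26 €\n"
)

print(keep_prices_and_blanks(excerpt), end="")
```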
    

    Cheers,

    guy038

