Delete most of the text of the file

Vincent Tink

Hello,
I have a .txt file that looks like this:
1 Elevator de chantier 1L01522838425 110,30 € 110,30 €
1 Oreillet bleu 1L01522839954 51.26 € 51.26 €
1 Housse de couette Bibi 1L01522838521 86.20 € 86.20 €

I need to keep only the numbers 110,30, 51,26 and 86,20.
Do you know how I can do that, please?

Thank you :)

gstavi

@guy038 will probably provide a more elegant solution, but the following regular expression Find & Replace worked for me on your example:

Find: ^.*[^\d](\d+[\.,]\d+)[^\d].*$
Replace: \1

I wasn’t sure whether you use . or , as the decimal separator, so the pattern accepts both.
It assumes you want to keep a single xx.yy number from each line (the last one, since the leading .* is greedy).
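To check the pattern outside Notepad++, here is a minimal Python sketch that applies gstavi's Find/Replace to the three sample lines. It assumes Python's `re` module (Notepad++ uses its own regex engine, so details may differ in other cases); `re.MULTILINE` makes `^` and `$` work line by line. The names `FIND`, `keep_price` and `sample` are illustrative only.

```python
import re

# gstavi's Find pattern, unchanged
FIND = r"^.*[^\d](\d+[\.,]\d+)[^\d].*$"

def keep_price(text: str) -> str:
    """Replace each line by the xx,yy / xx.yy number captured in group 1."""
    # MULTILINE: ^ and $ match at each line start/end; '.' never crosses a newline
    return re.sub(FIND, r"\1", text, flags=re.MULTILINE)

sample = (
    "1 Elevator de chantier 1L01522838425 110,30 € 110,30 €\n"
    "1 Oreillet bleu 1L01522839954 51.26 € 51.26 €\n"
    "1 Housse de couette Bibi 1L01522838521 86.20 € 86.20 €"
)

print(keep_price(sample))
# 110,30
# 51.26
# 86.20
```

Because the leading `.*` is greedy, the engine backtracks from the end of the line, so the captured number is the last one; with two identical prices per line, that makes no difference.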

guy038

Hello Vincent, gstavi and All,

Vincent, assuming that the price always ends each line of your file, another syntax for that S/R could be:

SEARCH: (?-s)^.+ (.+) € , with a space before the group (.+) and a second space after it!

REPLACE: \1

Like gstavi, I also assume that you just want one copy of each price!

Let’s see what happens with the first example line:

1 Elevator de chantier 1L01522838425 110,30 € 110,30 €

Notes:

The modifier (?-s):
- Ensures that the dot special character, ., matches a single standard character only, never an EOL character, even if you previously checked the . matches newline option by mistake!
- With an x added, as in the (?x-s) of the extended version further down, it also begins an extended area, which allows you to split a regex over several lines and add comments
Then, from the beginning of the line, ^, the part .+ followed by a space matches the longest non-empty range of standard characters ending before a space => it first matches the string "1 Elevator de chantier 1L01522838425 110,30 € 110,30 " (the space being the one before the final €)
However, as the regex engine must then also match a non-empty range followed by a space and the Euro sign, it has to backtrack to the space character located before the last number 110,30
Now, the part (.+) matches the longest non-empty range of standard characters before the string " €" (a space and the Euro sign). As it’s surrounded by round brackets, the final number of each line, here 110,30, is stored as group 1
Finally, the ending part " €" just matches the last two characters (space + Euro sign), literally
In the replacement, the complete line, without its EOL character(s), is replaced by the final number only
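The backtracking described above can be observed with a short Python sketch (an illustration only, not how Notepad++ works internally). The sketch simply leaves `(?-s)` out, because Python's `.` already excludes newlines unless `re.DOTALL` is set. `m.start(1)` shows that group 1 begins at the last `110,30`, not the first.

```python
import re

# guy038's SEARCH regex without the leading (?-s):
# in Python, '.' never matches '\n' unless re.DOTALL is given
SEARCH = r"^.+ (.+) €"

line = "1 Elevator de chantier 1L01522838425 110,30 € 110,30 €"
m = re.search(SEARCH, line)

print(m.group(1))                          # 110,30
print(m.start(1) == line.rfind("110,30"))  # True: group 1 is the LAST 110,30
print(m.group(0) == line)                  # True: the match covers the whole line

# Replace: \1, applied line by line
text = line + "\n1 Oreillet bleu 1L01522839954 51.26 € 51.26 €"
print(re.sub(SEARCH, r"\1", text, flags=re.MULTILINE))
```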

BTW, you may rewrite this regex using the PCRE EXTENDED mode, as below:

 (?x-s)  #  Enable the EXTENDED mode and disable the DOTALL behaviour
 ^.+[ ]  #  The LONGEST, NON empty, range, of STANDARD characters, ending with a SPACE
 (.+)    #  The number to keep, made of digits 0-9 and a comma or dot separator
         #      = the LONGEST, NON empty, range of STANDARD characters, till the string ' €'
 [ ]€    #  A SPACE, followed by a single EURO character

In this mode, any vertical or horizontal blank character found in the regex is NOT taken into account! So some characters need to be escaped with a backslash. For the space character, you can use the better syntax [ ]!
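For comparison, Python offers the same kind of extended syntax through its `re.VERBOSE` flag. In this sketch, `re.VERBOSE` plays the role of the x modifier, `(?-s)` is again left out because Python's `.` excludes newlines by default, and `PRICE` / `keep_last_price` are illustrative names. As in the Notepad++ extended mode, blanks outside a character class are ignored, so `[ ]` is used for the space.

```python
import re

# re.VERBOSE: blanks are ignored and '#' starts a comment,
# except inside a character class, so [ ] still matches one space
PRICE = re.compile(
    r"""
    ^.+[ ]   # the LONGEST, non-empty range of standard characters, ending with a SPACE
    (.+)     # group 1: the number to keep
    [ ]€     # a SPACE, followed by the EURO sign
    """,
    re.VERBOSE | re.MULTILINE,
)

def keep_last_price(text: str) -> str:
    # same Replace: \1 as in Notepad++
    return PRICE.sub(r"\1", text)

sample = (
    "1 Housse de couette Bibi 1L01522838521 86.20 € 86.20 €\n"
    "1 Oreillet bleu 1L01522839954 51.26 € 51.26 €"
)
print(keep_last_price(sample))
```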

Now, here’s the magic :

Select the five lines above that describe the search regex
Open the Replace dialog by pressing the Ctrl + H shortcut
Set the Regular expression search mode
Type \1 (or $1) in the Replace with field
Click the Replace All button once, or the Replace button several times

Et voilà !

BTW, for American/English people, a small hint about number syntax:

For instance, the English notation 12,345.78 would be written 12 345,78 in French. Just note that some other European countries use a dot, instead of the French space, to separate each group of thousands!
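If the extracted prices later have to be used as numbers, both notations (and the mixed , and . separators in Vincent's file) can be normalized. Here is a minimal Python sketch; `parse_price` is a hypothetical helper, and it assumes the last `,` or `.` in the string is always the decimal separator. That holds for prices written with cents, but not for something like 12.345 with no decimal part.

```python
def parse_price(text: str) -> float:
    """Hypothetical helper: assumes the LAST ',' or '.' is the decimal separator."""
    # drop the thousands spaces used in French (normal, no-break, narrow no-break)
    cleaned = text.strip()
    for blank in (" ", "\u00a0", "\u202f"):
        cleaned = cleaned.replace(blank, "")
    cut = max(cleaned.rfind(","), cleaned.rfind("."))
    if cut == -1:  # no separator at all, e.g. "42"
        return float(cleaned)
    # every separator before the decimal one groups thousands: remove it
    integer_part = cleaned[:cut].replace(",", "").replace(".", "")
    return float(integer_part + "." + cleaned[cut + 1:])

for s in ("12,345.78", "12 345,78", "12.345,78", "110,30", "51.26"):
    print(s, "->", parse_price(s))
```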

Best Regards,

guy038

Vincent Tink

Hi,

Thank you very much! It works very nicely :)
Sometimes there is no price at the end of the line, just a number like this: 1L01522838521 (it could be other numbers; it’s just an example).
These lines stay unchanged. Would it be possible to delete all of them?
Thanks again for your help :)

guy038

Hi, Vincent,

Do you mean that your file may contain lines like the ones below, which you would like to delete?

1 Elevator de chantier 1L01522838425
1 Housse de couette Bibi 1L01522838521

If so, a possible regex syntax for that S/R would be:

SEARCH (?-s)^.+ (.+) €.*|^(?!.+€).*\R , with a space before the group (.+) and a second space after it!

REPLACE ?1\1

Notes:

From the beginning of the line, ^:
- The first alternative is almost identical to the search regex of my previous post. I just added .* at the end of the first branch, in order to match all the characters after the last € character
- The first part of the second branch, (?!.+€), is a condition called a negative look-ahead, which means "Is there NO Euro sign further on in the current line?"
- If this condition is TRUE, the regex engine matches the second, and main, part of the second branch, .*\R, in other words the whole contents of the current line, even if empty, with its EOL character(s)
In the replacement, IF group 1 (the price) exists, we just rewrite that group 1 (the value), ELSE we rewrite nothing => the whole current line, EOL character(s) included, is deleted
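The conditional replacement ?1\1 has no direct equivalent in Python's `re.sub` template syntax, but a replacement function gives the same effect. This is a sketch under two assumptions: the file uses Unix `\n` line endings (the sketch uses `\n` where Notepad++ uses `\R`), and `SEARCH`, `conditional` and `clean` are illustrative names. It runs the S/R on the sample text of this post.

```python
import re

# "\n" stands in for Notepad++'s \R (assumption: Unix line endings);
# '.' never matches '\n', like (?-s)
SEARCH = re.compile(r"^.+ (.+) €.*|^(?!.+€).*\n", re.MULTILINE)

def conditional(m):
    # emulates ?1\1: group 1 matched -> keep the price, else -> delete the match
    return m.group(1) if m.group(1) is not None else ""

def clean(text):
    return SEARCH.sub(conditional, text)

lines = [
    "A Elevator de chantier 1L01522838425 110,30 € 110,30 €",
    "",
    "C Elevator de chantier 1L01522838425",
    "D Oreillet bleu 1L01522839954 51.26 € 51.26 €",
    "",
    "F Housse de couette Bibi 1L01522838521",
    "G Housse de couette Bibi 1L01522838521 86.20 € 86.20 €",
    "",
    "I Elevator de chantier 1L01522838425 27,10 €",
    "J Oreillet bleu 1L01522839954 734.56 €",
    "K Housse de couette Bibi 1L01522838521 0,99 €",
    "",
    "M Elevator de chantier 1L01522838425 1.00 € Test",
    "N Oreillet bleu 1L01522839954 99,99 € small test",
    "O Housse de couette Bibi 1L01522838521 57.34 € A Test",
]
original = "\n".join(lines) + "\n"
print(clean(original))
```

Lines without any Euro sign, including empty ones, are removed together with their line break, while priced lines are reduced to their last price.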

So, from the original text below:

A Elevator de chantier 1L01522838425 110,30 € 110,30 €

C Elevator de chantier 1L01522838425
D Oreillet bleu 1L01522839954 51.26 € 51.26 €

F Housse de couette Bibi 1L01522838521
G Housse de couette Bibi 1L01522838521 86.20 € 86.20 €

I Elevator de chantier 1L01522838425 27,10 €
J Oreillet bleu 1L01522839954 734.56 €
K Housse de couette Bibi 1L01522838521 0,99 €

M Elevator de chantier 1L01522838425 1.00 € Test
N Oreillet bleu 1L01522839954 99,99 € small test
O Housse de couette Bibi 1L01522838521 57.34 € A Test

The regex S/R above would produce the final text below:

110,30
51.26
86.20
27,10
734.56
0,99
1.00
99,99
57.34

REMARKS :

If the price is present only once, it doesn’t matter: the price is still kept
If some text follows the last Euro sign, it is deleted, too

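If you prefer to keep the truly empty lines, use this instead (the only change is .+ instead of .* in the second branch, so an empty line can no longer match):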

SEARCH (?-s)^.+ (.+) €.*|^(?!.+€).+\R

REPLACE ?1\1

And you’ll obtain the changed text below:

110,30

51.26

86.20

27,10
734.56
0,99

1.00
99,99
57.34
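The blank-line-preserving variant can be checked the same way. In this sketch (again assuming `\n` line endings in place of `\R`, and emulating ?1\1 with a function; `SEARCH_KEEP_BLANKS` and `clean_keep_blanks` are illustrative names), the second branch uses `.+`, so an empty line has nothing to match and survives.

```python
import re

# Second branch uses .+ instead of .*: an empty line cannot match, so it is kept.
# "\n" stands in for Notepad++'s \R (assumption: Unix line endings).
SEARCH_KEEP_BLANKS = re.compile(r"^.+ (.+) €.*|^(?!.+€).+\n", re.MULTILINE)

def clean_keep_blanks(text):
    # ?1\1: keep group 1 when it matched (it is never empty), else delete the match
    return SEARCH_KEEP_BLANKS.sub(lambda m: m.group(1) or "", text)

lines = [
    "A Elevator de chantier 1L01522838425 110,30 € 110,30 €",
    "",
    "C Elevator de chantier 1L01522838425",
    "D Oreillet bleu 1L01522839954 51.26 € 51.26 €",
    "",
    "F Housse de couette Bibi 1L01522838521",
    "G Housse de couette Bibi 1L01522838521 86.20 € 86.20 €",
    "",
    "I Elevator de chantier 1L01522838425 27,10 €",
    "J Oreillet bleu 1L01522839954 734.56 €",
    "K Housse de couette Bibi 1L01522838521 0,99 €",
    "",
    "M Elevator de chantier 1L01522838425 1.00 € Test",
    "N Oreillet bleu 1L01522839954 99,99 € small test",
    "O Housse de couette Bibi 1L01522838521 57.34 € A Test",
]
original = "\n".join(lines) + "\n"
print(clean_keep_blanks(original))
```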

Cheers,

guy038