Delete most of the text of the file
-
Hello,
I have a .txt file that looks like this:
1 Elevator de chantier 1L01522838425 110,30 € 110,30 €
1 Oreillet bleu 1L01522839954 51.26 € 51.26 €
1 Housse de couette Bibi 1L01522838521 86.20 € 86.20 €
I need to keep only the numbers: 110,30, 51,26 and 86,20.
Do you know how I can do that, please?
Thank you :)
-
@guy038 will probably provide a more elegant solution, but the following regular expression Find & Replace worked for me on your example:
Find:
^.*[^\d](\d+[\.,]\d+)[^\d].*$
Replace:
\1
I wasn't sure whether you use . or , as the decimal separator.
It assumes you want to keep a single xx.yy number in each line.
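For readers who want to check this pattern outside Notepad++, here is a minimal Python sketch (an illustration only, assuming the three sample lines and LF line endings). Python's re module accepts the pattern unchanged; re.M makes ^ and $ work per line, as they do in Notepad++.

```python
import re

# The three sample lines from the question (LF line endings assumed)
text = (
    "1 Elevator de chantier 1L01522838425 110,30 € 110,30 €\n"
    "1 Oreillet bleu 1L01522839954 51.26 € 51.26 €\n"
    "1 Housse de couette Bibi 1L01522838521 86.20 € 86.20 €\n"
)

# gstavi's Find pattern; re.M lets ^ and $ match at every line boundary
find = re.compile(r"^.*[^\d](\d+[\.,]\d+)[^\d].*$", re.M)

# Replace each matching line with group 1, the xx.yy / xx,yy number
result = find.sub(r"\1", text)
print(result, end="")
```

Because the leading .* is greedy, group 1 ends up holding the last separator-delimited number on each line; the product code 1L01522838425 contains no . or , so it is never picked.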
Hello Vincent, gstavi and All,
Vincent, assuming that the price always ends each line in your file, another syntax for that S/R could be:
SEARCH:
(?-s)^.+ (.+) €
with a space before the opening parenthesis and a second space character after the closing one!
REPLACE:
\1
Like gstavi, I also suppose that you just want one copy of each price!
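A minimal Python sketch of this S/R, assuming LF line endings. The leading (?-s) is dropped here: Python's dot already excludes line breaks unless re.DOTALL is set, and Python only accepts that switch in the scoped form (?-s:...).

```python
import re

text = (
    "1 Elevator de chantier 1L01522838425 110,30 € 110,30 €\n"
    "1 Oreillet bleu 1L01522839954 51.26 € 51.26 €\n"
    "1 Housse de couette Bibi 1L01522838521 86.20 € 86.20 €\n"
)

# guy038's SEARCH regex without (?-s): note the space before "(" and
# the space between ")" and the Euro sign
search = re.compile(r"^.+ (.+) €", re.M)

result = search.sub(r"\1", text)
print(result, end="")
```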
Let's see what happens with the first example line:
1 Elevator de chantier 1L01522838425 110,30 € 110,30 €
Notes:
- The modifier (?-s) ensures that the dot (.) special character matches a single standard character only, never an EOL one, even if you previously checked the ". matches newline" option by mistake!
- In the extended form of the regex, (?x-s), the x part also begins an extended area, which allows the user to split a regex over several lines and to add some comments
- Then, from the beginning of the line (^), the part ".+ " (with a space after the plus sign) matches the longest non-empty range of standard characters up to a space character => it first matches the string "1 Elevator de chantier 1L01522838425 110,30 € 110,30 "
- However, the rest of the regex, (.+) followed by a space and the Euro sign, cannot match the single remaining € character, so the regex engine backtracks to the previous space character, located just before the last number 110,30 (right after the first Euro sign)
- Now, the part (.+) matches the longest non-empty range of standard characters up to the string " €" (a space and the Euro sign). As it is surrounded by round brackets, the final number of each line, 110,30 here, is stored as group 1
- Finally, the ending part " €" just matches the last two characters (space + Euro) literally
In the replacement, the complete line, without its EOL character(s), is replaced by the final number only.
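To watch the backtracking described in these notes, a small Python sketch can show where group 1 starts. Python's re is also a backtracking engine, so for this pattern it should settle on the same split as Notepad++ (worth double-checking in the editor itself).

```python
import re

line = "1 Elevator de chantier 1L01522838425 110,30 € 110,30 €"

# Same search regex; re.match already anchors at the start of the string
m = re.match(r".+ (.+) €", line)

# The part matched by ".+ " ends with the space right after the first Euro sign
prefix = line[:m.start(1)]
print(repr(prefix))

# Group 1 is the LAST 110,30 of the line, and the match runs to the end of it
print(m.group(1), m.start(1) == line.rfind("110,30"), m.end() == len(line))
```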
BTW, you may rewrite this regex using the PCRE EXTENDED mode, as below:
(?x-s)    # Enable the EXTENDED mode and disable the DOTALL behaviour
^.+[ ]    # The LONGEST, NON-empty range of STANDARD characters, ending with a SPACE
(.+)      # The number to keep, with digits 0-9 and the comma or dot separator
          # = the LONGEST, NON-empty range of STANDARD characters, up to the string ' €'
[ ]€      # A SPACE, followed by a single EURO character
In this mode, any horizontal or vertical blank character found in the regex is NOT taken into account! So, some characters need to be escaped with a backslash. For the space character, you can use the better syntax [ ]!
Now, here's the magic:
- Select the five lines above that describe the search regex
- Open the Replace dialog by hitting the Ctrl + H shortcut
- Set the Regular expression search mode
- Type the regex \1 (or $1) in the Replace with field
- Click the Replace All button once, or the Replace button several times
Et voilà!
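Python offers the same EXTENDED mode through the re.X (re.VERBOSE) flag, so the five-line regex can be kept almost verbatim in a script. A minimal sketch, assuming LF line endings: the (?x-s) prefix is replaced by re.X in the flags argument, since Python's dot already excludes line breaks unless re.DOTALL is set.

```python
import re

# Verbose version of the search regex: unescaped whitespace is ignored and
# "#" starts a comment, but a space inside [ ] is still matched literally
search = re.compile(
    r"""
    ^.+[ ]   # the LONGEST non-empty range of standard characters, ending with a SPACE
    (.+)     # the number to keep, stored as group 1
    [ ]€     # a SPACE, followed by a single EURO character
    """,
    re.X | re.M,
)

text = "1 Oreillet bleu 1L01522839954 51.26 € 51.26 €\n"
result = search.sub(r"\1", text)
print(result, end="")
```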
BTW, for American/English people, a small hint about number syntax:
The English syntax 12,345.78, for instance, would be written 12 345,78 in French.
Just note that some other European countries use a dot, instead of the French space character, to separate each group of thousands!
Best Regards,
guy038
-
Hi,
thank you a lot! It works very nicely :)!
Sometimes there is no price at the end of the line, just a number like this: 1L01522838521 (it could be other numbers, it's just an example).
Such a line stays the same. Could it be possible to delete all these lines?
Thanks again for your help :)
-
Hi, Vincent,
Do you mean that your file may contain lines like the ones below, which you would like to delete?
1 Elevator de chantier 1L01522838425
1 Housse de couette Bibi 1L01522838521
If so, a possible regex syntax for that S/R would be:
SEARCH
(?-s)^.+ (.+) €.*|^(?!.+€).*\R
with a space before the first opening parenthesis and a second space character after the first closing one!
REPLACE
?1\1
Notes:
From the beginning of the line (^):
- The first alternative is almost identical to the search regex of my previous post. I just added the syntax .* at the end of the first branch, in order to match all the characters after the last € character
- The first part of the second branch of the alternative, (?!.+€), is a condition called a negative look-ahead, which means "Is there NO Euro sign further on in the current line?"
- If this condition is TRUE, the regex engine then matches the second, and main, part of the second branch, .*\R, in other words, all the contents of the current line (even if empty), with its EOL characters
In the replacement, IF group 1 (the price) exists, we just rewrite that group 1 (the value), ELSE we do not rewrite anything => the whole current line, EOL character(s) included, is deleted.
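The second branch can be tried on its own in Python to see the negative look-ahead at work. A sketch only: \R is a Boost/PCRE escape that Python's re does not support, so \n stands in for it, assuming LF line endings.

```python
import re

text = (
    "1 Elevator de chantier 1L01522838425 110,30 € 110,30 €\n"
    "1 Elevator de chantier 1L01522838425\n"
    "1 Housse de couette Bibi 1L01522838521\n"
)

# Second branch only: a line with NO Euro sign further on, plus its line break
no_price_line = re.compile(r"^(?!.+€).*\n", re.M)

# Deleting those matches removes the price-less lines completely
result = no_price_line.sub("", text)
print(result, end="")
```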
So, from the original text below:
A Elevator de chantier 1L01522838425 110,30 € 110,30 €
C Elevator de chantier 1L01522838425
D Oreillet bleu 1L01522839954 51.26 € 51.26 €
F Housse de couette Bibi 1L01522838521
G Housse de couette Bibi 1L01522838521 86.20 € 86.20 €
I Elevator de chantier 1L01522838425 27,10 €
J Oreillet bleu 1L01522839954 734.56 €
K Housse de couette Bibi 1L01522838521 0,99 €
M Elevator de chantier 1L01522838425 1.00 € Test
N Oreillet bleu 1L01522839954 99,99 € small test
O Housse de couette Bibi 1L01522838521 57.34 € A Test
The regex S/R above would produce the final text below:
110,30
51.26
86.20
27,10
734.56
0,99
1.00
99,99
57.34
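Python's re.sub has no ?1\1 conditional replacement syntax, but passing a function as the replacement gives the same IF/ELSE effect. A sketch assuming LF line endings, with \n standing in for \R; conditional_replace is a hypothetical helper name, not part of any API.

```python
import re

# The sample text from the post above (LF line endings assumed)
text = (
    "A Elevator de chantier 1L01522838425 110,30 € 110,30 €\n"
    "C Elevator de chantier 1L01522838425\n"
    "D Oreillet bleu 1L01522839954 51.26 € 51.26 €\n"
    "F Housse de couette Bibi 1L01522838521\n"
    "G Housse de couette Bibi 1L01522838521 86.20 € 86.20 €\n"
    "I Elevator de chantier 1L01522838425 27,10 €\n"
    "J Oreillet bleu 1L01522839954 734.56 €\n"
    "K Housse de couette Bibi 1L01522838521 0,99 €\n"
    "M Elevator de chantier 1L01522838425 1.00 € Test\n"
    "N Oreillet bleu 1L01522839954 99,99 € small test\n"
    "O Housse de couette Bibi 1L01522838521 57.34 € A Test\n"
)

# SEARCH regex from the post, with \n in place of \R
search = re.compile(r"^.+ (.+) €.*|^(?!.+€).*\n", re.M)


def conditional_replace(m):
    # Hypothetical helper mimicking ?1\1: IF group 1 matched, write it, ELSE write nothing
    return m.group(1) if m.group(1) is not None else ""


result = search.sub(conditional_replace, text)
print(result, end="")
```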
REMARKS:
- If the price is present only once, it doesn't matter: that price will still be kept
- If some text is present after the last Euro character, it will be deleted, too
If you prefer to keep the pure blank lines, use, instead (the second branch now ends with .+\R instead of .*\R, so it needs at least one character and no longer matches an empty line):
SEARCH
(?-s)^.+ (.+) €.*|^(?!.+€).+\R
REPLACE
?1\1
And you'll obtain the changed text below:
110,30
51.26
86.20
27,10
734.56
0,99
1.00
99,99
57.34
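One way to see the difference between the two SEARCH regexes is to run both on a text that contains a pure blank line. A Python sketch under the same assumptions (\n for \R, LF line endings); since Python 3.5, re.sub replaces an unmatched group with an empty string, so a plain \1 behaves like ?1\1 here.

```python
import re

text = (
    "A Elevator de chantier 1L01522838425 110,30 € 110,30 €\n"
    "\n"
    "C Elevator de chantier 1L01522838425\n"
    "D Oreillet bleu 1L01522839954 51.26 € 51.26 €\n"
)

# .*\n: the second branch also matches an empty line, so blank lines are deleted
drop_blank = re.compile(r"^.+ (.+) €.*|^(?!.+€).*\n", re.M)
# .+\n: the second branch needs at least one character, so blank lines survive
keep_blank = re.compile(r"^.+ (.+) €.*|^(?!.+€).+\n", re.M)

# Python 3.5+ replaces an unmatched group with "", which mimics ?1\1
without_blanks = drop_blank.sub(r"\1", text)
with_blanks = keep_blank.sub(r"\1", text)
print(repr(without_blanks))
print(repr(with_blanks))
```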
Cheers,
guy038
-