
    Delete most of the text of the file

    • Vincent Tink

      Hello,
      I have a .txt file that looks like this:
      1 Elevator de chantier 1L01522838425 110,30 € 110,30 €
      1 Oreillet bleu 1L01522839954 51.26 € 51.26 €
      1 Housse de couette Bibi 1L01522838521 86.20 € 86.20 €

      I only need to keep the numbers 110,30; 51,26 and 86,20.
      Do you know how I can do that, please?

      Thank you :)

      • gstavi

        @guy038 will probably provide a more elegant solution, but the following regular-expression Find & Replace worked for me on your example:

        Find: ^.*[^\d](\d+[\.,]\d+)[^\d].*$
        Replace: \1

        I wasn’t sure whether you use . or , as the decimal separator, so the regex accepts both.
        It assumes you want to keep a single xx.yy number from each line (the last one, because the leading ^.* is greedy).
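For anyone who wants to check the pattern outside Notepad++, here is a minimal Python sketch that applies the same Find/Replace to the three sample lines. Using Python's re module is my assumption (it treats this particular pattern the same way); re.MULTILINE makes ^ and $ work per line, and the variable names are only illustrative:

```python
import re

# The three sample lines from the question
text = (
    "1 Elevator de chantier 1L01522838425 110,30 € 110,30 €\n"
    "1 Oreillet bleu 1L01522839954 51.26 € 51.26 €\n"
    "1 Housse de couette Bibi 1L01522838521 86.20 € 86.20 €"
)

# gstavi's Find pattern: the greedy ^.* pushes the capture to the LAST
# "digits, separator (. or ,), digits" number on each line
pattern = re.compile(r"^.*[^\d](\d+[\.,]\d+)[^\d].*$", re.MULTILINE)

# Replace: \1 keeps only the captured number
result = pattern.sub(r"\1", text)
print(result)
```

It prints 110,30, 51.26 and 86.20, one per line.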

        • guy038

          Hello Vincent, gstavi and All,

          Vincent, assuming that the price always ends each line of your file, another syntax for that S/R (search/replace) could be:

          SEARCH: (?-s)^.+ (.+) € (with a space before the opening parenthesis and a second space character after the closing one!)

          REPLACE: \1

          Like gstavi, I also assume that you just want one copy of each price!


          Let’s see what happens with the first example line:

          1 Elevator de chantier 1L01522838425 110,30 € 110,30 €
          

          Notes :

          • The modifier (?-s):

            • Ensures that the dot, ., special character matches a single standard character only and never an EOL one, even if you previously checked the . matches newline option by mistake!

            • In the (?x-s) form used by the extended version further down, the extra x flag also begins an extended area, which allows the user to split a regex over several lines and add some comments

          • Then, from the beginning of the line, ^, the part .+ (with a space after the plus sign) matches the longest non-empty range of standard characters followed by a space character => it first matches the string "1 Elevator de chantier 1L01522838425 110,30 € 110,30 "

          • However, as the regex engine must also match, afterwards, a non-empty range of characters followed by a space and the Euro sign, it needs to backtrack to the space character located just before the last number, 110,30 (right after the first € sign)

          • Now, the part (.+) matches the longest non-empty range of standard characters before the string " €" (a space and the Euro sign). As it is surrounded by round brackets, the final number of the line, 110,30, is stored as group 1

          • Finally, the ending part, " €", just matches the last two characters (space + Euro), literally

          • In replacement, the complete line, without its EOL character(s), is replaced by the final number only
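The walk-through above can be checked with a small Python sketch (Python's re module is my assumption here, not what Notepad++ uses internally). The (?-s) prefix is left out because, in Python, the dot already excludes line breaks unless re.DOTALL is set. The sketch shows the group 1 value for the first line, the two characters just before it (proving that backtracking stopped right after the first € sign), and then the whole replacement:

```python
import re

text = (
    "1 Elevator de chantier 1L01522838425 110,30 € 110,30 €\n"
    "1 Oreillet bleu 1L01522839954 51.26 € 51.26 €\n"
    "1 Housse de couette Bibi 1L01522838521 86.20 € 86.20 €"
)

# guy038's SEARCH regex without the (?-s) prefix (the dot never crosses
# a line break in Python unless re.DOTALL is given)
pattern = re.compile(r"^.+ (.+) €", re.MULTILINE)

first = pattern.search(text)
start = first.start(1)
print(repr(first.group(1)))         # '110,30'
print(repr(text[start - 2:start]))  # '€ ': group 1 begins after the FIRST euro sign

# REPLACE: \1
result = pattern.sub(r"\1", text)
print(result)
```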


          BTW, you may rewrite this regex using the PCRE EXTENDED (free-spacing) mode, as below:

           (?x-s)  #  Enable the EXTENDED mode and disable the DOTALL behaviour
           ^.+[ ]  #  The LONGEST, NON empty, range, of STANDARD characters, ending with a SPACE
           (.+)    #  The number to keep, with digits 0-9, the comma and dot separator 
                   #      = the LONGEST, NON empty, range of STANDARD characters, till the string ' €'
           [ ]€    #  A SPACE, followed by a single EURO character
          

          In this mode, any vertical or horizontal blank character found in the regex is NOT taken into account! So some characters need to be escaped with a backslash. For the space character, you can use the clearer syntax [ ]!

          Now, here’s the magic:

          • Select the five lines above that describe the search regex

          • Open the Replace dialog with the Ctrl + H shortcut

          • Set the Regular expression search mode

          • Type \1 (or $1) in the Replace with field

          • Click the Replace All button once, or the Replace button several times

          Et voilà !
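Outside Notepad++, the same free-spacing idea can be tried with Python's re.VERBOSE flag, which, like the x modifier, ignores whitespace and # comments in the pattern. This is a Python sketch, not Notepad++ itself, and (?-s) is omitted because Python's dot does not match line breaks by default. The second pattern shows why the spaces must be written [ ]: in verbose mode a bare space simply disappears, so the regex behaves like ^.+(.+)€ and no longer looks for " €":

```python
import re

line = "1 Elevator de chantier 1L01522838425 110,30 € 110,30 €"

# Extended form: blanks and # comments are ignored, so literal spaces are [ ]
extended = re.compile(r"""
    ^.+[ ]   # the LONGEST, NON empty, range of STANDARD characters, ending with a SPACE
    (.+)     # the number to keep, stored as group 1
    [ ]€     # a SPACE, followed by a single EURO character
""", re.VERBOSE)

print(extended.sub(r"\1", line))  # 110,30

# Same regex with bare spaces: in verbose mode it behaves like ^.+(.+)€
careless = re.compile(r"^.+ (.+) €", re.VERBOSE)
print(repr(careless.search(line).group(1)))  # ' ' (just the space before the final €)
```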


          BTW, for American/English readers, a small hint about number syntax:

          The English syntax 12,345.78, for instance, would be written 12 345,78 in French. Just note that some other European countries use a dot, instead of the French space character, to separate each group of thousands (12.345,78)!
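If the extracted prices later need to be compared or summed, the separators have to be normalised first. Here is a small Python sketch with a hypothetical helper, to_decimal, which simply removes the thousands separator and turns the decimal separator into a dot. It assumes a plain space as the French group separator; a text that uses a no-break space instead would need that character passed as group_sep:

```python
from decimal import Decimal


def to_decimal(text: str, decimal_sep: str, group_sep: str) -> Decimal:
    """Hypothetical helper: drop the thousands separator, then turn the
    decimal separator into a dot, which Decimal() understands."""
    return Decimal(text.replace(group_sep, "").replace(decimal_sep, "."))


english = to_decimal("12,345.78", decimal_sep=".", group_sep=",")
french = to_decimal("12 345,78", decimal_sep=",", group_sep=" ")
other = to_decimal("12.345,78", decimal_sep=",", group_sep=".")
print(english, french, other)  # 12345.78 12345.78 12345.78
```

Decimal is used rather than float so that a price such as 110.30 keeps its exact two-digit value.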

          Best Regards,

          guy038

          • Vincent Tink

            Hi,

            Thank you a lot! It works very nicely :)
            Sometimes there is no price at the end of the line, just a number like this: 1L01522838521 (it could be other numbers, it’s just an example).
            Such a line stays unchanged. Would it be possible to delete all these lines?
            Thanks again for your help :)

            • guy038

              Hi, Vincent,

              Do you mean that your file may contain lines such as the ones below, which you would like to delete?

              1 Elevator de chantier 1L01522838425
              1 Housse de couette Bibi 1L01522838521
              

              If so, a possible regex syntax for that S/R would be:

              SEARCH (?-s)^.+ (.+) €.*|^(?!.+€).*\R (with a space before the first opening parenthesis and a second space character after the first closing one!)

              REPLACE ?1\1

              Notes :

              • From the beginning of the line, ^:

                • The first alternative is almost identical to the search regex of my previous post. I just added the syntax .* at the end of the first branch, in order to match all the characters after the last € character

                • The first part of the second branch of the alternative, (?!.+€), is a condition, called a negative look-ahead, which means "Is there NO Euro sign further on in the current line?"

                • If this condition is TRUE, the regex engine matches the second, and main, part of the second branch, .*\R. In other words, all the contents, even empty, of the current line, with its EOL character(s)

              • In replacement, IF group 1 (the price) exists, we just rewrite that group 1 (the value), ELSE we do not rewrite anything => the whole current line, EOL character(s) included, is deleted

              So, from the original text below:

              A Elevator de chantier 1L01522838425 110,30 € 110,30 €
              
              C Elevator de chantier 1L01522838425
              D Oreillet bleu 1L01522839954 51.26 € 51.26 €
              
              F Housse de couette Bibi 1L01522838521
              G Housse de couette Bibi 1L01522838521 86.20 € 86.20 €
              
              I Elevator de chantier 1L01522838425 27,10 €
              J Oreillet bleu 1L01522839954 734.56 €
              K Housse de couette Bibi 1L01522838521 0,99 €
              
              M Elevator de chantier 1L01522838425 1.00 € Test
              N Oreillet bleu 1L01522839954 99,99 € small test
              O Housse de couette Bibi 1L01522838521 57.34 € A Test
              

              The regex S/R above would produce the final text below:

              110,30
              51.26
              86.20
              27,10
              734.56
              0,99
              1.00
              99,99
              57.34
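To reproduce this result outside Notepad++, here is a Python sketch (an assumption on my side, not part of Notepad++). Python's re has neither \R nor the ?1\1 conditional replacement syntax, so (?:\r\n|\n|\r) stands in for \R (it covers the usual Windows, Unix and old Mac line endings, not every character \R accepts), and a small replacement function, named conditional here for illustration, writes group 1 only when it took part in the match:

```python
import re

# Sample text from this post; the letters B, E, H and L are empty lines
text = (
    "A Elevator de chantier 1L01522838425 110,30 € 110,30 €\n"
    "\n"
    "C Elevator de chantier 1L01522838425\n"
    "D Oreillet bleu 1L01522839954 51.26 € 51.26 €\n"
    "\n"
    "F Housse de couette Bibi 1L01522838521\n"
    "G Housse de couette Bibi 1L01522838521 86.20 € 86.20 €\n"
    "\n"
    "I Elevator de chantier 1L01522838425 27,10 €\n"
    "J Oreillet bleu 1L01522839954 734.56 €\n"
    "K Housse de couette Bibi 1L01522838521 0,99 €\n"
    "\n"
    "M Elevator de chantier 1L01522838425 1.00 € Test\n"
    "N Oreillet bleu 1L01522839954 99,99 € small test\n"
    "O Housse de couette Bibi 1L01522838521 57.34 € A Test\n"
)

# Same two branches as the Notepad++ regex; (?:\r\n|\n|\r) replaces \R
pattern = re.compile(r"^.+ (.+) €.*|^(?!.+€).*(?:\r\n|\n|\r)", re.MULTILINE)


def conditional(m: re.Match) -> str:
    # Emulates ?1\1: write group 1 if it matched, otherwise write nothing,
    # which deletes the whole line together with its line break
    return m.group(1) if m.group(1) is not None else ""


result = pattern.sub(conditional, text)
print(result)
```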
              

              REMARKS :

              • If the price is present only once, it doesn’t matter: the price will still be kept

              • If some text is present after the last Euro character, it will be deleted, too


              If you prefer to keep the pure blank lines, use instead:

              SEARCH (?-s)^.+ (.+) €.*|^(?!.+€).+\R

              REPLACE ?1\1

              And you’ll obtain the changed text below:

              110,30
              
              51.26
              
              86.20
              
              27,10
              734.56
              0,99
              
              1.00
              99,99
              57.34
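The blank-line-preserving variant only turns .* into .+ in the second branch, so a truly empty line no longer matches anything and is left alone. A Python sketch of it, under the same assumptions as any Python port of this S/R ((?:\r\n|\n|\r) as a stand-in for \R, and a hypothetical conditional function playing the role of ?1\1), reproduces the listing above:

```python
import re

# Sample text from this post; the letters B, E, H and L are empty lines
text = (
    "A Elevator de chantier 1L01522838425 110,30 € 110,30 €\n"
    "\n"
    "C Elevator de chantier 1L01522838425\n"
    "D Oreillet bleu 1L01522839954 51.26 € 51.26 €\n"
    "\n"
    "F Housse de couette Bibi 1L01522838521\n"
    "G Housse de couette Bibi 1L01522838521 86.20 € 86.20 €\n"
    "\n"
    "I Elevator de chantier 1L01522838425 27,10 €\n"
    "J Oreillet bleu 1L01522839954 734.56 €\n"
    "K Housse de couette Bibi 1L01522838521 0,99 €\n"
    "\n"
    "M Elevator de chantier 1L01522838425 1.00 € Test\n"
    "N Oreillet bleu 1L01522839954 99,99 € small test\n"
    "O Housse de couette Bibi 1L01522838521 57.34 € A Test\n"
)

# .+ (instead of .*) in the second branch: empty lines cannot match it
pattern = re.compile(r"^.+ (.+) €.*|^(?!.+€).+(?:\r\n|\n|\r)", re.MULTILINE)


def conditional(m: re.Match) -> str:
    # Emulates ?1\1: keep the price if group 1 matched, else delete the line
    return m.group(1) if m.group(1) is not None else ""


result = pattern.sub(conditional, text)
print(result)
```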
              

              Cheers,

              guy038

              The Community of users of the Notepad++ text editor.