
    Delete most of the text of the file

    • Vincent Tink

      Hello,
      I have a .txt file that looks like this:
      1 Elevator de chantier 1L01522838425 110,30 € 110,30 €
      1 Oreillet bleu 1L01522839954 51.26 € 51.26 €
      1 Housse de couette Bibi 1L01522838521 86.20 € 86.20 €

      I need to keep only the prices: 110,30, 51.26 and 86.20.
      Do you know how I can do that, please?

      Thank you :)

      • gstavi

        @guy038 will probably provide a more elegant solution, but the following regular expression Find & Replace worked for me on your example:

        Find: ^.*[^\d](\d+[\.,]\d+)[^\d].*$
        Replace: \1

        I wasn’t sure whether you use . or , as the decimal separator, so the regex accepts both.
        It assumes you want to keep a single xx.yy number from each line.
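For anyone who wants to check the behaviour outside Notepad++, here is a minimal sketch in Python. It assumes Python's re module treats this particular pattern the same way Notepad++ does; the re.MULTILINE flag is added so that ^ and $ work line by line, and the variable names are only illustrative.

```python
import re

# gstavi's pattern, copied unchanged. re.MULTILINE is an addition for Python
# so that ^ and $ anchor at every line, as they do in Notepad++.
PATTERN = re.compile(r"^.*[^\d](\d+[\.,]\d+)[^\d].*$", re.MULTILINE)

sample = (
    "1 Elevator de chantier 1L01522838425 110,30 € 110,30 €\n"
    "1 Oreillet bleu 1L01522839954 51.26 € 51.26 €\n"
    "1 Housse de couette Bibi 1L01522838521 86.20 € 86.20 €"
)

# Replace every line by its captured group 1 (the last xx.yy number).
result = PATTERN.sub(r"\1", sample)
print(result)
```

Because the leading .* is greedy, the group ends up holding the last number of each line, which is why only one copy of the price survives.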

        • guy038

          Hello Vincent, Gstavi and All

          Vincent, assuming that the price always ends each line in your file, another syntax for that S/R could be:

          SEARCH : (?-s)^.+ (.+) € with a space before the opening parenthesis and a second space character after the closing one!

          REPLACE : \1

          Like gstavi, I also suppose that you want just one copy of each price!


          Let’s see what happens with the first example line:

          1 Elevator de chantier 1L01522838425 110,30 € 110,30 €
          

          Notes :

          • The modifier (?-s), or (?x-s) in the extended version further down :

            • The x flag, when present, begins an extended area, which allows the user to split a regex over several lines and to add some comments

            • The -s flag ensures that the dot special character, ., matches a single standard character only and never an EOL one, even if you previously checked the . matches newline option by mistake!

          • Then, from the beginning of the line, ^, the part .+ , with a space after the plus sign, matches the longest non-empty range of standard characters up to a space character => It first matches the string "1 Elevator de chantier 1L01522838425 110,30 € 110,30 "

          • However, as the regex engine must also match, afterwards, another range of characters followed by a space and the Euro sign, it needs to backtrack to the space character located before the last number 110,30

          • Now, the part (.+) matches the longest non-empty range of standard characters up to the string " €" (a space and the Euro sign). As it’s surrounded by round brackets, the final number of each line, 110,30 here, is stored as group 1

          • Finally, the ending part  €, with its leading space, just matches the last two characters (space + Euro) literally

          • In replacement, the complete line, without its EOL character(s), is replaced by the final number only
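The backtracking described in the notes above can be observed with a small Python sketch. It is only an approximation of the Notepad++ search: the (?-s) prefix is omitted because Python's dot excludes newlines by default, re.MULTILINE is added for line-by-line anchoring, and the names PRICE, line and m are illustrative.

```python
import re

# Same search as above, minus the (?-s) prefix (Python's "." never matches
# a newline unless re.DOTALL is set, so the prefix is not needed here).
PRICE = re.compile(r"^.+ (.+) €", re.MULTILINE)

line = "1 Elevator de chantier 1L01522838425 110,30 € 110,30 €"

m = PRICE.search(line)
print(m.group(1))   # the text stored as group 1
print(m.start(1))   # offset where group 1 begins in the line

# The whole line is matched, so the replacement leaves only group 1.
new_text = PRICE.sub(r"\1", line)
print(new_text)
```

Comparing m.start(1) with line.find("110,30") and line.rfind("110,30") shows that group 1 begins at the last occurrence of the price: the greedy .+ gives back as few characters as it can.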


          BTW, you may rewrite this regex, using the PCRE EXTENDED mode, as below :

           (?x-s)  #  Enable the EXTENDED mode and disable the DOTALL behaviour
           ^.+[ ]  #  The LONGEST, NON empty, range, of STANDARD characters, ending with a SPACE
           (.+)    #  The number to keep, with digits 0-9, the comma and dot separator 
                   #      = the LONGEST, NON empty, range of STANDARD characters, till the string ' €'
           [ ]€    #  A SPACE, followed by a single **EURO** character
          

          In this mode, any vertical or horizontal blank character found in the regex is NOT taken into account! So, some characters will need to be escaped with a backslash character. For the space character, you can use the more readable syntax [ ] !

          Now, here’s the magic :

          • Select these five lines, above, that describe the search regex

          • Open the Replace dialog, by hitting the CTRL + H shortcut

          • Set the Regular expression search mode

          • Type the regex \1 ( or $1 ) in the Replace with field

          • Click, once, the Replace All Button, or several times on the Replace button

          Et voilà !
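The same extended layout can be tried in Python, whose re.VERBOSE flag plays a role comparable to (?x). This is a sketch under that assumption, not a description of how Notepad++ processes the selection; the names EXTENDED, COMPACT and sample are illustrative.

```python
import re

# Free-spacing version: blanks and "#" comments inside the pattern are
# ignored, so a literal space has to be written as [ ].
EXTENDED = re.compile(
    r"""
    ^.+[ ]  # the longest non-empty range of characters ending with a space
    (.+)    # the price to keep
    [ ]€    # a space followed by the euro sign
    """,
    re.VERBOSE | re.MULTILINE,
)

# One-line version of the same search, for comparison.
COMPACT = re.compile(r"^.+ (.+) €", re.MULTILINE)

sample = (
    "1 Elevator de chantier 1L01522838425 110,30 € 110,30 €\n"
    "1 Oreillet bleu 1L01522839954 51.26 € 51.26 €\n"
    "1 Housse de couette Bibi 1L01522838521 86.20 € 86.20 €"
)

extended_result = EXTENDED.sub(r"\1", sample)
compact_result = COMPACT.sub(r"\1", sample)
print(extended_result)
```

Both forms give the same text; the extended one is simply easier to read and to comment.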


          BTW, for American/English people, a small hint about numbers syntax :

          The correct English syntax, 12,345.78, for instance, would be replaced, in French, by the syntax 12 345,78. Just note that some other European countries may use a dot, instead of the French space character, to separate the groups of thousands!
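If the extracted prices are later processed by a script, the two notations have to be reconciled. Below is a rough Python sketch; parse_price is a hypothetical helper, and it assumes that every input carries a decimal part, so that the right-most comma or dot is always the decimal separator (a bare "1,234" would be misread).

```python
from decimal import Decimal


def parse_price(text: str) -> Decimal:
    """Hypothetical helper: read a number written English- or French-style."""
    # Drop regular and non-breaking spaces used as thousands separators.
    cleaned = text.replace("\u00a0", "").replace(" ", "")
    # Assumption: the right-most comma or dot is the decimal separator.
    pos = max(cleaned.rfind(","), cleaned.rfind("."))
    if pos == -1:
        return Decimal(cleaned)
    # Remove any remaining separators from the integer part.
    whole = cleaned[:pos].replace(",", "").replace(".", "")
    return Decimal(f"{whole}.{cleaned[pos + 1:]}")


print(parse_price("12,345.78"))
print(parse_price("12 345,78"))
print(parse_price("110,30"))
```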

          Best Regards,

          guy038

          • Vincent Tink

            Hi,

            Thanks a lot! It works very nicely :)
            Sometimes there is no price at the end of the line, just a number like this: 1L01522838521 (it could be any other number, it’s just an example).
            Those lines stay the same. Would it be possible to delete all of these lines?
            Thanks again for your help :)

            • guy038

              Hi, Vincent,

              Do you mean that your file may contain lines, as below, that you would like to delete ?

              1 Elevator de chantier 1L01522838425
              1 Housse de couette Bibi 1L01522838521
              

              If so, a possible regex syntax, for that S/R, would be :

              SEARCH (?-s)^.+ (.+) €.*|^(?!.+€).*\R , with a space before the first opening parenthesis and a second space character after the first closing one!

              REPLACE ?1\1

              Notes :

              • From beginning of the line, ^ :

                • The first alternative is almost identical to the search regex of my previous post. I just added the syntax .* at the end of the first branch, in order to match all subsequent characters, after the last € character

                • The first part of the second branch of the alternative, (?!.+€), is a condition, called a negative look-ahead, which means “NO Euro sign exists further on in the current line”

                • If this condition is TRUE, then, the regex engine matches the second, and main, part, of the second branch, .*\R. In other words, all the contents, even empty, of the current line, with its EOL characters

              • In replacement, IF group 1 ( the price ) exists, we just rewrite that group 1 ( the value), ELSE we do not rewrite anything => All the current line, with the EOL character(s), included, is deleted
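The two-branch logic above can be approximated in Python with a replacement function. This is a sketch under stated assumptions: Python's re module has no \R and no ?1\1 conditional replacement, so (?:\r\n|\n|\r) and the hypothetical keep_price function stand in for them, and the (?-s) prefix is dropped because Python's dot excludes newlines by default.

```python
import re

# Branch 1 captures the last price of a line; branch 2 matches a whole line
# (with its line break) when no euro sign follows on that line.
# (?:\r\n|\n|\r) is a stand-in for \R, which Python's re does not support.
PATTERN = re.compile(
    r"^.+ (.+) €.*|^(?!.+€).*(?:\r\n|\n|\r)",
    re.MULTILINE,
)


def keep_price(match: re.Match) -> str:
    # Emulates "?1\1": write group 1 if it took part in the match,
    # otherwise write nothing, which deletes the matched line.
    return match.group(1) or ""


text = (
    "A Elevator de chantier 1L01522838425 110,30 € 110,30 €\n"
    "\n"
    "C Elevator de chantier 1L01522838425\n"
    "D Oreillet bleu 1L01522839954 51.26 € 51.26 €\n"
    "I Elevator de chantier 1L01522838425 27,10 €\n"
    "M Elevator de chantier 1L01522838425 1.00 € Test\n"
)

result = PATTERN.sub(keep_price, text)
print(result)
```

The blank line and line C disappear together with their line breaks, while the priced lines keep theirs because the first branch never consumes the EOL.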

              So, from the original text, below :

              A Elevator de chantier 1L01522838425 110,30 € 110,30 €
              
              C Elevator de chantier 1L01522838425
              D Oreillet bleu 1L01522839954 51.26 € 51.26 €
              
              F Housse de couette Bibi 1L01522838521
              G Housse de couette Bibi 1L01522838521 86.20 € 86.20 €
              
              I Elevator de chantier 1L01522838425 27,10 €
              J Oreillet bleu 1L01522839954 734.56 €
              K Housse de couette Bibi 1L01522838521 0,99 €
              
              M Elevator de chantier 1L01522838425 1.00 € Test
              N Oreillet bleu 1L01522839954 99,99 € small test
              O Housse de couette Bibi 1L01522838521 57.34 € A Test
              

              The regex S/R, above, would get the final text, below :

              110,30
              51.26
              86.20
              27,10
              734.56
              0,99
              1.00
              99,99
              57.34
              

              REMARKS :

              • If the price is present only once, it doesn’t matter : the price is still kept

              • If some text is present after the last Euro character, it is deleted, too


              If you prefer to keep the pure blank lines, use, instead :

              SEARCH (?-s)^.+ (.+) €.*|^(?!.+€).+\R

              REPLACE ?1\1

              And you’ll obtain the changed text, below :

              110,30
              
              51.26
              
              86.20
              
              27,10
              734.56
              0,99
              
              1.00
              99,99
              57.34
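The difference between the two variants comes down to .* versus .+ in the second branch, which a short Python comparison makes visible. As before, this is only a sketch: (?:\r\n|\n|\r) replaces \R, a replacement function replaces ?1\1, and the names DELETE_BLANKS, KEEP_BLANKS and keep_price are illustrative.

```python
import re

EOL = r"(?:\r\n|\n|\r)"  # stand-in for \R, which Python's re lacks

# Second branch with .* : a blank line matches too and is removed.
DELETE_BLANKS = re.compile(r"^.+ (.+) €.*|^(?!.+€).*" + EOL, re.MULTILINE)
# Second branch with .+ : a blank line cannot match, so it is left alone.
KEEP_BLANKS = re.compile(r"^.+ (.+) €.*|^(?!.+€).+" + EOL, re.MULTILINE)


def keep_price(match: re.Match) -> str:
    # Group 1 is None when the second branch matched: delete that line.
    return match.group(1) or ""


text = (
    "A Elevator de chantier 1L01522838425 110,30 € 110,30 €\n"
    "\n"
    "C Elevator de chantier 1L01522838425\n"
    "D Oreillet bleu 1L01522839954 51.26 € 51.26 €\n"
)

without_blanks = DELETE_BLANKS.sub(keep_price, text)
with_blanks = KEEP_BLANKS.sub(keep_price, text)
print(repr(without_blanks))
print(repr(with_blanks))
```

In both cases line C is deleted; only the fate of the empty line changes.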
              

              Cheers,

              guy038

              The Community of users of the Notepad++ text editor.