    Insert string/words above to lines contain numbers

    • Isaac Goh

      I need to convert 5000+ lines as shown below. Basically, each line containing a number should be prefixed with the word from the nearest word line above it, up to the next word line. Thanks in advance.

      Convert from this:
      Orange
      50
      20
      apple
      30
      20
      10
      pear
      50
      20
      21
      10

      To this:
      Orange 50
      Orange 20
      apple 30
      apple 20
      apple 10
      pear 50
      pear 20
      pear 21
      pear 10

      • guy038

        Hello, @isaac-goh and All,

        No problem with regexes ! We’ll need 2 consecutive regex S/R (search/replace). As usual :

        • Move to the very beginning of your file ( Ctrl + Home )

        • Select the Regular expression search mode

        • Untick all box options

        • Click once only on the Replace All button


        So, from your INPUT text :

        Orange
        50
        20
        apple
        30
        20
        10
        pear
        50
        20
        21
        10
        

        This first regex S/R

        SEARCH ^([\u\l]+\R)((?:\d+\R)+)

        REPLACE \2\1

        will reverse the position of all the articles and their respective quantities :

        50
        20
        Orange
        30
        20
        10
        apple
        50
        20
        21
        10
        pear
        

        Then, this second regex will give the expected OUTPUT text by pre-fixing any quantity with its article name :

        SEARCH ^(\d+\R)(?=(?s:.*?)([\u\l]+))|^[\u\l]+\R

        REPLACE ?1\2\x20\1

        Orange 50
        Orange 20
        apple 30
        apple 20
        apple 10
        pear 50
        pear 20
        pear 21
        pear 10
        

        In addition, these replacements are quite safe. Indeed, running them again, against the OUTPUT text only, will not modify anything ;-))
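For anyone who wants to check the two steps outside Notepad++, here is a rough Python sketch of the same idea. It is not the Notepad++ (Boost) syntax: Python's `re` module has no `\R`, no `[\u\l]` and no conditional replacement, so `\r?\n`, `[A-Za-z]` (ASCII letters only, an assumption) and a replacement function stand in for them. The names `STEP1`, `STEP2` and `convert` are made up for this illustration.

```python
import re

EOL = r"\r?\n"        # stand-in for \R, which Python's re module does not support
WORD = r"[A-Za-z]+"   # stand-in for [\u\l]+ (assumes ASCII letters only)

# Step 1: a word line followed by one or more number lines.
STEP1 = re.compile(rf"^({WORD}{EOL})((?:\d+{EOL})+)", re.MULTILINE)
# Step 2: a number line with the next word captured in a lookahead, or a bare word line.
STEP2 = re.compile(rf"^(\d+{EOL})(?=(?s:.*?)({WORD}))|^{WORD}{EOL}", re.MULTILINE)


def convert(text: str) -> str:
    """Apply the two search/replace steps one after the other."""
    # Step 1: move each word line below its block of numbers.
    swapped = STEP1.sub(r"\2\1", text)

    # Step 2: a function plays the role of the conditional replacement ?1\2\x20\1.
    def repl(m: re.Match) -> str:
        if m.group(1):                        # number line: prefix it with the word
            return f"{m.group(2)} {m.group(1)}"
        return ""                             # bare word line: delete it

    return STEP2.sub(repl, swapped)


sample = "Orange\n50\n20\napple\n30\n20\n10\npear\n50\n20\n21\n10\n"
result = convert(sample)
print(result, end="")
```

Running `convert` a second time on its own output leaves the text unchanged, which matches the remark above about the replacements being safe.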

        Best Regards,

        guy038

        • Isaac Goh

          Hi guy038,
          Thanks for your prompt post. But I don’t seem to get your post to work

          For the first regex:
          SEARCH ^([\u\l]+\R)((?:\d+\R)+)
          REPLACE \2\1

          The output is a bit off:
          50
          20
          Orange
          30
          20
          10
          apple
          50
          20
          21
          pear
          10

          Thus the second regex leaves the last line without its prefix:
          SEARCH ^(\d+\R)(?=(?s:.*?)([\u\l]+))|^[\u\l]+\R
          REPLACE ?1\2\x20\1

          Orange 50
          Orange 20
          apple 30
          apple 20
          apple 10
          pear 50
          pear 20
          pear 21
          10

          • PeterJones @Isaac Goh

            @Isaac-Goh said in Insert string/words above to lines contain numbers:

            But I don’t seem to get your post to work

            That’s because your data doesn’t have a final newline after the 10. Guy’s regex assumes that every line that is a number ends with a newline.

            Your choices are to add a newline at the end of the file before running the two regexes, or to replace the first regex with ^([\u\l]+\R)((?:\d+(\R|\Z))+)
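To see the effect of the missing final newline in isolation, here is a small Python sketch. It is only an approximation of the Notepad++ regex: `\r?\n` replaces `\R` and `[A-Za-z]` replaces `[\u\l]`, and the helper names `swap` and `ensure_final_eol` are invented for the example. It covers the first option only (adding the newline before running the regexes).

```python
import re

# Python stand-in for the first SEARCH regex of the thread.
FIRST = re.compile(r"^([A-Za-z]+\r?\n)((?:\d+\r?\n)+)", re.MULTILINE)


def swap(text: str) -> str:
    """Move each word line below its block of number lines (REPLACE \\2\\1)."""
    return FIRST.sub(r"\2\1", text)


def ensure_final_eol(text: str, eol: str = "\n") -> str:
    """Append a line ending when the last line does not have one."""
    if text and not text.endswith(("\n", "\r")):
        return text + eol
    return text


with_eol = "pear\n50\n20\n21\n10\n"
without_eol = "pear\n50\n20\n21\n10"

print(repr(swap(with_eol)))                       # the word ends up after all four numbers
print(repr(swap(without_eol)))                    # the last number is left behind the word
print(repr(swap(ensure_final_eol(without_eol))))  # same as the first case
```

The second call reproduces the misplaced `pear` line: the final `10` has no line ending, so it is not part of the block that gets swapped.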

            ----

            Useful References

            • Please Read Before Posting
            • Template for Search/Replace Questions
            • FAQ: Where to find regular expressions (regex) documentation
            • Notepad++ Online User Manual: Searching/Regex
            • PeterJones @Isaac Goh

              @Isaac-Goh ,

              Alternatively,

              • First:
                FIND = ^([\u\l]+)\R
                REPLACE = $1\x20
                Replace All

              • Second:
                FIND = ^([\u\l]+)\x20\d+\R\K(\d+)$
                REPLACE = $1\x20$2
                Replace All one or more times until the message says none were replaced
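The "repeat until nothing is replaced" idea can be sketched in Python as a loop around `re.subn`. This is an approximation, not the Notepad++ syntax: Python's `re` has no `\K`, so the preceding line is captured and written back instead, `[A-Za-z]` stands in for `[\u\l]`, and the names `JOIN_HEADER`, `COPY_DOWN` and `fill_down` are made up for the example.

```python
import re

# First step: join each word line with the number line that follows it.
JOIN_HEADER = re.compile(r"^([A-Za-z]+)\r?\n", re.MULTILINE)
# Second step: a "word number" line followed by a bare number line.
# No \K in Python, so group 2 (the rest of the previous line) is re-emitted.
COPY_DOWN = re.compile(r"^([A-Za-z]+) (\d+\r?\n)(\d+)(?=\r?$)", re.MULTILINE)


def fill_down(text: str) -> tuple[str, int]:
    """Return the converted text and the number of passes that changed something."""
    text = JOIN_HEADER.sub(r"\1 ", text)
    passes = 0
    while True:
        text, count = COPY_DOWN.subn(r"\1 \2\1 \3", text)
        if count == 0:          # equivalent to "none were replaced"
            return text, passes
        passes += 1


sample = "Orange\n50\n20\napple\n30\n20\n10\npear\n50\n20\n21\n10\n"
converted, passes = fill_down(sample)
print(converted, end="")
print("passes that replaced something:", passes)
```

Each pass copies the word down by one line per group, so the loop needs as many productive passes as the longest run of numbers minus one (three for this sample).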

              • guy038

                Hello, @isaac-goh, @peterjones and All,

                Peter, thanks for your clever alternative solution ! I didn’t think of that. Of course, it needs as many clicks on the Replace All button as the length of the longest list of numbers, minus 1 !


                Now, your solution to handle a last line without any [CR/]LF , that you gave to @isaac-goh :

                ^([\u\l]+\R)((?:\d+(\R|\Z))+)

                does not work either :-(

                Indeed, you get this OUTPUT text :

                50
                20
                Orange
                30
                20
                10
                apple
                50
                20
                21
                10pear
                

                So, in summary, in order to achieve @isaac-goh’s goal :

                • My first regex S/R should have been :

                  • SEARCH ^([\u\l]+\R)((?:\d+(?:\R|(\Z)))+)

                  • REPLACE $2(?3\r\n)$1     or     $2(?3\n)$1 if you use Unix files

                • My second regex S/R is unchanged :

                  • SEARCH ^(\d+\R)(?=(?s:.*?)([\u\l]+))|^[\u\l]+\R

                  • REPLACE ?1$2\x20$1
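As a rough cross-check of the corrected first step, the Python sketch below contrasts the variant that only adds `\Z` with the variant that also inserts the missing line ending. It is a translation under assumptions, not the Notepad++ syntax: `\r?\n` replaces `\R`, `[A-Za-z]` replaces `[\u\l]`, Python's `\Z` (end of string) plays the role of the end-of-file test, and a replacement function replaces the conditional `(?3\r\n)`. The names are invented for the example.

```python
import re

# Variant that only allows \Z after the last number (no line ending is added).
WITHOUT_FIX = re.compile(r"^([A-Za-z]+\r?\n)((?:\d+(\r?\n|\Z))+)", re.MULTILINE)
# Corrected variant: group 3 takes part in the match only at the very end of the text.
WITH_FIX = re.compile(r"^([A-Za-z]+\r?\n)((?:\d+(?:\r?\n|(\Z)))+)", re.MULTILINE)


def swap_plain(text: str) -> str:
    """REPLACE $2$1 with the variant that lacks the conditional part."""
    return WITHOUT_FIX.sub(r"\2\1", text)


def swap_fixed(text: str, eol: str = "\n") -> str:
    """REPLACE $2(?3<eol>)$1, written as a function because re has no conditionals."""
    def repl(m: re.Match) -> str:
        missing = eol if m.group(3) is not None else ""
        return m.group(2) + missing + m.group(1)

    return WITH_FIX.sub(repl, text)


no_final_eol = "apple\n30\n20\npear\n50\n20\n21\n10"

print(repr(swap_plain(no_final_eol)))   # last number and word are glued together
print(repr(swap_fixed(no_final_eol)))   # a line ending is inserted between them
```

Pass `eol="\r\n"` to `swap_fixed` for Windows files, mirroring the two REPLACE variants given above.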

                Best Regards,

                guy038

                • PeterJones @guy038

                  @guy038 said in Insert string/words above to lines contain numbers:

                  Of course, it needs as many clicks on the Replace All button as the length of the longest list of numbers, minus 1

                  That doesn’t bother me. I’d much rather spend a few extra seconds clicking Replace All a few more times than spend XYZ more minutes making the “perfect” regex that does it all with one click. I’ll put just enough effort into a regex for Notepad++ so that it gets the job done and that I can still understand it (and can hope that I will understand it if I read it again six months down the line when I want to try something similar again). Usually, the extra XYZ minutes to find some “clever trick” just ends up making the regex so complicated that I’d never remember the trick in the future, and might not even be able to see what the regex does by looking at it. And if I cannot see what it does, I’ll never try using it again in the future.

                  Now, your solution to handle a last line without any [CR/]LF , that you gave to @isaac-goh :
                  ^([\u\l]+\R)((?:\d+(\R|\Z))+)
                  does not work either :-(

                  I could’ve sworn I tried it yesterday when I posted. But when I tried it this morning, I got the same results as you. (My guess is that the final EOL had been added into my document without me noticing when I tried my regex yesterday…) So thanks for that correction.

                  • Isaac Goh @guy038

                    @guy038

                    Your details and prompt effort are very much appreciated. Thanks a million

                    I realise that the regex doesn’t seem to work if certain lines look like the ones below.
                    My apologies for not listing this in my initial post. Anyway, thank you guys for the great effort.

                    Orange123
                    50:00
                    20:00
                    apple234
                    30:00
                    20:00
                    10:00
                    pear456
                    50:00
                    20:00
                    21:00
                    10:00

                    • guy038

                      Hi, @isaac-goh ,

                      Well, as usual , from this INPUT text :

                      Orange123
                      50:00
                      20:00
                      apple234
                      30:00
                      20:00
                      10:00
                      pear456
                      50:00
                      20:00
                      21:00
                      10:00
                      

                      Which OUTPUT do you expect ?

                      BR

                      guy038

                      • Isaac Goh @guy038

                        @guy038
                        Output will be as shown below. Many thanks

                        Orange123 50:00
                        Orange123 20:00
                        apple234 30:00
                        apple234 20:00
                        apple234 10:00
                        pear456 50:00
                        pear456 20:00
                        pear456 21:00
                        pear456 10:00

                        • guy038

                          Hello, @isaac-goh, @peterjones and All,

                          Ah OK ! So, now, we have to find out something less restrictive, because :

                          • The articles may contain digits or an underscore in their name

                          • The quantities may contain the symbol :

                          So that the regexes can clearly tell these two entities apart, I assume that the articles will always begin with an upper-case or lower-case letter !


                          So, from this INPUT text :

                          Orange123
                          50:00
                          20:00
                          apple234
                          30:00
                          20:00
                          10:00
                          pear456
                          50:00
                          20:00
                          21:00
                          10:00
                          

                          The first regex S/R to use becomes :

                          • SEARCH ^([\u\l]\w*\R)((?:[\d:]+(?:\R|(\Z)))+)

                          • REPLACE $2(?3\r\n)$1     or     $2(?3\n)$1 if you use Unix files

                          and gives :

                          50:00
                          20:00
                          Orange123
                          30:00
                          20:00
                          10:00
                          apple234
                          50:00
                          20:00
                          21:00
                          10:00
                          pear456
                          

                          Then, the second regex S/R to use becomes :

                          • SEARCH ^([\d:]+\R)(?=(?s:.*?)([\u\l]\w*)$)|^[\u\l]\w*\R

                          • REPLACE ?1$2\x20$1

                          and gives the expected OUTPUT text :

                          Orange123 50:00
                          Orange123 20:00
                          apple234 30:00
                          apple234 20:00
                          apple234 10:00
                          pear456 50:00
                          pear456 20:00
                          pear456 21:00
                          pear456 10:00
                          

                          Just tell me if additional symbols, apart from the : character, must appear in the quantities !

                          Best Regards,

                          guy038

                          • Isaac Goh @guy038

                            @guy038
                            Yes additional symbols like underscore _ and additional colon : as shown below. Many thanks.

                            Convert from this:
                            Orange123_432
                            50:00:00
                            20:00:00
                            apple234_678
                            30:00:00
                            20:00:00
                            10:00:00
                            pear456_321
                            50:00:00
                            20:00:00
                            21:00:00
                            10:00:00

                            To this:
                            Orange123_432 50:00:00
                            Orange123_432 20:00:00
                            apple234_678 30:00:00
                            apple234_678 20:00:00
                            apple234_678 10:00:00
                            pear456_321 50:00:00
                            pear456_321 20:00:00
                            pear456_321 21:00:00
                            pear456_321 10:00:00

                            Best Regards

                            • Isaac Goh @Isaac Goh

                              My apologies, as there are more than 3000 lines. I realised that there are additional symbols, like the underscore _ and additional colons :, as well as alphanumeric values, as shown below. Many thanks.

                              Convert from this:
                              Orange123_432
                              50:00:00:1e
                              20:00:00:1e
                              apple234_678
                              30:00:00:1a
                              20:00:00:1a
                              10:00:00:1a
                              pear456_321
                              50:00:00:1b
                              20:00:00:1b
                              21:00:00:1b
                              10:00:00:1b

                              To this:
                              Orange123_432 50:00:00:1e
                              Orange123_432 20:00:00:1e
                              apple234_678 30:00:00:1a
                              apple234_678 20:00:00:1a
                              apple234_678 10:00:00:1a
                              pear456_321 50:00:00:1b
                              pear456_321 20:00:00:1b
                              pear456_321 21:00:00:1b
                              pear456_321 10:00:00:1b

                              Best Regards

                              • PeterJones @Isaac Goh

                                @Isaac-Goh ,

                                So what is the rule you use to decide which are the “header” lines and which are the “other” lines? Is the rule “if it starts with a letter (uppercase or lowercase), it is a header line; if it starts with a number, it is an ‘other’ line”? Or something else?

                                You keep on changing what you expect. If you do not give a reasonable definition of what you want, you cannot expect that someone would get it right for you.
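To make the candidate rule concrete, here is a minimal Python sketch of "starts with a letter means header, starts with a digit means other". It only illustrates that one rule, which has not been confirmed as the right one for the real data; the function name `fill_down_by_first_char` is hypothetical. Anything that fits neither case raises an error, so an incomplete rule shows up immediately instead of silently producing wrong output.

```python
def fill_down_by_first_char(lines: list[str]) -> list[str]:
    """Prefix every 'other' line with the most recent header line.

    Assumed rule: a line starting with a letter is a header,
    a line starting with a digit is an 'other' line.
    """
    result = []
    header = None
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue                          # ignore empty lines
        if line[0].isalpha():
            header = line                     # remember the current header
        elif line[0].isdigit():
            if header is None:
                raise ValueError(f"line {number}: data line before any header")
            result.append(f"{header} {line}")
        else:
            raise ValueError(f"line {number}: rule does not cover {line!r}")
    return result


sample = [
    "Orange123_432", "50:00:00:1e", "20:00:00:1e",
    "apple234_678", "30:00:00:1a",
]
for row in fill_down_by_first_char(sample):
    print(row)
```

Note that this rule would misclassify any data line that happens to start with a letter, which is exactly why a precise definition of the two kinds of line is needed.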

                                • guy038

                                  Hi, @isaac-goh, @peterjones and All,

                                  Well. So :

                                  • The articles contain letters, may contain digits or an underscore in their name and begin with a letter

                                  • The quantities contain digits, may contain the : symbol and may have a trailing lower-case letter


                                  In this case, use these two regex S/R in succession :

                                  • SEARCH ^([\u\l]\w*\R)((?:[\d:]+\l?(?:\R|(\Z)))+)

                                  • REPLACE $2(?3\r\n)$1     or     $2(?3\n)$1 if you use Unix files

                                  and :

                                  • SEARCH ^([\d:]+\l?\R)(?=(?s:.*?)^([\u\l]\w*)$)|^[\u\l]\w*\R

                                  • REPLACE ?1$2\x20$1

                                  Which will change the INPUT text :

                                  Orange123_432
                                  50:00:00:1e
                                  20:00:00:1e
                                  apple234_678
                                  30:00:00:1a
                                  20:00:00:1a
                                  10:00:00:1a
                                  pear456_321
                                  50:00:00:1b
                                  20:00:00:1b
                                  21:00:00:1b
                                  10:00:00:1b
                                  

                                  into the expected OUTPUT text :

                                  Orange123_432 50:00:00:1e
                                  Orange123_432 20:00:00:1e
                                  apple234_678 30:00:00:1a
                                  apple234_678 20:00:00:1a
                                  apple234_678 10:00:00:1a
                                  pear456_321 50:00:00:1b
                                  pear456_321 20:00:00:1b
                                  pear456_321 21:00:00:1b
                                  pear456_321 10:00:00:1b
                                  

                                  BR

                                  guy038

                                  • Isaac Goh @guy038

                                    @guy038
                                    Apologies for my late reply: I was out of town for almost a week and needed to scan through more than 3000 lines. This is what I have gathered as the longest possible lines.

                                    • Isaac Goh @guy038

                                      @guy038

                                      Convert from this:
                                      hq12orange1_HBA0_Max_123_PG1_FA1D30
                                      21:00:00:0b:1f:c1:f6:ee
                                      50:00:09:1a:50:06:1b:1f
                                      10:00:00:10:ca:b1:60:94
                                      H600_654342_3B_hq11apple6_HBA4
                                      50:06:0b:50:12:45:fa:20
                                      21:00:12:40:0d:c2:4b:75
                                      VICTOR_200_BE_E3A1BC00_MAX_123_FA1D1
                                      50:00:15:42:c1:32:21:1a
                                      50:00:19:45:d1:06:14:04

                                      To this:
                                      hq12orange1_HBA0_Max_123_PG1_FA1D30 21:00:00:0b:1f:c1:f6:ee
                                      hq12orange1_HBA0_Max_123_PG1_FA1D30 50:00:09:1a:50:06:1b:1f
                                      hq12orange1_HBA0_Max_123_PG1_FA1D30 10:00:00:10:ca:b1:60:94
                                      H600_654342_3B_hq11apple6_HBA4 50:06:0b:50:12:45:fa:20
                                      H600_654342_3B_hq11apple6_HBA4 21:00:12:40:0d:c2:4b:75
                                      VICTOR_200_BE_E3A1BC00_MAX_123_FA1D1 50:00:15:42:c1:32:21:1a
                                      VICTOR_200_BE_E3A1BC00_MAX_123_FA1D1 50:00:19:45:d1:06:14:04

                                      Best Regards

                                      • Alan Kilborn @Isaac Goh

                                        @Isaac-Goh

                                        So this isn’t a “data conversion service”.
                                        The way it works is you get a couple of “freebies” and then that stops and you are expected to learn from what you have been given.
                                        Of course you may ask questions about any small points, but in general you are expected to do your own work.
                                        Also you may show what you’ve tried and explain how it wasn’t successful and ask for hints, but you may no longer ask for full solutions.

                                        • guy038

                                          Hi, @isaac-goh, @peterjones, @alan-kilborn and All,

                                          Of course, @alan-kilborn’s comments are pertinent and you should investigate a bit in regexes : refer to the USEFUL REFERENCES part at the end of the first post of @peterjones !


                                          Now, out of curiosity, I wanted to know why you asked me again. Seemingly, the regexes I provided did not match all your cases !

                                          Oh…, everything is clear, now ! Really, @isaac-goh, you should have told me, from the very beginning that :

                                          • The header lines are a range of word characters \w, so digits, letters (whatever their case) or underscores

                                          • Each subsequent line is a range of hexadecimal digits [[:xdigit:]], separated with a colon char

                                          This completely changes the regexes, of course !

                                          So from this INPUT text :

                                          hq12orange1_HBA0_Max_123_PG1_FA1D30
                                          21:00:00:0b:1f:c1:f6:ee
                                          50:00:09:1a:50:06:1b:1f
                                          10:00:00:10:ca:b1:60:94
                                          H600_654342_3B_hq11apple6_HBA4
                                          50:06:0b:50:12:45:fa:20
                                          21:00:12:40:0d:c2:4b:75
                                          VICTOR_200_BE_E3A1BC00_MAX_123_FA1D1
                                          50:00:15:42:c1:32:21:1a
                                          50:00:19:45:d1:06:14:04
                                          

                                          with this first regex S/R :

                                          • SEARCH ^(\w+\R)((?:[[:xdigit:]:]+(?:\R|(\Z)))+)

                                          • REPLACE $2(?3\r\n)$1     or     $2(?3\n)$1 if you use Unix files

                                          It’ll give you this text :

                                          21:00:00:0b:1f:c1:f6:ee
                                          50:00:09:1a:50:06:1b:1f
                                          10:00:00:10:ca:b1:60:94
                                          hq12orange1_HBA0_Max_123_PG1_FA1D30
                                          50:06:0b:50:12:45:fa:20
                                          21:00:12:40:0d:c2:4b:75
                                          H600_654342_3B_hq11apple6_HBA4
                                          50:00:15:42:c1:32:21:1a
                                          50:00:19:45:d1:06:14:04
                                          VICTOR_200_BE_E3A1BC00_MAX_123_FA1D1
                                          

                                          Then, with the second regex S/R, below :

                                          • SEARCH ^([[:xdigit:]:]+\R)(?=(?s:.*?)^(\w+)$)|^\w+\R

                                          • REPLACE ?1$2\x20$1

                                          You’ll get the expected OUTPUT text :

                                          hq12orange1_HBA0_Max_123_PG1_FA1D30 21:00:00:0b:1f:c1:f6:ee
                                          hq12orange1_HBA0_Max_123_PG1_FA1D30 50:00:09:1a:50:06:1b:1f
                                          hq12orange1_HBA0_Max_123_PG1_FA1D30 10:00:00:10:ca:b1:60:94
                                          H600_654342_3B_hq11apple6_HBA4 50:06:0b:50:12:45:fa:20
                                          H600_654342_3B_hq11apple6_HBA4 21:00:12:40:0d:c2:4b:75
                                          VICTOR_200_BE_E3A1BC00_MAX_123_FA1D1 50:00:15:42:c1:32:21:1a
                                          VICTOR_200_BE_E3A1BC00_MAX_123_FA1D1 50:00:19:45:d1:06:14:04
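For comparison, the same final rule can be sketched in Python as a single pass over the lines instead of two regex replacements. This is a different technique from the S/R above, shown only as an illustration. One deliberate difference: `HEX_LINE` requires at least one colon, which is stricter than `[[:xdigit:]:]+` and is an assumption made here so that a header consisting only of hex digits cannot be mistaken for a data line. The names `HEADER`, `HEX_LINE` and `prefix_with_header` are hypothetical.

```python
import re

HEADER = re.compile(r"\w+")
# Groups of hexadecimal digits separated by colons; at least one colon is required
# (an assumption, stricter than the character class used in the thread).
HEX_LINE = re.compile(r"[0-9A-Fa-f]+(?::[0-9A-Fa-f]+)+")


def prefix_with_header(text: str) -> str:
    """Prefix each colon-separated hex line with the most recent header line."""
    out = []
    header = None
    for line in text.splitlines():
        if HEX_LINE.fullmatch(line):
            if header is None:
                raise ValueError(f"data line before any header: {line!r}")
            out.append(f"{header} {line}")
        elif HEADER.fullmatch(line):
            header = line
        elif line.strip():
            raise ValueError(f"unrecognized line: {line!r}")
    return "\n".join(out) + "\n" if out else ""


sample = (
    "hq12orange1_HBA0_Max_123_PG1_FA1D30\n"
    "21:00:00:0b:1f:c1:f6:ee\n"
    "50:00:09:1a:50:06:1b:1f\n"
    "H600_654342_3B_hq11apple6_HBA4\n"
    "50:06:0b:50:12:45:fa:20"
)
print(prefix_with_header(sample), end="")
```

Because the text is split into lines first, a missing final line ending needs no special handling here.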
                                          

                                          Best Regards,

                                          guy038

                                          • Isaac Goh @guy038

                                            @guy038 said in Insert string/words above to lines contain numbers:

                                            ?1$2\x20$1

                                            Your superb effort is much appreciated. Many thanks.
