    Help with Trimming text-Remove before and after words

    19 Posts 5 Posters 2.0k Views
    • Terry RT
      Terry R @Saltshaker2112
      last edited by

      @Saltshaker2112

      Actually I just tested another regex and got it first go. Hopefully it will help.

      Try ^\d+[- ]*|[- ]* [\(\[\)\]:\d ]+$

      Terry

      • Saltshaker2112S
        Saltshaker2112 @Terry R
        last edited by

        @Terry-R said in Help with Trimming text-Remove before and after words:

        @Saltshaker2112 said in Help with Trimming text-Remove before and after words:

        Is there a way to put in a variable that treats “2112” as a word so that it keeps that in the line?

        That issue was in the back of my mind and I just hoped there wouldn’t be an instance of such a song title.

        There should be a way, but it might take me a while to think it up. Currently all I can think of is that as soon as a number has been grabbed, no further numbers are allowed to be grabbed, but that wouldn’t work on the trailing number as the : gets in the middle of that. Creating a subexpression for all those ideas though may be tricky.

        So EVERY line has a preceding number that needs removing? Because if it didn’t then it might just be impossible to cater for.

        Terry

        In some cases the setlists do not have times, others do. But unfortunately I have the daunting task of editing over 1000 of them and each one is unique. But honestly this is not the end of the world. What you have provided is a big help and saves a lot of time editing each line. In this case, it will still leave the line empty so I know I can simply put that back in. So as much as I would love to have that variable, it’s not a deal breaker for me and I really appreciate your time. It’s been a big help. Thank you.

        • Terry RT
          Terry R @Saltshaker2112
          last edited by

          @Saltshaker2112 said in Help with Trimming text-Remove before and after words:

          So as much as I would love to have that variable, it’s not a deal breaker for me and I really appreciate your time.

          I’ve been trying to figure out a way to exclude any lines which have no text on them (thus the song is a number). It was getting messy, however I think I have a way. It involves splitting one of my previous regexes. So now there would be 3 steps:

          1. Remove any leading number, ^\d+ *-*
          2. Remove any trailing number with spaces, brackets etc, (?!^)[- ]* [\(\[\)\]:\d ]+$
          3. Remove leading spaces using Edit, Blank operations, Trim Leading Spaces (this is a built-in menu option).

          However this will still require that there is always a preceding number at the start of a line, otherwise the “number” song title will still be removed.
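For anyone who wants to try the three steps outside Notepad++, here is a minimal Python sketch. It is not Terry's own code. It assumes the trailing character class is meant to list parentheses, square brackets, a colon, digits and spaces, and it processes one line at a time; the name `trim_line` is hypothetical.

```python
import re

# Step 1: leading track number, optional spaces, optional dashes.
LEADING = re.compile(r"^\d+ *-*")

# Step 2: trailing duration with spaces, brackets, colon and digits.
# (?!^) stops the match from starting at the very beginning of the line.
TRAILING = re.compile(r"(?!^)[- ]* [\(\[\)\]:\d ]+$")


def trim_line(line: str) -> str:
    """Apply the three steps to a single line (no line break included)."""
    line = LEADING.sub("", line)      # step 1
    line = TRAILING.sub("", line)     # step 2
    return line.lstrip(" \t")         # step 3: trim leading blanks


if __name__ == "__main__":
    for raw in ["04 - Xanadu  12:06", "03 - 2112  18:23", "2112  18:23"]:
        print(repr(raw), "->", repr(trim_line(raw)))
```

The last sample line has no leading track number, so step 1 consumes the numeric title itself, which is the edge case described above.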

          Anyway, it’s there for you to play with. Hopefully you have plenty of ideas on how you might handle that edge case. Often we find those edge cases are where it takes the most effort to figure out.

          Good luck
          Terry

          • Mark OlsonM
            Mark Olson
            last edited by Mark Olson

            Figured it out.

            Try this example text:

            1. Keine Lust- 4:03
            02 - Stairway to Heaven(2:33)
            03 I drink alone [3:40]
            4.3434 - 5:15
            5. 10,000 fists -100:52
            6. 11:11 - 4:53
            7-A twist in the myth-1:43
            

            Replace (?x-s)\d+\.? \h*(?:-\h*)? (.*?) \h* (?:-\h*\d+:\d\d | [\(\[]\d+:\d\d[\)\]]) with $1.

            Relevant documentation: https://npp-user-manual.org/docs/searching/

            Essentially four parts:

            1. Flags: (?x-s) (verbose; . does not match newline)
            2. Song number with optional . or - and some space: \d+\.? \h*(?:-\h*)?
            3. (.*?): song name
            4. \h* (?:-\h*\d+:\d\d | [\(\[]\d+:\d\d[\)\]]): optional whitespace, then a dash and a song duration or a song duration enclosed in brackets or parens.
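The four parts can also be checked outside Notepad++. The sketch below is a rough Python translation, not the regex as Notepad++ runs it: Python's `re` module has no `\h`, so `[ \t]` is used in its place, `re.VERBOSE` stands in for the `(?x)` flag, and the `-s` flag is Python's default. The name `song_name` is hypothetical.

```python
import re

SONG = re.compile(
    r"""
    \d+\.? [ \t]* (?:-[ \t]*)?   # song number with optional dot or dash and some space
    (.*?)                        # song name, matched lazily
    [ \t]*                       # optional whitespace
    (?: -[ \t]*\d+:\d\d          # a dash followed by a duration
      | [(\[]\d+:\d\d[)\]]       # or a duration enclosed in parens or brackets
    )
    """,
    re.VERBOSE,
)


def song_name(line: str) -> str:
    """Replace the whole match with capture group 1, like `$1` in Notepad++."""
    return SONG.sub(r"\1", line)


if __name__ == "__main__":
    for raw in ["1. Keine Lust- 4:03", "4.3434 - 5:15", "6. 11:11 - 4:53"]:
        print(song_name(raw))
```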
            • guy038G
              guy038
              last edited by guy038

              Hello, @saltshaker2112, @terry-r, @coises, @mark-olson and All,

              And… here is my version !

              First, I tried to find a fair and complete song list for testing and… guess what ? I found a list of Beatles songs on GitHub !! Refer to :

              https://github.com/inteligentni/Class-05-Feature-engineering/blob/master/The Beatles songs dataset%2C v1%2C no NAs.csv

              From that list, I simply extracted a list of 27 songs, below, keeping only the 3 columns Rank, Title and Duration :

              |  01  |  A Hard Day's Night                                         |  2:32  |
              |  02  |  12-Bar Original                                            |  2:54  |
              |  03  |  Baby, You're a Rich Man                                    |  3:03  |
              |  04  |  Back in the U.S.S.R.                                       |  2:43  |
              |  05  |  Being for the Benefit of Mr. Kite!                         |  2:37  |
              |  06  |  Christmas Time (Is Here Again)                             |  3:03  |
              |  07  |  Do You Want to Know a Secret?                              |  1:56  |
              |  08  |  Everybody's Got Something to Hide Except Me and My Monkey  |  2:24  |
              |  09  |  Hello, Goodbye                                             |  3:27  |
              |  10  |  Help!                                                      |  2:18  |
              |  11  |  Here, There and Everywhere                                 |  2:25  |
              |  12  |  I Want You (She's So Heavy)                                |  7:47  |
              |  13  |  I'll Follow the Sun                                        |  1:46  |
              |  14  |  I'm Happy Just to Dance with You                           |  1:58  |
              |  15  |  Long, Long, Long                                           |  3:04  |
              |  16  |  Mean Mr. Mustard                                           |  1:06  |
              |  17  |  Ob-La-Di, Ob-La-Da                                         |  3:07  |
              |  18  |  Oh! Darling                                                |  3:26  |
              |  19  |  One After 909                                              |  2:52  |
              |  20  |  P.S. I Love You                                            |  2:06  |
              |  21  |  Rain                                                       |  2:59  |
              |  22  |  Sgt. Pepper's Lonely Hearts Club Band                      |  1:59  |
              |  23  |  She's a Woman                                              |  3:03  |
              |  24  |  There's a Place                                            |  1:49  |
              |  25  |  When I'm Sixty-Four                                        |  2:37  |
              |  26  |  Why Don't We Do It in the Road?                            |  1:42  |
              |  27  |  You Can't Do That                                          |  2:37  |
              

              Then, I changed these lines in order to simulate a badly formatted list, which will be our INPUT text :

              01  |  a Hard Day's Night                     -  2:32  |
                02 12-bar Original |  [2:54]  |
              -  03  |  Baby, You're a Rich Man -  3:03       
                             
              					
              .. 04  |  Back in the U.s.s.R.        [2:43]    
              being for the Benefit of Mr. Kite!    |
              |  0.6  |  Christmas Time (is Here Again) -  3:03
                   07  |  Do You Want to Know a Secret?       [1:56]
              |  08  -  Everybody's Got Something to Hide except Me and my Monkey    /  2:24
              |  09  )  Hello, Goodbye     |  (3:27)    
              |  10 Help! |  (2:18)
              |  11 Here, There and Everywhere | - 2:25  |
              12 I Want You (She's So heavy) |  7:47  |
              
              
              13 I'll Follow the Sun   | - 1:46      
                 14  |  I'm Happy Just to Dance with You |  1:58  |
              15   Long, Long, Long  | - 3:04
              ...16  |  Mean Mr. Mustard [ 1:06]  |
              .. 17  |  Ob-La-Di, Ob-la-Da  [ 3:07]          
              | (18) |  Oh! Darling [ 3:26]
              (19) |  One After 909           ( 2:52) |
                 (20) P.s. I Love You ( 2:06)         
              #21 ---  Rain ................................ 2:59
              
              
              
                [22]  |  Sgt. Pepper's Lonely Hearts Club Band        ( 1:59)
              [23]  |  She's a Woman   |  {3:03}  |
              |  [24]  |  There's a Place |  {1:49}        
              |  25  |  When I'm Sixty-four       |  {2:37}
              
              -  26  -      Why Don't We Do It in the Road?  {1:42}  |
              you Can't Do That {2:37}
              

              With this first regex S/R below, we rewrite only the title of each song, one per line, ignoring the empty lines and the lines with blank chars only :

              • SEARCH (?x-i) ^ [0-9\s\W]+ \h+ | (?: \l \x20 \d+ )? \K \h+ [0-9\h\W]+ $

              • REPLACE Leave EMPTY

              Due to the \K syntax, you must use the Replace All button (do not use the Replace button).

              => 52 replacements are made and you should get this temporary text :

              a Hard Day's Night
              12-bar Original
              Baby, You're a Rich Man
              Back in the U.s.s.R.
              being for the Benefit of Mr. Kite!
              Christmas Time (is Here Again)
              Do You Want to Know a Secret?
              Everybody's Got Something to Hide except Me and my Monkey
              Hello, Goodbye
              Help!
              Here, There and Everywhere
              I Want You (She's So heavy)
              I'll Follow the Sun
              I'm Happy Just to Dance with You
              Long, Long, Long
              Mean Mr. Mustard
              Ob-La-Di, Ob-la-Da
              Oh! Darling
              One After 909
              P.s. I Love You
              Rain
              Sgt. Pepper's Lonely Hearts Club Band
              She's a Woman
              There's a Place
              When I'm Sixty-four
              Why Don't We Do It in the Road?
              you Can't Do That
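As a rough cross-check of this first search/replace, here is a per-line Python sketch. It is only an approximation: Python's `re` module has no `\K`, `\l` or `\h`, so the text before `\K` is captured in group 1 and written back, `[a-z]` replaces `\l`, and `[ \t]` replaces `\h`. The name `title_only` is hypothetical, and only a few of the sample lines were tried.

```python
import re

STRIP = re.compile(
    r"""
      ^ [0-9\s\W]+ [ \t]+            # leading rank and separators
    |
      ( (?: [a-z] \x20 \d+ )? )      # optional "letter, space, number" that must be kept
      [ \t]+ [0-9 \t\W]+ $           # trailing blanks, duration and separators
    """,
    re.VERBOSE,
)


def title_only(line: str) -> str:
    """Delete what the regex matches, keeping the part that precedes \\K."""
    return STRIP.sub(lambda m: m.group(1) or "", line)


if __name__ == "__main__":
    print(title_only("(19) |  One After 909           ( 2:52) |"))
```

The optional group is what protects a title ending in a number, such as `One After 909`: the match starts at the `r`, so the digits are captured and written back rather than deleted.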
              

              Now, with this second regex S/R, we rewrite any lowercase letter at the start of a line, or following a space, a dot, an opening parenthesis or a dash character, as its uppercase equivalent :

              • SEARCH (?x-i) (?: ^ | (?<= [\x20.(-] ) ) \l

              • REPLACE \u$0

              => 31 replacements are made and here is your expected OUTPUT text :

              A Hard Day's Night
              12-Bar Original
              Baby, You're A Rich Man
              Back In The U.S.S.R.
              Being For The Benefit Of Mr. Kite!
              Christmas Time (Is Here Again)
              Do You Want To Know A Secret?
              Everybody's Got Something To Hide Except Me And My Monkey
              Hello, Goodbye
              Help!
              Here, There And Everywhere
              I Want You (She's So Heavy)
              I'll Follow The Sun
              I'm Happy Just To Dance With You
              Long, Long, Long
              Mean Mr. Mustard
              Ob-La-Di, Ob-La-Da
              Oh! Darling
              One After 909
              P.S. I Love You
              Rain
              Sgt. Pepper's Lonely Hearts Club Band
              She's A Woman
              There's A Place
              When I'm Sixty-Four
              Why Don't We Do It In The Road?
              You Can't Do That
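The case-changing step translates fairly directly to Python. In this sketch, which handles one title at a time, `[a-z]` stands in for `\l` and a replacement function stands in for `\u$0`; the name `capitalise` is hypothetical.

```python
import re

# A lowercase letter at the start of the title, or right after
# a space, a dot, an opening parenthesis or a dash.
CAP = re.compile(r"(?:^|(?<=[ .(-]))[a-z]")


def capitalise(title: str) -> str:
    """Uppercase every letter matched by CAP, leaving the rest untouched."""
    return CAP.sub(lambda m: m.group(0).upper(), title)


if __name__ == "__main__":
    for raw in ["a Hard Day's Night", "Back in the U.s.s.R.", "12-bar Original"]:
        print(capitalise(raw))
```

Because the lookbehind does not include the apostrophe, the `s` in `Day's` stays lowercase.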
              

              Best Regards,

              guy038

              • Saltshaker2112S
                Saltshaker2112 @guy038
                last edited by

                @guy038 said in Help with Trimming text-Remove before and after words:

                (?x-i) ^ [0-9\s\W]+ \h+ | (?: \l \x20 \d+ )? \K \h+ [0-9\h\W]+ $

                Thanks!! This works pretty well too. I don’t think the other ones worked, but I still have the issue with “2112”.
                So here’s a real setlist:

                01) - Bastille Day  5:19
                02. - Lakeside Park  4:41
                [03] - Bytor And The Snowdog  5:43
                04 - Xanadu  12:06
                05 - A Farewell To Kings  6:35
                06 - Something For Nothing  4:13
                07 - Cygnus X-1  10:22
                01 - Anthem  4:15
                02 - Closer To The Heart  3:35
                03 - 2112  18:23
                04 - Working Man / Fly By Night / In The Mood / Drum Solo  15:16
                05 - Cinderella Man  5:14
                

                Which results in the following, with 2112 missing:
                Bastille Day
                Lakeside Park
                Bytor And The Snowdog
                Xanadu
                A Farewell To Kings
                Something For Nothing
                Cygnus X-1
                Anthem
                Closer To The Heart
                Working Man / Fly By Night / In The Mood / Drum Solo
                Cinderella Man

                I’m still trying some variables with no luck yet, but thank you to all so far. This is awesome work.

                • CoisesC
                  Coises @Saltshaker2112
                  last edited by Coises

                  @Saltshaker2112 Try this:
                  ^[^\w\r\n]*\d+[^\w\r\n]*([^\r\n]*\w[^\w\h\r\n]*)\h+[^\w\r\n]*\d+:\d+[^\w\r\n]*$
                  using this:
                  \1
                  as the replacement string.

                  For me, it’s easier to match a whole line and use a capture expression (the parenthesized part, which is substituted for the \1 in the replacement) rather than try to figure out how to avoid matching troublesome bits like the 2112.

                  EDIT: Above is still wrong; for example, given:
                  20 (Your Love Has Lifted Me) Higher and Higher (2:30)
                  it loses the opening parenthesis.
                  Make it:
                  ^[^\w\r\n]*\d+[^\w\r\n]*\h([^\r\n]*\w[^\w\h\r\n]*)\h+[^\w\r\n]*\d+:\d+[^\w\r\n]*$
                  with:
                  \1
                  as the replacement string.
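The "match the whole line, keep one capture" idea can be sketched in Python as follows. This is an approximate translation of the corrected expression, with `[ \t]` standing in for `\h` (which Python's `re` module lacks) and `re.MULTILINE` so that `^` and `$` work per line; the name `extract_titles` is hypothetical.

```python
import re

LINE = re.compile(
    r"^[^\w\r\n]*\d+[^\w\r\n]*[ \t]"       # track number and its punctuation
    r"([^\r\n]*\w[^\w \t\r\n]*)"           # the title: must end in a word char plus optional punctuation
    r"[ \t]+[^\w\r\n]*\d+:\d+[^\w\r\n]*$", # blanks, then the duration and its punctuation
    re.MULTILINE,
)


def extract_titles(text: str) -> str:
    """Replace each matching line with capture group 1, like `\\1` in Notepad++."""
    return LINE.sub(r"\1", text)


if __name__ == "__main__":
    sample = (
        "01) - Bastille Day  5:19\n"
        "03 - 2112  18:23\n"
        "20 (Your Love Has Lifted Me) Higher and Higher (2:30)\n"
    )
    print(extract_titles(sample), end="")
```

Since the whole line is consumed and only the capture is written back, a purely numeric title such as `2112` survives as long as a track number precedes it.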

                  • Saltshaker2112S
                    Saltshaker2112 @Coises
                    last edited by

                    @Coises

                    Wow, that looks like it did the trick!!! Thank you, and thanks to everyone here. I gotta say, all of you guys are awesome and I appreciate this very much. It saves me a lot of time! Thanks again.

                    • Mark OlsonM
                      Mark Olson
                      last edited by Mark Olson

                      OK, here’s my master regex that should deal with maximally pathological examples in all the formats you’ve shown me:
                      Replace (?-s)[\[\(]?\d+\.?[\)\]]?\h*(?:-\h*)?(\S.*?\S)\h*(?:-\h*)?[\[\(]?\d+:\d\d[\)\]]? with $1

                      Tested on your setlist, plus the maximally evil song title 11:11 by Rodrigo y Gabriela:

                      10 - 11:11 4:49
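A quick way to try this regex on individual lines is the Python sketch below. It is an approximation of the Notepad++ replacement: `[ \t]` replaces `\h`, `(?-s)` is omitted because it is Python's default, and the name `strip_setlist_line` is hypothetical.

```python
import re

MASTER = re.compile(
    r"[\[(]?\d+\.?[)\]]?[ \t]*(?:-[ \t]*)?"      # track number, optionally bracketed, dotted or dashed
    r"(\S.*?\S)"                                 # title: starts and ends with a non-space character
    r"[ \t]*(?:-[ \t]*)?[\[(]?\d+:\d\d[)\]]?"    # duration, optionally dashed or bracketed
)


def strip_setlist_line(line: str) -> str:
    """Replace the match with capture group 1, like `$1` in Notepad++."""
    return MASTER.sub(r"\1", line)


if __name__ == "__main__":
    for raw in ["10 - 11:11 4:49", "03 - 2112  18:23", "[03] - Bytor And The Snowdog  5:43"]:
        print(strip_setlist_line(raw))
```

Note that `(\S.*?\S)` needs at least two characters, so a one-character title would not be matched by this pattern.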
                      

                      And thank you, @Saltshaker2112 , for providing us with interesting regex challenges. I have progressed substantially as a regex-er by hanging out in this forum and working on puzzles like this.

                      • guy038G
                        guy038
                        last edited by guy038

                        Hi, @saltshaker2112, @terry-r, @coises, @mark-olson and All,

                        Ah… OK. But, if we have to be less restrictive on the text to keep, we must be more restrictive regarding the text to get rid of ! Thus :

                        • The part BEFORE the song’s title, which will be deleted, is :

                          • Any NON-word text followed with a number, followed by anything with a final dash AND, at least, ONE blank char

                          • Any number, up to three digits, possibly preceded with blank chars and followed with, at least, ONE blank char

                        • The part AFTER the song’s title, which will be deleted, is :

                          • At least ONE blank char, followed by any char among ([{<_-, followed by possible space chars, followed with a duration ( \d{1,2}(:)\d{2} ), followed with possible space chars, followed with any char among )]}>_- and finally followed with a combination of blank and new-line chars

                          • This part, which manages possible line-breaks, is then replaced by a single line-break ONLY


                        So, starting with the INPUT text, below :

                        
                        
                        01) - Bastille Day  5:19
                        
                        
                        02. - Lakeside Park [   4:41      ]       
                        [03] - Bytor And The Snowdog  5:43
                        04 - Xanadu  12:06
                        05 - A Farewell To Kings ( 6:35 )
                        Something For Nothing   4:13
                                        
                        
                        							
                        ((07 - Cygnus X-1  10:22
                        01 - Anthem  4:15
                        02- Closer To The Heart 999 -  3:35    -
                        [03  ] - 2112  18:23
                        
                        (  03) - (2112) This Is A Test     [2012 ]  18:23
                        03}} - [  2112  ] This Is An Other Test  2012 <18:23   >
                        04 Working Man / Fly By Night / In The Mood / Drum Solo  _15:16_
                        05 - Cinderella Man  5:14
                        

                        Here is my new version of the first regex S/R, which gets a clean list of the song’s titles :

                        • SEARCH (?x) ^ \h* (?: \W* \d+ \W* \h* - | \d{1,3} ) \h+ | \h+ [([{<_-]? \x20* \d{1,2} ( : ) \d{2} \x20* [)]}>_-]? ( \h* \R )+

                        • REPLACE ?1\r\n

                        And you get this OUTPUT text :

                        Bastille Day
                        Lakeside Park
                        Bytor And The Snowdog
                        Xanadu
                        A Farewell To Kings
                        Something For Nothing
                        Cygnus X-1
                        Anthem
                        Closer To The Heart 999
                        2112
                        (2112) This Is A Test     [2012 ]
                        [  2112  ] This Is An Other Test  2012
                        Working Man / Fly By Night / In The Mood / Drum Solo
                        Cinderella Man
                        

                        Hope that it’s the expected one !!
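The conditional replacement `?1\r\n` has no direct equivalent in Python, but a replacement function gives the same effect. The sketch below is an approximation with several assumptions: `[ \t]` replaces `\h`, an explicit alternation replaces `\R`, the closing class is written `[)\]}>_-]` with the `]` escaped, the output uses `\n` rather than `\r\n`, and the name `clean` is hypothetical. It was only tried on a few lines of the INPUT text, and the last line must end with a line break.

```python
import re

TRIM = re.compile(
    r"""
      ^ [ \t]* (?: \W* \d+ \W* [ \t]* - | \d{1,3} ) [ \t]+        # part BEFORE the title
    |
      [ \t]+ [(\[{<_-]? \x20* \d{1,2} (:) \d{2} \x20* [)\]}>_-]?  # part AFTER the title
      (?: [ \t]* (?:\r\n|\n|\r) )+                                # trailing blanks and line breaks
    """,
    re.VERBOSE | re.MULTILINE,
)


def clean(text: str) -> str:
    """Group 1 (the colon) is set only when the duration branch matched."""
    return TRIM.sub(lambda m: "\n" if m.group(1) else "", text)


if __name__ == "__main__":
    sample = (
        "\n"
        "01) - Bastille Day  5:19\n"
        "\n"
        "02. - Lakeside Park [   4:41      ]       \n"
        "03 - 2112  18:23\n"
        "04 Working Man  _15:16_\n"
    )
    print(clean(sample), end="")
```

When the first branch matches, group 1 is unset and the match is simply deleted; when the second branch matches, the duration and any following blank lines collapse into a single line break.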


                        Of course, the second regex, regarding case changes, is the same as in my previous post !

                        BR

                        guy038

                        P.S. Note that the simple lines, below :

                        123 789 15:47
                        00 15:47
                        

                        With a song’s title containing fewer than four digits ONLY, with or without a leading rank, would wrongly end up as :

                        15:47
                        03:19
                        

                        I chose the limit of three digits, in order that lines with a leading rank up to three digits, immediately followed by the title, as below, are correctly handled ! Indeed :

                        456 The most beautiful song of all the times (12:53)
                        

                        Would correctly result as :

                        The most beautiful song of all the times
                        
                        The Community of users of the Notepad++ text editor.