    Help with Trimming text-Remove before and after words

    19 Posts 5 Posters 2.0k Views
    • Coises @Saltshaker2112

      @Saltshaker2112 Try:
      \s[0-9\s\W]+$
      — same principle, but anchored at the end instead of the beginning. I added the extra \s at the beginning so that if you have titles like:
      We Always Knew (It Would End This Way) (5:15)
      Jump! (4:30)
      the expression won’t capture the non-word characters at the end of the title.
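As a rough illustration of why the leading \s matters, here is a minimal Python sketch. It assumes Python's `re` module treats this particular expression the same way Notepad++ does, and it processes one line at a time; the helper name `strip_trailing` is made up for the example.

```python
import re

# Coises's expression: one whitespace character, then a run of digits,
# whitespace and non-word characters reaching to the end of the line.
TRAILING = re.compile(r"\s[0-9\s\W]+$")

# The same expression without the leading \s, for comparison only.
TRAILING_NO_SPACE = re.compile(r"[0-9\s\W]+$")


def strip_trailing(line, pattern=TRAILING):
    """Remove whatever the pattern matches at the end of a single line."""
    return pattern.sub("", line)


for title in ("We Always Knew (It Would End This Way) (5:15)", "Jump! (4:30)"):
    print(repr(strip_trailing(title)), repr(strip_trailing(title, TRAILING_NO_SPACE)))
```

Without the leading \s, the match can start at the punctuation that closes the title itself (the `!` in `Jump!`), so that punctuation is removed along with the duration.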

      • Saltshaker2112 @Terry R

        @Terry-R said in Help with Trimming text-Remove before and after words:

        @Saltshaker2112 said in Help with Trimming text-Remove before and after words:

        Can anyone help me out with what can remove the rest so I end up with just the words/titles on each line and no spaces at the end?

        How about
        (\s*)?(-\s*)?[\(\[:\d\]\)]+(\s*)?(-\s*)?
        to remove all those extraneous numbers and spaces and - and braces of different sorts.

        The posting engine apparently interpreted your example as a list, so the leading 01 was turned into a list number.

        See the FAQ post here on how to correctly show examples so they aren’t modified in posting.

        Terry

        Thanks for the quick response. Yeah it seems like the first number was misinterpreted as a list. In any case, the expression basically puts everything into one line:
        Song Title Second Song Title Another Song Title
        But that is close.

        • Saltshaker2112 @Coises

          @Coises said in Help with Trimming text-Remove before and after words:

          @Saltshaker2112 Try:
          \s[0-9\s\W]+$
          — same principle, but anchored at the end instead of the beginning. I added the extra \s at the beginning so that if you have titles like:
          We Always Knew (It Would End This Way) (5:15)
          Jump! (4:30)
          the expression won’t capture the non-word characters at the end of the title.

          This is close… thanks! By itself it removes everything at the end except for one space. How do I group it with the first expression?

          • Terry R @Saltshaker2112

            @Saltshaker2112 said in Help with Trimming text-Remove before and after words:

            In any case, the expression basically puts everything into one line:
            Song Title Second Song Title Another Song Title
            But that is close.

            Sorry about that. I had intended to have an actual space character wherever \s appears; written that way, it leaves each title on its own line. I didn’t do a final test with \s, which also matches newline characters (amongst others).

            Terry
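The difference between \s and a literal space can be reproduced outside Notepad++. The sketch below uses Python's `re` module and a made-up two-line sample (the thread does not show the original input), so treat it as an approximation of the behaviour described above rather than a reproduction of it.

```python
import re

# Hypothetical sample; the real setlist lines are not shown at this point in the thread.
text = "01 - Song Title (3:45)\n02 - Second Song Title [4:10]"

# Terry R's first expression as posted, using \s (which also matches newlines).
with_backslash_s = re.compile(r"(\s*)?(-\s*)?[\(\[:\d\]\)]+(\s*)?(-\s*)?")

# The same expression with a literal space in place of every \s.
with_literal_space = re.compile(r"( *)?(- *)?[\(\[:\d\]\)]+( *)?(- *)?")

joined = with_backslash_s.sub("", text)      # line break is swallowed
separate = with_literal_space.sub("", text)  # line break survives

print(repr(joined))
print(repr(separate))
```

With \s, the group that follows the duration also consumes the line break, which is why the titles end up on a single line.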

            • Terry R @Saltshaker2112

              @Saltshaker2112

              I came up with a neater version. I realised that if the song title contained any numbers in the title it could possibly remove those as well. This latest version ties it to either the start of the line, or the end.

              Try (^)?[\(\[:\d\]\)\- ]+(?(1)|$)

              So it starts by looking for a start of line, then grabs numbers and possible spaces, braces and dashes. If it didn’t find a start of line, it grabs the numbers, spaces, braces and dashes and then expects to find an end of line. The (?(1)|$) is a special conditional subexpression. The first part states that if capture group 1 matched, nothing else is required (that’s the portion before the |, which is empty). If capture group 1 did not match, it forces the expression to find the end of line, since this is the portion after the | (the else branch).

              Please note there is a space character in the last position in the group.
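Python's `re` module happens to support the same conditional syntax, so the behaviour can be sketched there. This assumes the character class is meant to be `[\(\[:\d\]\)\- ]` and that each line is processed separately; the function name `clean` is only for illustration.

```python
import re

# (^)? optionally captures the start of the line as group 1.
# (?(1)|$) means: if group 1 took part in the match, require nothing more;
# otherwise require the end of the line.
EDGE_JUNK = re.compile(r"(^)?[\(\[:\d\]\)\- ]+(?(1)|$)")


def clean(line):
    """Strip number/brace/dash/space runs anchored at either edge of a line."""
    return EDGE_JUNK.sub("", line)


print(repr(clean("01 - Song Title (3:45)")))
print(repr(clean("03 - 2112 18:23")))
```

The second call shows the limitation of this approach: a line whose title consists only of digits is matched in full from the start of the line, so nothing is left.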

              • Saltshaker2112 @Terry R

                @Terry-R said in Help with Trimming text-Remove before and after words:

                @Saltshaker2112

                I came up with a neater version. I realised that if the song title contained any numbers in the title it could possibly remove those as well. This latest version ties it to either the start of the line, or the end.

                Try (^)?[\(\[:\d\]\)\- ]+(?(1)|$)

                So it starts by looking for a start of line, then grabs numbers and possible spaces, braces and dashes. If it didn’t find a start of line, it grabs the numbers, spaces, braces and dashes and then expects to find an end of line. The (?(1)|$) is a special conditional subexpression. The first part states that if capture group 1 matched, nothing else is required (that’s the portion before the |, which is empty). If capture group 1 did not match, it forces the expression to find the end of line, since this is the portion after the | (the else branch).

                Please note there is a space character in the last position in the group.

                Oh, that works pretty well. Wow, thank you! I do have one challenge that I did not take into account. Of course, I am working on setlists for Rush, so one song is 2112. Is there a way to put in a variable that treats “2112” as a word so that it keeps that in the line?

                • Terry R @Saltshaker2112

                  @Saltshaker2112 said in Help with Trimming text-Remove before and after words:

                  Is there a way to put in a variable that treats “2112” as a word so that it keeps that in the line?

                  That issue was in the back of my mind and I just hoped there wouldn’t be an instance of such a song title.

                  There should be a way, but it might take me a while to think it up. Currently all I can think of is that as soon as a number has been grabbed, no further numbers are allowed to be grabbed, but that wouldn’t work on the trailing number as the : gets in the middle of that. Creating a subexpression for all those ideas though may be tricky.

                  So EVERY line has a preceding number that needs removing? Because if it didn’t then it might just be impossible to cater for.

                  Terry

                  • Terry R @Saltshaker2112

                    @Saltshaker2112

                    Actually I just tested another regex and got it first go. Hopefully it will help.

                    Try ^\d+[- ]*|[- ]* [\(\[\)\]:\d ]+$

                    Terry

                    • Saltshaker2112 @Terry R

                      @Terry-R said in Help with Trimming text-Remove before and after words:

                      @Saltshaker2112 said in Help with Trimming text-Remove before and after words:

                      Is there a way to put in a variable that treats “2112” as a word so that it keeps that in the line?

                      That issue was in the back of my mind and I just hoped there wouldn’t be an instance of such a song title.

                      There should be a way, but it might take me a while to think it up. Currently all I can think of is that as soon as a number has been grabbed, no further numbers are allowed to be grabbed, but that wouldn’t work on the trailing number as the : gets in the middle of that. Creating a subexpression for all those ideas though may be tricky.

                      So EVERY line has a preceding number that needs removing? Because if it didn’t then it might just be impossible to cater for.

                      Terry

                      In some cases the setlists do not have times, others do. But unfortunately I have the daunting task of editing over 1000 of them and each one is unique. But honestly this is not the end of the world. What you have provided is a big help and saves a lot of time editing each line. In this case, it will still leave the line empty, so I know I can simply put that back in. So as much as I would love to have that variable, it’s not a deal breaker for me and I really appreciate your time. It’s been a big help. Thank you.

                      • Terry R @Saltshaker2112

                        @Saltshaker2112 said in Help with Trimming text-Remove before and after words:

                        So as much as I would love to have that variable, it’s not a deal breaker for me and I really appreciate your time.

                        I’ve been trying to figure out a way to exclude any lines which have no text on them (thus the song is a number). It was getting messy, however I think I have a way. It involves splitting one of my previous regexes. So now there would be 3 steps:

                        1. Remove any leading number, ^\d+ *-*
                        2. Remove any trailing number with spaces, braces etc, (?!^)[- ]* [\(\[\)\]:\d ]+$
                        3. Remove leading spaces using Edit, Blank operations, Trim Leading Spaces (this is a built-in menu option).

                        However this will still require that there is always a preceding number at the start of a line, otherwise the “number” song title will still be removed.
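The three steps can be sketched in Python as follows. This is only an approximation: it assumes the step 2 class is `[\(\[\)\]:\d ]`, it handles one line at a time, and it stands in `str.lstrip` for the Trim Leading Spaces menu command.

```python
import re

LEADING_NUMBER = re.compile(r"^\d+ *-*")                    # step 1
TRAILING_JUNK = re.compile(r"(?!^)[- ]* [\(\[\)\]:\d ]+$")  # step 2


def clean(line):
    line = LEADING_NUMBER.sub("", line)  # step 1: drop the leading number
    line = TRAILING_JUNK.sub("", line)   # step 2: drop the trailing duration
    return line.lstrip(" ")              # step 3: trim leading spaces


print(repr(clean("03 - 2112  18:23")))   # title made of digits, with a leading number
print(repr(clean("04 - Xanadu  12:06")))
print(repr(clean("2112  18:23")))        # no leading number: the title is lost
```

The (?!^) guard stops step 2 from matching at the very start of the line, which is what protects the space left in front of 2112 after step 1. The last call illustrates the caveat above: without a leading number, step 1 removes the numeric title itself.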

                        Anyway, it’s there for you to play with. Hopefully you have plenty of ideas on how you might handle that edge case. Often we find those edge cases are where it takes the most effort to figure out.

                        Good luck
                        Terry

                        • Mark Olson

                          Figured it out.

                          Try this example text:

                          1. Keine Lust- 4:03
                          02 - Stairway to Heaven(2:33)
                          03 I drink alone [3:40]
                          4.3434 - 5:15
                          5. 10,000 fists -100:52
                          6. 11:11 - 4:53
                          7-A twist in the myth-1:43
                          

                          Replace (?x-s)\d+\.? \h*(?:-\h*)? (.*?) \h* (?:-\h*\d+:\d\d | [\(\[]\d+:\d\d[\)\]]) with $1.

                          Relevant documentation: https://npp-user-manual.org/docs/searching/

                          Essentially four parts:

                          1. Flags: (?x-s) (verbose, . does not match newline)
                          2. Song number with optional . or - and some space: \d+\.? \h*(?:-\h*)?
                          3. (.*?): song name
                          4. \h* (?:-\h*\d+:\d\d | [\(\[]\d+:\d\d[\)\]]): optional whitespace, then a dash and a song duration or a song duration enclosed in brackets or parens.
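For readers who want to experiment outside Notepad++, the four parts map fairly directly onto Python's verbose mode. The sketch below is an adaptation, not the original expression: Python has no \h, so `[ \t]` is used instead, and the flags are passed as `re.VERBOSE` rather than inline (the dot already excludes newlines by default).

```python
import re

SONG_LINE = re.compile(
    r"""
    \d+\.? [ \t]* (?:-[ \t]*)?    # song number, optional dot or dash, spacing
    (.*?)                         # group 1: song name (lazy)
    [ \t]*                        # optional spacing
    (?: -[ \t]*\d+:\d\d           # a dash followed by a duration
      | [\(\[]\d+:\d\d[\)\]]      # or a duration in parentheses or brackets
    )
    """,
    re.VERBOSE,
)

sample = [
    "1. Keine Lust- 4:03",
    "02 - Stairway to Heaven(2:33)",
    "03 I drink alone [3:40]",
    "4.3434 - 5:15",
    "5. 10,000 fists -100:52",
    "6. 11:11 - 4:53",
    "7-A twist in the myth-1:43",
]

titles = [SONG_LINE.sub(r"\1", line) for line in sample]
print(titles)
```

Because the song name is captured lazily and the duration must follow it, titles that look like numbers or durations (3434, 11:11) are kept.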
                          • guy038

                            Hello, @saltshaker2112, @terry-r, @coises, @mark-olson and All,

                            And… here is my version !

                            First, I tried to find a fair and complete song list for testing and… guess what ? I found a list of Beatles songs on GitHub !! Refer to :

                            https://github.com/inteligentni/Class-05-Feature-engineering/blob/master/The Beatles songs dataset%2C v1%2C no NAs.csv

                            From that list, I simply extracted a list of 27 songs, below, keeping only the 3 columns Rank, Title and Duration :

                            |  01  |  A Hard Day's Night                                         |  2:32  |
                            |  02  |  12-Bar Original                                            |  2:54  |
                            |  03  |  Baby, You're a Rich Man                                    |  3:03  |
                            |  04  |  Back in the U.S.S.R.                                       |  2:43  |
                            |  05  |  Being for the Benefit of Mr. Kite!                         |  2:37  |
                            |  06  |  Christmas Time (Is Here Again)                             |  3:03  |
                            |  07  |  Do You Want to Know a Secret?                              |  1:56  |
                            |  08  |  Everybody's Got Something to Hide Except Me and My Monkey  |  2:24  |
                            |  09  |  Hello, Goodbye                                             |  3:27  |
                            |  10  |  Help!                                                      |  2:18  |
                            |  11  |  Here, There and Everywhere                                 |  2:25  |
                            |  12  |  I Want You (She's So Heavy)                                |  7:47  |
                            |  13  |  I'll Follow the Sun                                        |  1:46  |
                            |  14  |  I'm Happy Just to Dance with You                           |  1:58  |
                            |  15  |  Long, Long, Long                                           |  3:04  |
                            |  16  |  Mean Mr. Mustard                                           |  1:06  |
                            |  17  |  Ob-La-Di, Ob-La-Da                                         |  3:07  |
                            |  18  |  Oh! Darling                                                |  3:26  |
                            |  19  |  One After 909                                              |  2:52  |
                            |  20  |  P.S. I Love You                                            |  2:06  |
                            |  21  |  Rain                                                       |  2:59  |
                            |  22  |  Sgt. Pepper's Lonely Hearts Club Band                      |  1:59  |
                            |  23  |  She's a Woman                                              |  3:03  |
                            |  24  |  There's a Place                                            |  1:49  |
                            |  25  |  When I'm Sixty-Four                                        |  2:37  |
                            |  26  |  Why Don't We Do It in the Road?                            |  1:42  |
                            |  27  |  You Can't Do That                                          |  2:37  |
                            

                            Then, I changed these lines in order to simulate a badly formatted list, which will be our INPUT text :

                            01  |  a Hard Day's Night                     -  2:32  |
                              02 12-bar Original |  [2:54]  |
                            -  03  |  Baby, You're a Rich Man -  3:03       
                                           
                            					
                            .. 04  |  Back in the U.s.s.R.        [2:43]    
                            being for the Benefit of Mr. Kite!    |
                            |  0.6  |  Christmas Time (is Here Again) -  3:03
                                 07  |  Do You Want to Know a Secret?       [1:56]
                            |  08  -  Everybody's Got Something to Hide except Me and my Monkey    /  2:24
                            |  09  )  Hello, Goodbye     |  (3:27)    
                            |  10 Help! |  (2:18)
                            |  11 Here, There and Everywhere | - 2:25  |
                            12 I Want You (She's So heavy) |  7:47  |
                            
                            
                            13 I'll Follow the Sun   | - 1:46      
                               14  |  I'm Happy Just to Dance with You |  1:58  |
                            15   Long, Long, Long  | - 3:04
                            ...16  |  Mean Mr. Mustard [ 1:06]  |
                            .. 17  |  Ob-La-Di, Ob-la-Da  [ 3:07]          
                            | (18) |  Oh! Darling [ 3:26]
                            (19) |  One After 909           ( 2:52) |
                               (20) P.s. I Love You ( 2:06)         
                            #21 ---  Rain ................................ 2:59
                            
                            
                            
                              [22]  |  Sgt. Pepper's Lonely Hearts Club Band        ( 1:59)
                            [23]  |  She's a Woman   |  {3:03}  |
                            |  [24]  |  There's a Place |  {1:49}        
                            |  25  |  When I'm Sixty-four       |  {2:37}
                            
                            -  26  -      Why Don't We Do It in the Road?  {1:42}  |
                            you Can't Do That {2:37}
                            

                            With this first regex S/R below, we rewrite only the title of each song, one per line, ignoring the empty lines and the lines with blank chars only :

                            • SEARCH (?x-i) ^ [0-9\s\W]+ \h+ | (?: \l \x20 \d+ )? \K \h+ [0-9\h\W]+ $

                            • REPLACE Leave EMPTY

                            Due to the \K syntax, you must use the Replace All button (Do not use the Replace button )

                            => 52 occurrences are replaced and you should get this temporary text :

                            a Hard Day's Night
                            12-bar Original
                            Baby, You're a Rich Man
                            Back in the U.s.s.R.
                            being for the Benefit of Mr. Kite!
                            Christmas Time (is Here Again)
                            Do You Want to Know a Secret?
                            Everybody's Got Something to Hide except Me and my Monkey
                            Hello, Goodbye
                            Help!
                            Here, There and Everywhere
                            I Want You (She's So heavy)
                            I'll Follow the Sun
                            I'm Happy Just to Dance with You
                            Long, Long, Long
                            Mean Mr. Mustard
                            Ob-La-Di, Ob-la-Da
                            Oh! Darling
                            One After 909
                            P.s. I Love You
                            Rain
                            Sgt. Pepper's Lonely Hearts Club Band
                            She's a Woman
                            There's a Place
                            When I'm Sixty-four
                            Why Don't We Do It in the Road?
                            you Can't Do That
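The role of the optional prefix in front of \K can be illustrated in Python, with some caveats: Python's `re` has no \K, \h or \l, so this sketch captures the prefix and writes it back instead, uses `[ \t]` and `[a-z]` as stand-ins, and covers only the second (end-of-line) alternative of the search expression. The names are invented for the example.

```python
import re

# Naive tail: spacing, then digits/blanks/non-word characters to the end of line.
NAIVE_TAIL = re.compile(r"[ \t]+[0-9 \t\W]+$")

# Stand-in for "(?: \l \x20 \d+ )? \K ...": the optional "lowercase letter,
# space, number" prefix is captured so the replacement can put it back.
GUARDED_TAIL = re.compile(r"([a-z] \d+)?[ \t]+[0-9 \t\W]+$")


def strip_tail(line, guarded=True):
    if guarded:
        return GUARDED_TAIL.sub(lambda m: m.group(1) or "", line)
    return NAIVE_TAIL.sub("", line)


line = "One After 909           ( 2:52) |"
print(repr(strip_tail(line)))
print(repr(strip_tail(line, guarded=False)))
```

Without the prefix, the match would start at the space before 909 and the number would be deleted as part of the trailing junk.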
                            

                            Now, with this second regex S/R, we rewrite any lowercase letter, following a space, a dot, an opening parenthesis or a dash character, by its uppercase equivalent :

                            • SEARCH (?x-i) (?: ^ | (?<= [\x20.(-] ) ) \l

                            • REPLACE \u$0

                            => 31 occurrences are replaced and here is your expected OUTPUT text :

                            A Hard Day's Night
                            12-Bar Original
                            Baby, You're A Rich Man
                            Back In The U.S.S.R.
                            Being For The Benefit Of Mr. Kite!
                            Christmas Time (Is Here Again)
                            Do You Want To Know A Secret?
                            Everybody's Got Something To Hide Except Me And My Monkey
                            Hello, Goodbye
                            Help!
                            Here, There And Everywhere
                            I Want You (She's So Heavy)
                            I'll Follow The Sun
                            I'm Happy Just To Dance With You
                            Long, Long, Long
                            Mean Mr. Mustard
                            Ob-La-Di, Ob-La-Da
                            Oh! Darling
                            One After 909
                            P.S. I Love You
                            Rain
                            Sgt. Pepper's Lonely Hearts Club Band
                            She's A Woman
                            There's A Place
                            When I'm Sixty-Four
                            Why Don't We Do It In The Road?
                            You Can't Do That
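The case-changing step translates readily to Python, shown here as a small sketch. Python has no \l or \u, so it assumes ASCII titles, uses `[a-z]` for the lowercase letter and an explicit `upper()` call for the replacement, and works on one line at a time.

```python
import re

# A lowercase letter at the start of the line, or right after a space,
# a dot, an opening parenthesis or a dash.
WORD_START = re.compile(r"(?:^|(?<=[ .(\-]))[a-z]")


def capitalize_words(line):
    return WORD_START.sub(lambda m: m.group(0).upper(), line)


for title in ("being for the Benefit of Mr. Kite!",
              "Back in the U.s.s.R.",
              "12-bar Original",
              "I'll Follow the Sun"):
    print(capitalize_words(title))
```

Letters after an apostrophe are left alone because the apostrophe is not in the look-behind class, so `I'll` stays as it is.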
                            

                            Best Regards,

                            guy038

                            • Saltshaker2112 @guy038

                              @guy038 said in Help with Trimming text-Remove before and after words:

                              (?x-i) ^ [0-9\s\W]+ \h+ | (?: \l \x20 \d+ )? \K \h+ [0-9\h\W]+ $

                              Thanks!! This works pretty well too. I don’t think the other ones worked, but I still have the issue with “2112”.
                              So here’s a real setlist:

                              01) - Bastille Day  5:19
                              02. - Lakeside Park  4:41
                              [03] - Bytor And The Snowdog  5:43
                              04 - Xanadu  12:06
                              05 - A Farewell To Kings  6:35
                              06 - Something For Nothing  4:13
                              07 - Cygnus X-1  10:22
                              01 - Anthem  4:15
                              02 - Closer To The Heart  3:35
                              03 - 2112  18:23
                              04 - Working Man / Fly By Night / In The Mood / Drum Solo  15:16
                              05 - Cinderella Man  5:14
                              

                              Which results in this, with 2112 missing:
                              Bastille Day
                              Lakeside Park
                              Bytor And The Snowdog
                              Xanadu
                              A Farewell To Kings
                              Something For Nothing
                              Cygnus X-1
                              Anthem
                              Closer To The Heart
                              Working Man / Fly By Night / In The Mood / Drum Solo
                              Cinderella Man

                              I’m still trying some variables, with no luck yet, but thank you to all so far. This is awesome work.

                              • Coises @Saltshaker2112

                                @Saltshaker2112 Try this:
                                ^[^\w\r\n]*\d+[^\w\r\n]*([^\r\n]*\w[^\w\h\r\n]*)\h+[^\w\r\n]*\d+:\d+[^\w\r\n]*$
                                using this:
                                \1
                                as the replacement string.

                                For me, it’s easier to match a whole line and use a capture expression (the parenthesized part, which is substituted for the \1 in the replacement) rather than try to figure out how to avoid matching troublesome bits like the 2112.

                                EDIT: Above is still wrong; for example, given:
                                20 (Your Love Has Lifted Me) Higher and Higher (2:30)
                                it loses the opening parenthesis.
                                Make it:
                                ^[^\w\r\n]*\d+[^\w\r\n]*\h([^\r\n]*\w[^\w\h\r\n]*)\h+[^\w\r\n]*\d+:\d+[^\w\r\n]*$
                                with:
                                \1
                                as the replacement string.
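The whole-line approach is easy to try in Python as well. This sketch adapts the corrected expression by writing \h as `[ \t]` (Python has no \h) and applies it to one line at a time; a line that does not match is returned unchanged.

```python
import re

WHOLE_LINE = re.compile(
    r"^[^\w\r\n]*\d+[^\w\r\n]*[ \t]"       # rank and the punctuation around it
    r"([^\r\n]*\w[^\w \t\r\n]*)"           # group 1: the title
    r"[ \t]+[^\w\r\n]*\d+:\d+[^\w\r\n]*$"  # duration and the punctuation around it
)


def extract_title(line):
    return WHOLE_LINE.sub(r"\1", line)


for line in ("03 - 2112  18:23",
             "20 (Your Love Has Lifted Me) Higher and Higher (2:30)",
             "[03] - Bytor And The Snowdog  5:43"):
    print(extract_title(line))
```

Since the rank and the duration are both required parts of the match, a title such as 2112 can only end up in the captured group.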

                                • Saltshaker2112 @Coises

                                  @Coises

                                  Wow, that looks like it did the trick!!! Thank you, and thank you everyone here. I gotta say, all of you guys are awesome and I appreciate this very much. It saves me a lot of time! Thanks again.

                                  • Mark Olson

                                    OK, here’s my master regex that should deal with maximally pathological examples in all the formats you’ve shown me:
                                    Replace (?-s)[\[\(]?\d+\.?[\)\]]?\h*(?:-\h*)?(\S.*?\S)\h*(?:-\h*)?[\[\(]?\d+:\d\d[\)\]]? with $1

                                    Tested on your setlist, plus the maximally evil song title 11:11 by Rodrigo y Gabriela:

                                    10 - 11:11 4:49
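A quick way to check this expression against the awkward cases is a Python sketch like the one below. It is an adaptation: \h becomes `[ \t]`, the (?-s) flag is dropped because Python's dot excludes newlines by default, and lines are processed individually.

```python
import re

MASTER = re.compile(
    r"[\[\(]?\d+\.?[\)\]]?[ \t]*(?:-[ \t]*)?"    # rank such as 01), 02. or [03], then a dash
    r"(\S.*?\S)"                                 # group 1: the title
    r"[ \t]*(?:-[ \t]*)?[\[\(]?\d+:\d\d[\)\]]?"  # duration, optionally bracketed
)


def extract_title(line):
    return MASTER.sub(r"\1", line)


for line in ("10 - 11:11 4:49",
             "03 - 2112  18:23",
             "[03] - Bytor And The Snowdog  5:43",
             "01) - Bastille Day  5:19"):
    print(extract_title(line))
```

The lazy title group grows only until a valid duration can follow it, so in `10 - 11:11 4:49` the first `11:11` is kept as the title and `4:49` is treated as the duration.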
                                    

                                    And thank you, @Saltshaker2112 , for providing us with interesting regex challenges. I have progressed substantially as a regex-er by hanging out in this forum and working on puzzles like this.

                                    • guy038

                                      Hi, @saltshaker2112, @terry-r, @coises, @mark-olson and All,

                                      Ah… OK. But, if we have to be less restrictive on the text to keep, we must be more restrictive regarding the text to get rid of ! Thus :

                                      • The part BEFORE the song’s title, which will be deleted, is :

                                        • Any NON-word text followed with a number, followed by anything with a final dash AND, at least, ONE blank char

                                        • Any number, up to three digits, possibly preceded with blank chars and followed with, at least, ONE blank char

                                      • The part AFTER the song’s title, which will be deleted, is :

                                        • At least ONE blank char, followed by any char among ([{<_-, followed by possible space chars, followed with a duration ( \d{1,2}(:)\d{2} ), followed with possible space chars, followed with any char among )]}>_- and finally followed with a combination of blank and new-line chars

                                        • This part, which manages possible line-breaks, is then replaced by a single line-break ONLY


                                      So, starting with the INPUT text, below :

                                      
                                      
                                      01) - Bastille Day  5:19
                                      
                                      
                                      02. - Lakeside Park [   4:41      ]       
                                      [03] - Bytor And The Snowdog  5:43
                                      04 - Xanadu  12:06
                                      05 - A Farewell To Kings ( 6:35 )
                                      Something For Nothing   4:13
                                                      
                                      
                                      							
                                      ((07 - Cygnus X-1  10:22
                                      01 - Anthem  4:15
                                      02- Closer To The Heart 999 -  3:35    -
                                      [03  ] - 2112  18:23
                                      
                                      (  03) - (2112) This Is A Test     [2012 ]  18:23
                                      03}} - [  2112  ] This Is An Other Test  2012 <18:23   >
                                      04 Working Man / Fly By Night / In The Mood / Drum Solo  _15:16_
                                      05 - Cinderella Man  5:14
                                      

                                      Here is my new version of the first regex S/R, which gets a clean list of the song’s titles :

                                      • SEARCH (?x) ^ \h* (?: \W* \d+ \W* \h* - | \d{1,3} ) \h+ | \h+ [([{<_-]? \x20* \d{1,2} ( : ) \d{2} \x20* [)\]}>_-]? ( \h* \R )+

                                      • REPLACE ?1\r\n

                                      And you get this OUTPUT text :

                                      Bastille Day
                                      Lakeside Park
                                      Bytor And The Snowdog
                                      Xanadu
                                      A Farewell To Kings
                                      Something For Nothing
                                      Cygnus X-1
                                      Anthem
                                      Closer To The Heart 999
                                      2112
                                      (2112) This Is A Test     [2012 ]
                                      [  2112  ] This Is An Other Test  2012
                                      Working Man / Fly By Night / In The Mood / Drum Solo
                                      Cinderella Man
                                      

                                      Hope that it’s the expected one !!
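The conditional replacement ?1\r\n has no direct equivalent in Python, but a replacement function can play the same role. The sketch below is an approximation under several assumptions: \h is written as `[ \t]`, \R as an explicit alternation of line-break sequences, the closing class is taken to be `[)\]}>_-]`, and a plain `\n` is emitted instead of `\r\n`.

```python
import re

EDGES = re.compile(
    r"""
      ^ [ \t]* (?: \W* \d+ \W* [ \t]* - | \d{1,3} ) [ \t]+         # part BEFORE the title
    | [ \t]+ [(\[{<_-]? \x20* \d{1,2} (:) \d{2} \x20* [)\]}>_-]?   # part AFTER the title
      (?: [ \t]* (?:\r\n|\r|\n) )+                                 # trailing line-breaks
    """,
    re.VERBOSE | re.MULTILINE,
)


def clean(text):
    # Group 1 (the colon) only takes part in the AFTER alternative, so it
    # tells the two cases apart: write back one line-break, or nothing.
    return EDGES.sub(lambda m: "\n" if m.group(1) else "", text)


sample = (
    "01) - Bastille Day  5:19\n"
    "[03  ] - 2112  18:23\n"
    "05 - A Farewell To Kings ( 6:35 )\n"
)
print(clean(sample))
```

Note that, as in the original expression, the AFTER alternative needs a line-break behind the duration, so a final line without one keeps its duration.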


                                      Of course, the second regex, regarding case changes, is the same as in my previous post !

                                      BR

                                      guy038

                                      P.S. Note that the simple lines, below :

                                      123 789 15:47
                                      00 15:47
                                      

                                      With a song’s title consisting ONLY of a number of fewer than four digits, with or without a leading rank, would wrongly end up as :

                                      15:47
                                      03:19
                                      

                                      I chose the limit of three digits, in order that lines with a leading rank up to three digits, immediately followed by the title, as below, are correctly handled ! Indeed :

                                      456 The most beautiful song of all the times (12:53)
                                      

                                      Would correctly result as :

                                      The most beautiful song of all the times
                                      
                                      The Community of users of the Notepad++ text editor.