    Add a space between a word and a hyphen stuck to its right side, as well as skip such instances in other parts

    • dr ramaanand

      Block of text for testing:-

      <html lang="en">
      <head>
      <meta http-equiv="Content- Type" content="text/html; charset=utf-8" />
      <meta http-equiv="X-UA-Compatible" content="IE=edge" />
      <META name="viewport" content="width=device-width, initial-scale=1" />
      <h1>BOTHROPS</h1>
      <p style="color: black; font-family: Verdana,sans-serif; font-size: 18px; font-style: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; display: inline ! important; float: none;">BOTHROPS LANCEOLATUS uses [Both-l uses]</p>
      Haemorrhages- dark
      Fear- of death
      E-mail us
      <h6>Remedies A- Z</h6>
      <ul>
      Some- list- here
      Dunking- donuts
      Seventytwo- houris
      </ul>
      <style type="text/css">
      @media (min- width: 1281px) {
      .left {
       width: 180px;
       border-width:1px;
       border-style:solid;
       border-color:lightblue;
       padding-top:10px;
      }
      .right {
       width: 560px;
       border- width:1px;
       border- style:solid;
       border- color:lightblue;
       margin- top:0px;
      }
      }
      </style>
      <script type="text/javascript">
      function googleTranslateElementInit() {
       new google.translate.TranslateElement({pageLanguage: 'en'}, 'google- translate- element');
      }
      </script>
      

      I tried the following in the Find field:

      (<html[\S\s]*?<\/h1>)(*SKIP)(*F)|(<p[^>]*>[\S\s]*?uses\]<\/p>)(*SKIP)(*F)|(<[\S\s]*?>)(*SKIP)(*F)|(E-mail)(*SKIP)(*F)|(<h6[^<>]*>.*?<\/h6>)(*SKIP)(*F)|(A-Z)(*SKIP)(*F)|(<ul.*?<\/ul>)(*SKIP)(*F)|(2009-2024)(*SKIP)(*F)|(<style[\S\s]*?<\/style>)(*SKIP)(*F)|(<script[\S\s]*?<\/script>)(*SKIP)(*F)|(\w+)-(\x20\w+)

      with $1 - $2 in the Replace field, to no avail.

      • dr ramaanand @dr ramaanand

        How can I add a space between a word and a hyphen stuck to its right side, while skipping such instances in other parts?
        The resulting output should be:

        Haemorrhages - dark
        Fear - of death
        
        • Coises @dr ramaanand

          @dr-ramaanand said in Add a space between a word and a hyphen stuck to its right side, as well as skip such instances in other parts:

          I tried (<html[\S\s]*?<\/h1>)(*SKIP)(*F)|(<p[^>]*>[\S\s]*?uses\]<\/p>)(*SKIP)(*F)|(<[\S\s]*?>)(*SKIP)(*F)|(E-mail)(*SKIP)(*F)|(<h6[^<>]*>.*?<\/h6>)(*SKIP)(*F)|(A-Z)(*SKIP)(*F)|(<ul.*?<\/ul>)(*SKIP)(*F)|(2009-2024)(*SKIP)(*F)|(<style[\S\s]*?<\/style>)(*SKIP)(*F)|(<script[\S\s]*?<\/script>)(*SKIP)(*F)|(\w+)-(\x20\w+) with $1 - $2 in the replace field to no avail

          Two obvious things:

          (<[\S\s]*?>)(*SKIP)(*F) in your exclusions always matches everything to the end of the document and then fails, so it excludes everything. Take that out.

          You have a lot of capturing groups, so $1 - $2 isn’t going to work: (\w+) and (\x20\w+) are the 11th and 12th groups in your expression, not the 1st and 2nd. Less troublesome would be to replace (\w+)-(\x20\w+) with (?<=\w)-(?=\s); then you can replace with \x20- and not worry about capture groups at all.
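The group-numbering point can be illustrated with a minimal sketch. It uses Python's re module rather than Notepad++'s Boost engine, and a shortened, hypothetical pattern with only two exclusion groups, so the numbers differ from the original expression; the principle (groups are numbered by opening parenthesis across all alternatives) is the same.

```python
import re

# Shortened, hypothetical stand-in for the original pattern: two "exclusion"
# groups followed by the two groups that were meant to be $1 and $2.
pattern = re.compile(r"(E-mail)|(A-Z)|(\w+)-(\x20\w+)")

m = pattern.search("Fear- of death")

# Groups are numbered left to right by opening parenthesis, across every
# alternative, so the word and the tail are groups 3 and 4 here, not 1 and 2.
print(pattern.groups)   # total number of capturing groups
print(m.group(1))       # None: the (E-mail) alternative took no part in the match
print(m.group(3))       # the word before the hyphen
print(m.group(4))       # the space and the word after the hyphen
```

With ten exclusion groups in front, as in the original expression, the same two groups become $11 and $12.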

          Also, some tests won’t work unless the ". matches newline" option is checked, or you add (?s) to the beginning.
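As a rough illustration of the newline point, here is a small Python sketch. Python's re is assumed as a stand-in for the Boost engine; its (?s) flag plays the same role as the ". matches newline" checkbox. The three-line sample is taken from the test data.

```python
import re

text = "<ul>\nSome- list- here\n</ul>\n"

# Without (?s), "." stops at a newline, so the multi-line list is never matched
# and the exclusion silently does nothing.
no_dotall = re.search(r"<ul.*?</ul>", text)

# With (?s) at the start of the pattern, "." also matches newlines.
with_dotall = re.search(r"(?s)<ul.*?</ul>", text)

print(no_dotall)
print(with_dotall.group(0))
```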

          This:

          Find what: (?s)(<html[\S\s]*?<\/h1>)(*SKIP)(*F)|(<p[^>]*>[\S\s]*?uses\]<\/p>)(*SKIP)(*F)|(<h6[^<>]*>.*?<\/h6>)(*SKIP)(*F)|(<ul.*?<\/ul>)(*SKIP)(*F)|(<style[\S\s]*?<\/style>)(*SKIP)(*F)|(<script[\S\s]*?<\/script>)(*SKIP)(*F)|(?<=\w)-(?=\s)

          Replace with: \x20-

          works on your test data.
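For anyone who wants to try the same idea outside Notepad++, here is a hedged Python sketch. Python's re has no (*SKIP)(*F), so this emulates it with a different technique: the excluded spans are matched first and written back unchanged by a replacement callback, and only the final alternative is actually altered. The function name, the group name `skip`, and the shortened sample are assumptions made for illustration.

```python
import re

# Exclusion alternatives first, the real target last. Excluded spans are
# captured in the named group "skip" and later written back untouched.
PATTERN = re.compile(
    r"(?P<skip>"
    r"<html.*?</h1>"
    r"|<p[^>]*>.*?uses\]</p>"
    r"|<h6[^<>]*>.*?</h6>"
    r"|<ul.*?</ul>"
    r"|<style.*?</style>"
    r"|<script.*?</script>"
    r")"
    r"|(?<=\w)-(?=\s)",   # a hyphen after a word character, before whitespace
    re.DOTALL,             # same effect as (?s) / ". matches newline"
)


def add_space_before_hyphen(text: str) -> str:
    """Insert a space before 'word-' hyphens outside the excluded spans."""
    def repl(m: re.Match) -> str:
        # An excluded span is returned as-is; a bare hyphen becomes " -".
        return m.group(0) if m.group("skip") is not None else " -"
    return PATTERN.sub(repl, text)


sample = """<html lang="en">
<meta http-equiv="Content- Type" content="text/html; charset=utf-8" />
<h1>BOTHROPS</h1>
<p style="display: inline ! important;">BOTHROPS LANCEOLATUS uses [Both-l uses]</p>
Haemorrhages- dark
Fear- of death
E-mail us
<h6>Remedies A- Z</h6>
<ul>
Some- list- here
</ul>
<style type="text/css">
.right {
 border- width:1px;
}
</style>
"""

result = add_space_before_hyphen(sample)
print(result)
```

Only the two free-standing lines should change; the hyphens inside the head, the list, the heading and the stylesheet stay as they were.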

          • dr ramaanand @Coises

            @Coises Thanks a lot. I also got two more regular-expression solutions from someone at www.regex101.com.
            One solution was to use this in the Find field:

            (?x)(<html[\S\s]*?<\/h1>)(*SKIP)(*F)|(<p[^>]*>[\S\s]*?uses\]<\/p>)(*SKIP)(*F)|(<[\S\s]*?>)(*SKIP)(*F)|(E-mail)(*SKIP)(*F)|(<h6[^<>]*>.*?<\/h6>)(*SKIP)(*F)|(A-Z)(*SKIP)(*F)|(<a\s[^>]*href.*?<\/a>)(*SKIP)(*F)|(2009-2024)(*SKIP)(*F)|(<style[\S\s]*?<\/style>)(*SKIP)(*F)|(<script[\S\s]*?<\/script>)(*SKIP)(*F)|(\w+)-\x20\b
            

            with $11 - $12 in the Replace field

            The other was to use this in the Find field:

            (?x)(<html[\S\s]*?<\/h1>)(*SKIP)(*F)|(<p[^>]*>[\S\s]*?uses\]<\/p>)(*SKIP)(*F)|(<[\S\s]*?>)(*SKIP)(*F)|(E-mail)(*SKIP)(*F)|(<h6[^<>]*>.*?<\/h6>)(*SKIP)(*F)|(A-Z)(*SKIP)(*F)|(<a\s[^>]*href.*?<\/a>)(*SKIP)(*F)|(2009-2024)(*SKIP)(*F)|(<style[\S\s]*?<\/style>)(*SKIP)(*F)|(<script[\S\s]*?<\/script>)(*SKIP)(*F)|\w+\K-\x20\b
            

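            with a hyphen surrounded by single spaces (space, hyphen, space) in the Replace field, since \K drops the word from the match and only the hyphen and the following space are replaced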

            • dr ramaanand @dr ramaanand

              @Coises I am posting those solutions here so that someone may find them useful later (since this webpage can be found online).

              • dr ramaanand @dr ramaanand

                Warning note: Wherever the regular expressions mentioned above did not find anything, everything was replaced with what was typed in the "Replace" field. I therefore restored everything from a backup, added "Czeslawski- Lewinski" in a part that was not skipped while searching, and made the replacements; I then removed the "Czeslawski- Lewinski". I chose those words (Polish-American names, actually) because they are unique.

                • guy038

                  Hello, @dr-ramaanand, @coises and All,

                  I tried to simplify @coises's search regex and ended up with this one:

                  (?s-i)(<(.+?)[> ].*?(?:/>|</\2>))(*SKIP)(*F)|(?-s).+\R

                  So, given your INPUT text :

                  <html lang="en">
                  <head>
                  <meta http-equiv="Content- Type" content="text/html; charset=utf-8" />
                  <meta http-equiv="X-UA-Compatible" content="IE=edge" />
                  <META name="viewport" content="width=device-width, initial-scale=1" />
                  <h1>BOTHROPS</h1>
                  <p style="color: black; font-family: Verdana,sans-serif; font-size: 18px; font-style: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; display: inline ! important; float: none;">BOTHROPS LANCEOLATUS uses [Both-l uses]</p>
                  Haemor- rhages- dark
                  Fear- of death
                  E-mail us
                  <h6>Remedies A- Z</h6>
                  <ul>
                  Some- list- here
                  Dunking- donuts
                  Seventytwo- houris
                  </ul>
                  <style type="text/css">
                  @media (min- width: 1281px) {
                  .left {
                  
                  
                   width: 180px;
                   border-width:1px;
                   border-style:solid;
                   border-color:lightblue;
                   padding-top:10px;
                  }
                  .right {
                   width: 560px;
                   border- width:1px;
                   border- style:solid;
                   border- color:lightblue;
                   margin- top:0px;
                  }
                  }
                  </style>
                  <script type="text/javascript">
                  function googleTranslateElementInit() {
                   new google.translate.TranslateElement({pageLanguage: 'en'}, 'google- translate- element');
                  }
                  </script>
                  

                  This regex matches just the three consecutive lines below:

                  Haemor- rhages- dark
                  Fear- of death
                  E-mail us
                  

                  Note that I deliberately added another string r-, followed by a space character, for testing!


                  Thus, the following regex search/replace:

                  SEARCH (?s-i)(<(.+?)[> ].*?(?:/>|</\2>))(*SKIP)(*F)|(?<=\w)-(?=\x20)

                  REPLACE \x20-

                  will replace, in these three lines ONLY, any string letter-, followed by a space char, with the string letter - and a space char.
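The generic tag-skipping version can also be tried in Python, with some caveats. This is only a sketch: Python's re lacks (*SKIP)(*F), so skipped elements are matched and written back by a callback instead; the -i part of (?s-i) is dropped because Python matching is case-sensitive by default; and the function name and shortened sample are hypothetical.

```python
import re

# Group 1 is a whole element: "<name", a ">" or space, then the shortest run
# up to either "/>" or the matching "</name>" (backreference \2).
# The last alternative is the real target: a hyphen between a word character
# and a space.
GENERIC = re.compile(
    r"(<(.+?)[> ].*?(?:/>|</\2>))|(?<=\w)-(?=\x20)",
    re.DOTALL,
)


def space_out_hyphens(text: str) -> str:
    """Add a space before 'word-' hyphens that sit outside any element."""
    def repl(m: re.Match) -> str:
        # Elements captured by group 1 are returned unchanged.
        return m.group(0) if m.group(1) is not None else " -"
    return GENERIC.sub(repl, text)


sample = (
    '<meta http-equiv="Content- Type" content="text/html; charset=utf-8" />\n'
    "<h1>BOTHROPS</h1>\n"
    "Haemor- rhages- dark\n"
    "Fear- of death\n"
    "E-mail us\n"
    "<h6>Remedies A- Z</h6>\n"
    "<ul>\nSome- list- here\n</ul>\n"
)

result = space_out_hyphens(sample)
print(result)
```

The backreference is what lets one alternative stand in for the separate per-tag exclusions: whatever tag name opens the element is also the name that closes it.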

                  Best Regards,

                  guy038


                  The Community of users of the Notepad++ text editor.