    Add a space between a word and a hyphen stuck to its right side, as well as skip such instances in other parts

      dr ramaanand

      Block of text for testing:-

      <html lang="en">
      <head>
      <meta http-equiv="Content- Type" content="text/html; charset=utf-8" />
      <meta http-equiv="X-UA-Compatible" content="IE=edge" />
      <META name="viewport" content="width=device-width, initial-scale=1" />
      <h1>BOTHROPS</h1>
      <p style="color: black; font-family: Verdana,sans-serif; font-size: 18px; font-style: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; display: inline ! important; float: none;">BOTHROPS LANCEOLATUS uses [Both-l uses]</p>
      Haemorrhages- dark
      Fear- of death
      E-mail us
      <h6>Remedies A- Z</h6>
      <ul>
      Some- list- here
      Dunking- donuts
      Seventytwo- houris
      </ul>
      <style type="text/css">
      @media (min- width: 1281px) {
      .left {
       width: 180px;
       border-width:1px;
       border-style:solid;
       border-color:lightblue;
       padding-top:10px;
      }
      .right {
       width: 560px;
       border- width:1px;
       border- style:solid;
       border- color:lightblue;
       margin- top:0px;
      }
      }
      </style>
      <script type="text/javascript">
      function googleTranslateElementInit() {
       new google.translate.TranslateElement({pageLanguage: 'en'}, 'google- translate- element');
      }
      </script>
      

      I tried (<html[\S\s]*?<\/h1>)(*SKIP)(*F)|(<p[^>]*>[\S\s]*?uses\]<\/p>)(*SKIP)(*F)|(<[\S\s]*?>)(*SKIP)(*F)|(E-mail)(*SKIP)(*F)|(<h6[^<>]*>.*?<\/h6>)(*SKIP)(*F)|(A-Z)(*SKIP)(*F)|(<ul.*?<\/ul>)(*SKIP)(*F)|(2009-2024)(*SKIP)(*F)|(<style[\S\s]*?<\/style>)(*SKIP)(*F)|(<script[\S\s]*?<\/script>)(*SKIP)(*F)|(\w+)-(\x20\w+) with $1 - $2 in the replace field to no avail

        dr ramaanand @dr ramaanand

        How can I add a space between a word and a hyphen stuck to its right side, while skipping such instances in other parts?
        The resulting output should be:

        Haemorrhages - dark
        Fear - of death
        
          Coises @dr ramaanand

          @dr-ramaanand said in Add a space between a word and a hyphen stuck to its right side, as well as skip such instances in other parts:

          I tried (<html[\S\s]*?<\/h1>)(*SKIP)(*F)|(<p[^>]*>[\S\s]*?uses\]<\/p>)(*SKIP)(*F)|(<[\S\s]*?>)(*SKIP)(*F)|(E-mail)(*SKIP)(*F)|(<h6[^<>]*>.*?<\/h6>)(*SKIP)(*F)|(A-Z)(*SKIP)(*F)|(<ul.*?<\/ul>)(*SKIP)(*F)|(2009-2024)(*SKIP)(*F)|(<style[\S\s]*?<\/style>)(*SKIP)(*F)|(<script[\S\s]*?<\/script>)(*SKIP)(*F)|(\w+)-(\x20\w+) with $1 - $2 in the replace field to no avail

          Two obvious things:

          (<[\S\s]*?>)(*SKIP)(*F) in your exclusions always matches everything to the end of the document and then fails, so it excludes everything. Take that out.

          You have a lot of capturing groups, so $1 - $2 isn’t going to work. Less troublesome would be to change the last alternative from (\w+)-(\x20\w+) to (?<=\w)-(?=\s); then you can replace with \x20- and not worry about capture groups at all.

          Also, some of the exclusions won’t work unless the “. matches newline” option is checked, or you add (?s) to the beginning.
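To see the effect of (?s) outside Notepad++, here is a minimal sketch in Python. Python's re module is an assumption of this example and is not used anywhere in the thread, but its . also stops at a newline unless (?s) is given. The variable names are illustrative only.

```python
import re

# A list block spread over several lines, as in the test data.
block = "<ul>\nSome- list- here\n</ul>"

# Without (?s), "." cannot cross the line breaks, so the exclusion never matches.
without_s = re.search(r"<ul.*?</ul>", block)

# With (?s), "." also matches newlines and the whole block is found.
with_s = re.search(r"(?s)<ul.*?</ul>", block)

print(without_s)                  # None
print(with_s.group(0) == block)   # True
```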

          This:

          Find what: (?s)(<html[\S\s]*?<\/h1>)(*SKIP)(*F)|(<p[^>]*>[\S\s]*?uses\]<\/p>)(*SKIP)(*F)|(<h6[^<>]*>.*?<\/h6>)(*SKIP)(*F)|(<ul.*?<\/ul>)(*SKIP)(*F)|(<style[\S\s]*?<\/style>)(*SKIP)(*F)|(<script[\S\s]*?<\/script>)(*SKIP)(*F)|(?<=\w)-(?=\s)

          Replace with: \x20-

          works on your test data.
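For anyone who wants to try the same idea in a script, below is a rough sketch in Python. It is an emulation, not the Notepad++ behaviour itself: Python's re module has no (*SKIP)(*F), so the excluded blocks are captured in group 1 and written back unchanged by a replacement function. The names PATTERN, space_before_hyphen and sample are made up for this example.

```python
import re

# Group 1 collects the blocks to leave alone (the same exclusions as in the
# Find what field); the last alternative is the hyphen we actually want.
PATTERN = re.compile(
    r"(?s)"
    r"(<html.*?</h1>"
    r"|<p[^>]*>.*?uses\]</p>"
    r"|<h6[^<>]*>.*?</h6>"
    r"|<ul.*?</ul>"
    r"|<style.*?</style>"
    r"|<script.*?</script>)"
    r"|(?<=\w)-(?=\s)"
)


def space_before_hyphen(text: str) -> str:
    """Insert a space before 'word-' hyphens, leaving excluded blocks as they are."""
    def fix(match: re.Match) -> str:
        # An excluded block matched: return it untouched (stands in for SKIP/FAIL).
        if match.group(1) is not None:
            return match.group(1)
        # Otherwise only the hyphen matched: put a space in front of it.
        return " -"
    return PATTERN.sub(fix, text)


sample = (
    "<h6>Remedies A- Z</h6>\n"
    "Haemorrhages- dark\n"
    "Fear- of death\n"
    "E-mail us\n"
    "<ul>\nSome- list- here\n</ul>\n"
)
print(space_before_hyphen(sample))
```

On this shortened sample only the Haemorrhages and Fear lines change; the h6 and ul blocks and E-mail stay as they are.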

            dr ramaanand @Coises

            @Coises Thanks a lot. I also got two more solutions, both regular expressions, from someone at www.regex101.com.
            One solution was to use this in the Find field:

            (?x)(<html[\S\s]*?<\/h1>)(*SKIP)(*F)|(<p[^>]*>[\S\s]*?uses\]<\/p>)(*SKIP)(*F)|(<[\S\s]*?>)(*SKIP)(*F)|(E-mail)(*SKIP)(*F)|(<h6[^<>]*>.*?<\/h6>)(*SKIP)(*F)|(A-Z)(*SKIP)(*F)|(<a\s[^>]*href.*?<\/a>)(*SKIP)(*F)|(2009-2024)(*SKIP)(*F)|(<style[\S\s]*?<\/style>)(*SKIP)(*F)|(<script[\S\s]*?<\/script>)(*SKIP)(*F)|(\w+)-\x20\b
            

            with $11 - $12 in the Replace field
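The group numbers in that replacement can be checked with a small sketch. This one uses Python's re module, which is an assumption of the example; because Python does not know the (*SKIP)(*F) verbs, they are stripped before compiling, which changes what the expression matches but not how its groups are numbered. What Notepad++ substitutes for a group number that does not exist is not checked here.

```python
import re

# The Find expression quoted above, split across lines for readability.
FIND = (
    r"(?x)(<html[\S\s]*?<\/h1>)(*SKIP)(*F)"
    r"|(<p[^>]*>[\S\s]*?uses\]<\/p>)(*SKIP)(*F)"
    r"|(<[\S\s]*?>)(*SKIP)(*F)"
    r"|(E-mail)(*SKIP)(*F)"
    r"|(<h6[^<>]*>.*?<\/h6>)(*SKIP)(*F)"
    r"|(A-Z)(*SKIP)(*F)"
    r"|(<a\s[^>]*href.*?<\/a>)(*SKIP)(*F)"
    r"|(2009-2024)(*SKIP)(*F)"
    r"|(<style[\S\s]*?<\/style>)(*SKIP)(*F)"
    r"|(<script[\S\s]*?<\/script>)(*SKIP)(*F)"
    r"|(\w+)-\x20\b"
)

# Drop the verbs so that Python can compile the rest; group numbering is unaffected.
portable = re.compile(FIND.replace("(*SKIP)(*F)", ""))

group_count = portable.groups
match = portable.search("Fear- of death")

print(group_count)       # 11
print(match.lastindex)   # 11
print(match.group(11))   # Fear
```

So the word before the hyphen is group 11, and this expression defines no group 12.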

            Another was to use this in the Find field:-

            (?x)(<html[\S\s]*?<\/h1>)(*SKIP)(*F)|(<p[^>]*>[\S\s]*?uses\]<\/p>)(*SKIP)(*F)|(<[\S\s]*?>)(*SKIP)(*F)|(E-mail)(*SKIP)(*F)|(<h6[^<>]*>.*?<\/h6>)(*SKIP)(*F)|(A-Z)(*SKIP)(*F)|(<a\s[^>]*href.*?<\/a>)(*SKIP)(*F)|(2009-2024)(*SKIP)(*F)|(<style[\S\s]*?<\/style>)(*SKIP)(*F)|(<script[\S\s]*?<\/script>)(*SKIP)(*F)|\w+\K-\x20\b
            

            with \x20-\x20 (a space, a hyphen, and a space) in the Replace field
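The \K in that expression keeps the word itself out of the match, so only the hyphen and the space after it get replaced. Python's re module (an assumption of this sketch) has no \K, but a lookbehind selects the same hyphens. The exclusions are left out here to keep the example short, and the replacement of space, hyphen, space is my reading of the intended output. The names are illustrative.

```python
import re

# (?<=\w) plays the role of \w+\K: a word character must precede the hyphen,
# but it is not part of the text that gets replaced.
HYPHEN_THEN_SPACE = re.compile(r"(?<=\w)-\x20\b")


def pad_hyphen(line: str) -> str:
    """Turn 'word- next' into 'word - next'; hyphens inside words are untouched."""
    return HYPHEN_THEN_SPACE.sub(" - ", line)


print(pad_hyphen("Haemorrhages- dark"))   # Haemorrhages - dark
print(pad_hyphen("E-mail us"))            # E-mail us
```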

              dr ramaanand @dr ramaanand

              @Coises I am posting those solutions here so that someone may find them useful later (since this webpage can be found online).

                dr ramaanand @dr ramaanand

                Warning note: Wherever the regular expressions mentioned above did not find anything, everything was replaced with what was typed in the “Replace” field. I therefore restored everything from a backup, added “Czeslawski- Lewinski” in a part that was not skipped while searching, and made the replacements; I then removed the “Czeslawski- Lewinski”. I chose those words (Polish-American names, actually) because they are unique.

                  guy038

                  Hello, @dr-ramaanand, @coises and All,

                  I tried to simplify @coises's search regex and ended up with this one:

                  (?s-i)(<(.+?)[> ].*?(?:/>|</\2>))(*SKIP)(*F)|(?-s).+\R

                  So, given your INPUT text :

                  <html lang="en">
                  <head>
                  <meta http-equiv="Content- Type" content="text/html; charset=utf-8" />
                  <meta http-equiv="X-UA-Compatible" content="IE=edge" />
                  <META name="viewport" content="width=device-width, initial-scale=1" />
                  <h1>BOTHROPS</h1>
                  <p style="color: black; font-family: Verdana,sans-serif; font-size: 18px; font-style: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; display: inline ! important; float: none;">BOTHROPS LANCEOLATUS uses [Both-l uses]</p>
                  Haemor- rhages- dark
                  Fear- of death
                  E-mail us
                  <h6>Remedies A- Z</h6>
                  <ul>
                  Some- list- here
                  Dunking- donuts
                  Seventytwo- houris
                  </ul>
                  <style type="text/css">
                  @media (min- width: 1281px) {
                  .left {
                  
                  
                   width: 180px;
                   border-width:1px;
                   border-style:solid;
                   border-color:lightblue;
                   padding-top:10px;
                  }
                  .right {
                   width: 560px;
                   border- width:1px;
                   border- style:solid;
                   border- color:lightblue;
                   margin- top:0px;
                  }
                  }
                  </style>
                  <script type="text/javascript">
                  function googleTranslateElementInit() {
                   new google.translate.TranslateElement({pageLanguage: 'en'}, 'google- translate- element');
                  }
                  </script>
                  

                  This regex matches only the three consecutive lines below:

                  Haemor- rhages- dark
                  Fear- of death
                  E-mail us
                  

                  Note that I deliberately added another string r-, followed by a space character, for testing!


                  Thus, the following regex S/R :

                  SEARCH (?s-i)(<(.+?)[> ].*?(?:/>|</\2>))(*SKIP)(*F)|(?<=\w)-(?=\x20)

                  REPLACE \x20-

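                  will replace, in these three lines ONLY, any hyphen that directly follows a word character and is followed by a space char with a space and a hyphen, so that the string letter- followed by a space becomes letter - followed by a space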
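As a rough cross-check outside Notepad++, the generic version can be sketched in Python as well. This is an emulation under stated assumptions: Python's re has no (*SKIP)(*F), so a matched element is captured in group 1 and written back unchanged; the (?s-i) prefix is reduced to (?s), since matching is case-sensitive by default; and the names and the shortened sample are made up for this example.

```python
import re

# Group 1: a whole element, from <name to the first "/>" or the matching </name>
# (group 2 holds the tag name for the backreference). Last alternative: the hyphen.
GENERIC = re.compile(r"(?s)(<(.+?)[> ].*?(?:/>|</\2>))|(?<=\w)-(?=\x20)")


def fix_outside_elements(text: str) -> str:
    """Add a space before 'word-' hyphens that are not inside a matched element."""
    def fix(match: re.Match) -> str:
        if match.group(1) is not None:
            return match.group(1)   # element: leave it exactly as it was
        return " -"                 # bare hyphen: insert a space before it
    return GENERIC.sub(fix, text)


sample = (
    '<meta http-equiv="Content- Type" content="text/html; charset=utf-8" />\n'
    "<h1>BOTHROPS</h1>\n"
    "Haemor- rhages- dark\n"
    "Fear- of death\n"
    "E-mail us\n"
    "<h6>Remedies A- Z</h6>\n"
    "<ul>\nSome- list- here\n</ul>\n"
)
print(fix_outside_elements(sample))
```

In this shortened sample only the Haemor- rhages- dark and Fear- of death lines are changed; everything inside the meta, h1, h6 and ul elements is returned as it was.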

                  Best Regards,

                  guy038

                  The Community of users of the Notepad++ text editor.