Regex: how to remove spaces that are not followed by letters or numbers

12 Posts 6 Posters 5.8k Views
  • V
    Vasile Caraus
    last edited by Vasile Caraus Jan 23, 2017, 2:39 PM Jan 23, 2017, 2:38 PM

    Hello. I am looking for a regex that removes spaces that are not followed by letters or numbers. For example:

    s earch
    will become
    search

    So, basically, I have several files with many stray spaces (usually a single space) inside words. I want to close up these gaps, but only the spaces that are not followed by letters or numbers, so that the separate words are not joined together.

    • C
      Claudia Frank @Vasile Caraus
      last edited by Jan 23, 2017, 3:20 PM

      @Vasile-Caraus

      I’m confused: your example shows what should be done, but it is exactly what your description says you don’t want to do.
      An e follows the space, and e is a letter. What am I missing?
      ???

      Cheers
      Claudia
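
To make the mismatch concrete, here is a minimal Python sketch of what the title, read literally, would do. The pattern and the function name are assumptions for illustration and are not taken from any post in this thread. A negative lookahead deletes a space only when the next character is not an ASCII letter or digit, so the example `s earch` is left untouched.

```python
import re

# Literal reading of the thread title: delete a space only when the next
# character is NOT an ASCII letter or digit (negative lookahead).
# This pattern is an illustrative assumption, not the poster's own regex.
TITLE_PATTERN = re.compile(r" (?![A-Za-z0-9])")


def remove_spaces_per_title(text: str) -> str:
    """Remove every space that is not followed by a letter or digit."""
    return TITLE_PATTERN.sub("", text)


# The space in "s earch" is followed by the letter 'e', so nothing changes.
print(remove_spaces_per_title("s earch"))
# The space before the comma is removed; the space before 'd' is kept.
print(remove_spaces_per_title("search , done"))
```

So the rule in the title cannot produce `search` from `s earch`; the example asks for something different.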

      • V
        Vasile Caraus
        last edited by Vasile Caraus Jan 23, 2017, 3:44 PM Jan 23, 2017, 3:44 PM

        Hello Claudia. Yes, I am sorry; I tried to delete or modify the title, but it was too late.

        Anyway, the example was correct.

        s earch
        will become
        search

        This short sentence shows my problem. I want to remove the spaces inside words without joining the separate words together.

        focu s on the opposition be tween the apparent simplicity of prod ucing a distinct creation and its ulterior comple xity

        • S
          Scott Sumner
          last edited by Jan 23, 2017, 3:53 PM

          …and j ust ho w in the heck is the reg ex eng ine suppose d to know w hich space is a space betw een words and which spac e is a space ins ide a wor d?

          :-D

          • V
            Vasile Caraus @Scott Sumner
            last edited by Jan 23, 2017, 6:31 PM

            @Scott-Sumner

            hello Scott. I really don’t know…

            • G
              guy038
              last edited by Jan 23, 2017, 9:42 PM

              Hello Vasile,

              Scott is perfectly right about it! How could the regex engine guess, for instance, that the two words be and tween are really the single word between? And anyway, the first word, be, is a perfectly valid English word, isn’t it?

              So I would advise you to use a spell-checker plugin instead, which will automatically highlight all the incorrect words in your text.

              Best Regards,

              guy038
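
As a rough illustration of the spell-checker idea, the Python sketch below flags tokens that are missing from a word list. The word list `KNOWN_WORDS` and the function `flag_unknown_tokens` are hypothetical and tiny on purpose; a real check would load a full dictionary, and this is not how any particular Notepad++ plugin works.

```python
import re

# Hypothetical, deliberately tiny word list for illustration only.
KNOWN_WORDS = {"focus", "on", "the", "opposition", "between", "be"}


def flag_unknown_tokens(text, known=KNOWN_WORDS):
    """Return the alphabetic tokens of `text` that are not in `known`."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return [t for t in tokens if t.lower() not in known]


# "be" is a valid word, so only its broken-off partner "tween" is flagged.
print(flag_unknown_tokens("focu s on the opposition be tween"))
```

Note that `be` is not flagged, which is exactly the ambiguity described above: a dictionary lookup can point at suspicious fragments, but a person still has to decide what to join.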

              • A
                AdrianHHH
                last edited by Jan 24, 2017, 8:58 AM

                See also http://www.davidpbrown.co.uk/poetry/martha-snow.html

                • J
                  Js jS
                  last edited by Jan 27, 2017, 4:46 AM

                  where did the original text come from?

                  how were the spaces introduced into the text?

                  maybe there is a pattern that can be used to identify the “bad” spaces

                  • V
                    Vasile Caraus
                    last edited by Vasile Caraus Jan 27, 2017, 3:36 PM Jan 27, 2017, 3:34 PM

                    hello, Js. This is what I am trying to figure out.

                    Right now I am trying some combination like this one:

                    Search: \x20?\x20
                    Replace by: $1

                    It does not work very well, but I have time to try more things. I will need some luck :D

                    But I need some kind of connection to a dictionary or a spell checker.
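
A note on the attempt above: the search `\x20?\x20` contains no capture group, so `$1` has nothing to refer to, and the pattern matches every run of one or two spaces whether it sits inside a word or between words. The Python sketch below first shows that effect, then tries the dictionary idea: join two neighbouring tokens only when the joined form is a known word and at least one of the fragments is not. The names `KNOWN_WORDS` and `rejoin_split_words` are hypothetical, the word list is a stand-in for a real dictionary, and the heuristic ignores punctuation and words broken into three or more pieces.

```python
import re

# Hypothetical stand-in for a real dictionary (assumption for illustration).
KNOWN_WORDS = {
    "focus", "on", "the", "opposition", "between", "be", "apparent",
    "simplicity", "of", "producing", "a", "distinct", "creation", "and",
    "its", "ulterior", "complexity",
}


def rejoin_split_words(text, known=KNOWN_WORDS):
    """Join adjacent tokens when the result is a known word and at least
    one of the two fragments is not a known word by itself."""
    tokens = text.split(" ")
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            left, right = tokens[i], tokens[i + 1]
            joined = left + right
            both_valid = left.lower() in known and right.lower() in known
            if joined.lower() in known and not both_valid:
                out.append(joined)
                i += 2  # both fragments consumed
                continue
        out.append(tokens[i])
        i += 1
    return " ".join(out)


# Deleting every match of the posted search pattern joins all the words.
print(re.sub(r"\x20?\x20", "", "be tween the"))

sample = ("focu s on the opposition be tween the apparent simplicity of "
          "prod ucing a distinct creation and its ulterior comple xity")
print(rejoin_split_words(sample))
```

With this particular word list the sample sentence comes out repaired, but the result depends entirely on the dictionary: if both fragments happen to be valid words, the heuristic leaves them alone and the text still needs a manual check.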

                    • J
                      Js jS
                      last edited by Jan 28, 2017, 1:41 AM

                      you have not answered the question.

                      where did the text come from? … was it downloaded from website?

                      how were the spaces introduced?

                      were there other characters in the text, like html tags that someone incorrectly deleted by replacing tags with spaces?

                      • V
                        Vasile Caraus
                        last edited by Jan 28, 2017, 1:20 PM

                        Oh, sure. The text was produced by a PDF-to-TXT converter.

                        • J
                          Js jS
                          last edited by Js jS Jan 30, 2017, 8:22 PM Jan 30, 2017, 8:22 PM

                          ouch!!

                          try opening the original pdf file using Notepad++
                          you may get lucky, and the contents may be cleartext (not compressed)

                          otherwise, try a different pdf to txt converter or try ocr software

                          good luck

                          The Community of users of the Notepad++ text editor.