Regex: how to remove spaces that are not followed by letters or numbers



  • Hello. I am looking for a regex that removes spaces that are not followed by letters or numbers. For example:

    s earch
    will become
    search

    So, basically, I have several files with many stray spaces (usually a single space) inside words. I want to close up these spaces, but only those that are not followed by letters or numbers, so as not to join all the other words together.



  • @Vasile-Caraus

    I’m confused: your example shows what should be done, but it is exactly what your description says you don’t want.
    An e follows the space, and e is a letter. What am I missing?
    ???

    Cheers
    Claudia
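
    To make the mismatch concrete, here is a minimal Python sketch of the title taken literally. The pattern and the function name strip_per_title are assumptions for illustration, not something posted in the thread; the negative lookahead only matches a space that is not followed by an ASCII letter or digit.

```python
import re

# The thread title taken literally: a space NOT followed by a letter or digit.
# (Assumed pattern for illustration; ASCII letters and digits only.)
TITLE_PATTERN = re.compile(r" (?![A-Za-z0-9])")

def strip_per_title(text):
    """Remove every space that is not followed by a letter or digit."""
    return TITLE_PATTERN.sub("", text)

# The space in "s earch" is followed by the letter e, so nothing is removed.
print(strip_per_title("s earch"))
# A space before punctuation is removed.
print(strip_per_title("end ."))
```

    So the rule as described leaves the example untouched, which is why the title and the example disagree.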



  • Hello Claudia. Yes, I am sorry; I tried to delete or modify the title, but it was too late.

    Anyway, the example was good.

    s earch
    will become
    search

    See this little sentence to see my problem. I want to eliminate the space from the interior of words, without joining the other words.

    focu s on the opposition be tween the apparent simplicity of prod ucing a distinct creation and its ulterior comple xity



  • …and j ust ho w in the heck is the reg ex eng ine suppose d to know w hich space is a space betw een words and which spac e is a space ins ide a wor d?

    :-D



  • @Scott-Sumner

    hello Scott. I really don’t know…



  • Hello Vasile,

    Scott is perfectly right about it! How could the regex engine guess, for instance, that the two words be and tween are really the single word between? And anyway, the first word, be, is quite a valid English word, isn’t it?

    So I would advise you to use a spell-checker plugin instead, which will automatically highlight all the incorrect words in your text.

    Best Regards,

    guy038





  • where did the original text come from?

    how were the spaces introduced into the text?

    maybe there is a pattern that can be used to identify the “bad” spaces



  • hello, Js. This is what I am trying to figure out.

    Right now I am trying some combination like this one:

    Search: \x20?\x20
    Replace by: $1

    It is not very good yet, but I have time to check more things. I will need some luck :D

    But I need to make some kind of a connection with a dictionary, with a spell check.
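
    A rough sketch of what such a dictionary connection could look like, in Python rather than as a Notepad++ regex. Everything here is an assumption for illustration: the tiny WORDS set stands in for a real word list, and rejoin is a hypothetical helper. It joins two neighbouring fragments only when the joined form is a known word and at least one fragment is not a word on its own.

```python
# Hypothetical mini word list; a real run would load a full dictionary file.
WORDS = {
    "focus", "on", "the", "opposition", "be", "between", "apparent",
    "simplicity", "of", "producing", "a", "distinct", "creation",
    "and", "its", "ulterior", "complexity",
}

def rejoin(text, words=WORDS):
    """Join adjacent fragments when the joined form is a known word
    and at least one of the two fragments is not a known word."""
    tokens = text.split(" ")
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            left, right = tokens[i], tokens[i + 1]
            joined = left + right
            both_known = left.lower() in words and right.lower() in words
            if joined.lower() in words and not both_known:
                out.append(joined)
                i += 2  # consume both fragments
                continue
        out.append(tokens[i])
        i += 1
    return " ".join(out)

sample = ("focu s on the opposition be tween the apparent simplicity of "
          "prod ucing a distinct creation and its ulterior comple xity")
print(rejoin(sample))
```

    With this small list the sample sentence comes back repaired, but only because tween is not in it. With a full dictionary, pairs where both fragments are real words stay ambiguous and still need a manual check, and punctuation attached to a fragment is not handled here.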



  • you have not answered the question.

    where did the text come from? … was it downloaded from website?

    how were the spaces introduced?

    were there other characters in the text, like html tags that someone incorrectly deleted by replacing tags with spaces?



  • oh, sure. The text was made after using a pdf to txt converter.



  • ouch!!

    try opening the original pdf file using Notepad++
    you may get lucky, and the contents may be cleartext (not compressed)

    otherwise, try a different pdf to txt converter or try ocr software

    good luck

