Regex: how to remove spaces that are not followed by letters or numbers
-
Hello. I am looking for a regex that removes spaces that are not followed by letters or numbers. For example:
s earch
will become
search
So, basically, I have several files with many stray spaces (usually a single space) inside words. I want to close up these spaces, but only those that are not followed by letters or numbers, so that the other words are not joined together.
-
I’m confused: your example shows what should be done, but it is exactly what your description says you don’t want to do.
The space is followed by e, and e is a letter. What am I missing?
Cheers
Claudia
-
Hello Claudia, yes, I am sorry. I tried to delete or modify the title, but it was too late.
Anyway, the example was good.
s earch
will become
search
Here is a short sentence that shows my problem. I want to remove the spaces inside words, without joining the separate words together.
focu s on the opposition be tween the apparent simplicity of prod ucing a distinct creation and its ulterior comple xity
-
…and j ust ho w in the heck is the reg ex eng ine suppose d to know w hich space is a space betw een words and which spac e is a space ins ide a wor d?
:-D
-
hello Scott. I really don’t know…
-
Hello Vasile,
Scott is perfectly right about it! How could the regex engine guess, for instance, that the two words be and tween are really the single word between? And anyway, the first word be is quite a valid English word, isn’t it?
So I would advise you to use a spell-checker plugin instead, which will automatically highlight all the misspelled words in your text.
Best Regards,
guy038
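
To make the ambiguity concrete, here is a minimal Python sketch. It assumes a tiny, hypothetical word list (WORDS) standing in for a real dictionary, and a hypothetical helper (classify_pair); neither comes from any plugin. It only illustrates why a lookup can safely join some fragments while others stay undecidable.

```python
# Hypothetical word list; a real spell-checker dictionary would be far larger.
WORDS = {"be", "between", "focus", "on", "the"}

def classify_pair(left, right, words=WORDS):
    """Decide what to do with the space between two neighbouring fragments."""
    joined = (left + right).lower()
    if joined not in words:
        return "keep"        # joining does not produce a known word
    if left.lower() in words or right.lower() in words:
        return "ambiguous"   # a fragment is already a word on its own
    return "join"            # only the joined form is a known word

print(classify_pair("focu", "s"))    # join
print(classify_pair("on", "the"))    # keep
print(classify_pair("be", "tween"))  # ambiguous
```

The "ambiguous" case is exactly the be / tween situation: the lookup alone cannot tell whether the space belongs there.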
-
where did the original text come from?
how were the spaces introduced into the text?
maybe there is a pattern that can be used to identify the “bad” spaces
-
hello, Js. This is what I am trying to figure out.
Right now I am trying some combination like this one:
Search:
\x20?\x20
Replace by: $1
It is not very good yet, but I have time to try more things. I will need some luck :D
But I need to connect it somehow with a dictionary or a spell checker.
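
A rough sketch of that dictionary idea, in Python rather than in the editor. Everything here is an assumption for illustration: the function names are made up, the word list is a tiny stand-in for a real dictionary, and the replacement for the \x20?\x20 search is taken to be empty (the pattern has no capture group for $1 to refer to). Punctuation is ignored.

```python
import re

def remove_all_spaces(text):
    # The search pattern from above; assumption: "$1" contributes nothing,
    # so every matched space is simply deleted.
    return re.sub(r"\x20?\x20", "", text)

def rejoin_fragments(text, words):
    """Join two neighbouring fragments only when the joined form is a known
    word and neither fragment is a known word by itself."""
    tokens = text.split(" ")
    out = []
    i = 0
    while i < len(tokens):
        cur = tokens[i]
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            joined = cur + nxt
            fragment_is_word = cur.lower() in words or nxt.lower() in words
            if joined.lower() in words and not fragment_is_word:
                out.append(joined)
                i += 2          # both fragments consumed
                continue
        out.append(cur)
        i += 1
    return " ".join(out)

# Hypothetical word list for the sample sentence.
words = {"focus", "on", "the", "opposition", "be", "between",
         "apparent", "simplicity", "of", "producing"}
sample = "focu s on the opposition be tween the apparent simplicity of prod ucing"

print(remove_all_spaces("focu s on the"))  # focusonthe
print(rejoin_fragments(sample, words))
# focus on the opposition be tween the apparent simplicity of producing
```

The regex alone glues everything together, while the dictionary pass repairs "focu s" and "prod ucing" but deliberately leaves "be tween" alone, because "be" is a valid word on its own. Those leftovers would still need a manual check.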
-
you have not answered the question.
where did the text come from? … was it downloaded from website?
how were the spaces introduced?
were there other characters in the text, like html tags that someone incorrectly deleted by replacing tags with spaces?
-
Oh, sure. The text was produced by a PDF-to-TXT converter.
-
ouch!!
try opening the original pdf file using Notepad++
you may get lucky, and the contents may be cleartext (not compressed)
otherwise, try a different pdf to txt converter or try ocr software
good luck
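
A quick way to check this without scrolling through the file by eye, as a hedged Python sketch. The function names are made up for illustration, and the test is only a heuristic: it looks for a few standard PDF stream filter names in the raw bytes. A file with none of them may still store its text in a form that is awkward to read.

```python
# Standard PDF stream filter names that compress or re-encode stream data.
STREAM_FILTERS = (b"/FlateDecode", b"/LZWDecode",
                  b"/ASCII85Decode", b"/ASCIIHexDecode")

def filters_found(pdf_bytes):
    """Return the filter names from STREAM_FILTERS that occur in the raw bytes."""
    return [name.decode("ascii") for name in STREAM_FILTERS if name in pdf_bytes]

def looks_like_cleartext(pdf_bytes):
    """Heuristic: starts with a PDF header and mentions none of the filters."""
    return pdf_bytes.startswith(b"%PDF-") and not filters_found(pdf_bytes)

# Two tiny synthetic samples (not complete PDF files).
plain = b"%PDF-1.4\n1 0 obj\n<< /Length 20 >>\nstream\nBT (search) Tj ET\nendstream\n"
packed = b"%PDF-1.4\n1 0 obj\n<< /Length 20 /Filter /FlateDecode >>\nstream\n...\nendstream\n"

print(looks_like_cleartext(plain))   # True
print(filters_found(packed))         # ['/FlateDecode']
```

For a real file, read it in binary mode, for example open(path, "rb").read() with your own path, and pass the bytes to looks_like_cleartext.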