Regex: How to turn more words in the line as if they were seen in the mirror?

guy038

Hi, @robin-cruise, @alan-kilborn, @hellena-crainicu and all,

First, @robin-cruise, as suggested by @alan-kilborn, just execute each step, one after another and the replacement will occur ;-))

Now, let’s suppose that we want to consider expressions like middle-aged or I'm or John's as single words ! So, we must include the dash ( - ), the apostrophe ( ' ) and the right single quotation mark ( ’ ) as word characters

So each single character to find can be expressed with the character class [\w'’-]

Now, in order to look for the consecutive letters of a word, we’ll use the general syntax ([\w'’-])((?1))((?1))•••••((?1)) where (?1) is a subroutine call to group 1, i.e. [\w'’-], and the outer parentheses around each (?1) store the corresponding character as groups 2, 3, and so on …

So, here is, below, a regex S/R, which deals with expressions from 2 to 24 letters, only. It’s our best solution as the overall regex cannot exceed 2,048 characters !

SEARCH :

(?x)
(?|
([\w'’-])((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1)) |
([\w'’-])((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1)) |
([\w'’-])((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1)) |
([\w'’-])((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1)) |
([\w'’-])((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1)) |
([\w'’-])((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1)) |
([\w'’-])((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1)) |
([\w'’-])((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1)) |
([\w'’-])((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1)) |
([\w'’-])((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1)) |
([\w'’-])((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1)) |
([\w'’-])((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1)) |
([\w'’-])((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1)) |
([\w'’-])((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1)) |
([\w'’-])((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1)) |
([\w'’-])((?1))((?1))((?1))((?1))((?1))((?1))((?1))((?1)) |
([\w'’-])((?1))((?1))((?1))((?1))((?1))((?1))((?1)) |
([\w'’-])((?1))((?1))((?1))((?1))((?1))((?1)) |
([\w'’-])((?1))((?1))((?1))((?1))((?1)) |
([\w'’-])((?1))((?1))((?1))((?1)) |
([\w'’-])((?1))((?1))((?1)) |
([\w'’-])((?1))((?1)) |
([\w'’-])((?1))
)

REPLACE :

$24$23$22$21$20$19$18$17$16$15$14$13$12$11$10$9$8$7$6$5$4$3$2$1
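
The search and replacement strings above follow a regular pattern, so they can be generated rather than typed. Here is a minimal Python sketch, assuming the layout shown above (one alternative per line inside the branch-reset group). The helper name build_branch_reset_sr is made up for illustration. Python's own re module cannot run this pattern, as it supports neither (?|...) nor (?1), so the sketch only builds the text to paste into Notepad++ and compares its length with the 2,048-character limit mentioned above:

```python
WORD_CHAR = r"[\w'’-]"  # word characters, plus apostrophe, ’ and dash


def build_branch_reset_sr(max_len=24):
    """Return (search, replace) strings for words of 2..max_len characters."""
    rows = []
    for n in range(max_len, 1, -1):  # longest alternative first
        # group 1 is spelled out, each further character is a captured (?1) call
        rows.append("(" + WORD_CHAR + ")" + "((?1))" * (n - 1))
    search = "(?x)\n(?|\n" + " |\n".join(rows) + "\n)"
    replace = "".join(f"${i}" for i in range(max_len, 0, -1))
    return search, replace


search, replace = build_branch_reset_sr(24)
print(len(search))   # stays below 2048
print(replace)

longer, _ = build_branch_reset_sr(25)
print(len(longer))   # one more alternative goes over 2048
```

With this layout, adding a 25-character alternative pushes the search string past the limit, which is consistent with stopping at 24.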

So this INPUT text :

Mary's brother is a middle-aged person

will be changed as :

s'yraM rehtorb si a dega-elddim nosrep

With the regex S/R, provided in my previous post, we would have obtained, instead :

yraM's rehtorb si a elddim-dega nosrep
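
The difference between the two results comes only from which characters count as part of a word. Here is a rough Python illustration, assuming slice reversal ([::-1]) as a stand-in for the capture-group replacement, since Python's re does not support the (?1) subroutine calls used above. The function name mirror_words is hypothetical:

```python
import re


def mirror_words(text, word_char):
    """Reverse every run of 2 to 24 'word' characters."""
    pattern = re.compile(f"{word_char}{{2,24}}")
    return pattern.sub(lambda m: m.group(0)[::-1], text)


line = "Mary's brother is a middle-aged person"

# apostrophes and dashes are part of the word
extended = mirror_words(line, r"[\w'’-]")
print(extended)  # s'yraM rehtorb si a dega-elddim nosrep

# plain \w only: apostrophes and dashes split the words
plain = mirror_words(line, r"\w")
print(plain)     # yraM's rehtorb si a elddim-dega nosrep
```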

Cheers,

guy038

P.S. :

You’ll need to search for words containing 25 or more letters, by yourself, with the regex [\w'’-]{25,} and, then, change these words manually. Not a big task, anyway !

Terry R

@Robin-Cruise said in Regex: How to turn more words in the line as if they were seen in the mirror?:

Mirrors Actually Flip the Letters in a Word? Or, how to turn more words in the line as if they were seen in the mirror?
For example:
At the Olympics you run not only for yourself, but for all people.
After using a regex formula, the output should be:
Ta eht scipmylO uoy nur ton ylno rof flesruoy, tub rof lla elpoep.

Using the examples provided (by the OP and @guy038), the following regex would do the job. It is much shorter than the second regex provided by @guy038, although I should add that my regex will (currently) only work with words between 2 and 39 characters long.

The regex could be extended much further, as it is nowhere near the 2048 character limit. A calculation suggests a word of 79 characters could be accommodated, as ([\w'’-])?(?(19)([\w'’-])) is 26 characters long. Dividing 2048 by 26 gives 78, with 20 characters to spare. Those 20, together with the 9 characters saved because groups #1-#9 have single-digit numbers, allow for 1 additional word character. I am not going to write a regex that long, even if I can cheat by using a number of Notepad++ functions to create it.
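
This kind of estimate can be checked by adding up the pieces of the pattern. Here is a small Python sketch, assuming the same layout as the Find What expression in this post (\b, a required first group, optional groups, one conditional partner per optional group except the middle one, a required last group, \b). The function name find_pattern_length is hypothetical, and the 2048 figure is simply the limit quoted in this thread:

```python
WORD_CHAR = r"([\w'’-])"  # one captured word character, 9 characters long


def find_pattern_length(max_len):
    """Length of the conditional Find What pattern for an odd max_len >= 3."""
    half = (max_len - 1) // 2          # optional groups #2..#(half+1)
    optional = half * len(WORD_CHAR + "?")
    conditionals = sum(len(f"(?({g}){WORD_CHAR})") for g in range(2, half + 1))
    # \b + first group + optional groups + conditionals + last group + \b
    return 2 + len(WORD_CHAR) + optional + conditionals + len(WORD_CHAR) + 2


print(len(WORD_CHAR + "?" + f"(?(19){WORD_CHAR})"))  # 26, the pair quoted above
print(find_pattern_length(39))                        # the 39-character version

longest = max(n for n in range(3, 400, 2) if find_pattern_length(n) <= 2048)
print(longest)
```

Counted this way, each 26-character pair covers two characters of the word (one before the middle and one after it), so the length budget may stretch to longer words than 79 characters. Both figures are estimates and worth verifying in Notepad++ itself.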

So using the Replace With function we have:
Find What:\b([\w'’-])([\w'’-])?([\w'’-])?([\w'’-])?([\w'’-])?([\w'’-])?([\w'’-])?([\w'’-])?([\w'’-])?([\w'’-])?([\w'’-])?([\w'’-])?([\w'’-])?([\w'’-])?([\w'’-])?([\w'’-])?([\w'’-])?([\w'’-])?([\w'’-])?([\w'’-])?(?(19)([\w'’-]))(?(18)([\w'’-]))(?(17)([\w'’-]))(?(16)([\w'’-]))(?(15)([\w'’-]))(?(14)([\w'’-]))(?(13)([\w'’-]))(?(12)([\w'’-]))(?(11)([\w'’-]))(?(10)([\w'’-]))(?(9)([\w'’-]))(?(8)([\w'’-]))(?(7)([\w'’-]))(?(6)([\w'’-]))(?(5)([\w'’-]))(?(4)([\w'’-]))(?(3)([\w'’-]))(?(2)([\w'’-]))([\w'’-])\b
Replace With:${39}${38}${37}${36}${35}${34}${33}${32}${31}${30}${29}${28}${27}${26}${25}${24}${23}${22}${21}${20}${19}${18}${17}${16}${15}${14}${13}${12}${11}${10}${9}${8}${7}${6}${5}${4}${3}${2}${1}

What follows is how I reached the solution and need not be read unless you want an in-depth foundation on how it works.

What is occurring is that the regex has to capture a minimum of 2 word characters [\w'’-] (plus any additional characters); these are the first and last capture groups in the regex. There are 39 capture groups, with #2 & #38, #3 & #37, #4 & #36, etc. linked together as pairs. That is to say, if capture group #2 captures a character then capture group #38 MUST also capture a character. Since these work as pairs, and the first and last capture groups MUST each capture a character as well, I needed to include another capture group which operates alone, allowing for words with an odd number of characters. This is #20, the one immediately before (?(19)([\w'’-])).

Of course the replacement field just needs to write these capture groups back in reverse order. If a capture group never stored a character it doesn’t write anything back.
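
The pairing described above can also be tried outside Notepad++. Python's re module supports the same (?(N)...) conditionals, so here is a minimal sketch that builds the find and replacement strings for a given maximum length and applies them. The name build_mirror_sr is made up for illustration, Python spells group references \g<n> rather than ${n}, and Python's engine is not the Boost engine used by Notepad++, so treat this as an approximation:

```python
import re

WORD_CHAR = r"([\w'’-])"


def build_mirror_sr(max_len=39):
    """Build (find, repl) for words of 2..max_len characters (max_len odd)."""
    half = (max_len - 1) // 2
    find = r"\b" + WORD_CHAR                 # group 1, always required
    find += (WORD_CHAR + "?") * half         # groups 2..half+1, all optional
    for g in range(half, 1, -1):             # partners of groups half..2
        find += f"(?({g}){WORD_CHAR})"
    find += WORD_CHAR + r"\b"                # last group, always required
    # groups written back in reverse order; unset groups expand to nothing
    repl = "".join(rf"\g<{n}>" for n in range(max_len, 0, -1))
    return find, repl


find, repl = build_mirror_sr(39)
result = re.sub(find, repl, "Mary's brother is a middle-aged person")
print(result)  # s'yraM rehtorb si a dega-elddim nosrep
```

Because the groups that do capture always appear in ascending order along the word, writing them back in descending order reverses the word whichever optional groups happened to match.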

Terry

Robin Cruise

@Terry-R your Regex works fine, except that no more than 74 words (462 characters) can be changed. But the important thing is that it works fine.

@guy038 I have a little problem. I copied your regex and ran the search. The search works, just like with your first regex. Good.

But the replacement doesn’t work. It says that all occurrences were replaced in the entire file, but actually nothing is changed.

Terry R

@Robin-Cruise said in Regex: How to turn more words in the line as if they were seen in the mirror?:

your Regex works fine, except that no more than 74 words (462 characters) can be changed.

I don’t understand what you mean by 74 words. Currently any word of 39 characters or less will be “mirrored”. So if you have a document of many words, all shorter than 40 characters they will ALL reverse. If you have a problem, then possibly the next word (75th) is longer or has some other issue.

Maybe you need to provide an example around the area my regex fails, or even better a link to download the file you are testing with.

Terry

Robin Cruise

@Terry-R your regex is very good, except it cannot replace more than 74 words (approx.)

For example, try yourself to replace this text, you will see that you have to shorten it:

I hear the echo of my mind on the mist of the summer wind. And, most importantly, when you listen to something soothing, it is because your soul is there, in the sun, by the sea, by the sand, barely looking for that special something that will not consume you, a sparkle, a beautiful word, a contoured idea of a thought that crosses your mind and that you can only share with yourself. It all starts with a thought that appears in the field of your consciousness. The heat of the summer wind is still there.

Terry R

@Robin-Cruise said in Regex: How to turn more words in the line as if they were seen in the mirror?:

For example, try yourself to replace this text, you will see that you have to shorten it:

Nope, did not need to shorten it, changed all the 2 (or more) letter words. Here is the result:

So that was 90 changes made.

I recall from previous posts that forum members finding solutions for you have often had problems getting you to understand the need for accuracy in copying the expressions we supply, and also getting you to actually supply examples where issues occur. I can only reiterate those sentiments: you do need to be accurate and provide good examples when you find issues. The example you provided above was not a problem when I tested it.

Terry

guy038

Hi, @robin-cruise, @alan-kilborn, @hellena-crainicu, @terry-r and all,

Sorry, Terry, that I have not commented on your solution yet, but I was busy sorting out a lot of photos, and reducing their size by lowering their JPEG quality factor !

And, indeed, your solution, with conditional expressions, is really impressive ! Where I use brute force, you use a very subtle method ;-))

I did some additional tests with the license.txt file. I tested if the two \b assertions are necessary in my and your solutions ! And it happens that the only difference is about words ending with an apostrophe, as in this beginning of sentence : For both users' and authors' sake, ...

With the \b assertions the text becomes roF htob sresu' dna srohtua' ekas, ...
Without the \b assertions the text becomes roF htob 'sresu dna 'srohtua ekas, ...

Omitting the \b assertions seems more logical as we consider the apostrophe as a word char itself. So the ending ' must become a starting ', after replacement !
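
The effect of the two \b assertions on a trailing apostrophe can be reproduced with a short Python sketch. It assumes slice reversal as a stand-in for the $39...$1 replacement and a simple {2,39} quantifier in place of the 39 capture groups, and the helper name mirror is hypothetical:

```python
import re

line = "For both users' and authors' sake, ..."


def mirror(pattern, text):
    """Reverse each match of `pattern` in `text`."""
    return re.sub(pattern, lambda m: m.group(0)[::-1], text)


with_b = mirror(r"\b[\w'’-]{2,39}\b", line)
without_b = mirror(r"[\w'’-]{2,39}", line)

print(with_b)     # roF htob sresu' dna srohtua' ekas, ...
print(without_b)  # roF htob 'sresu dna 'srohtua ekas, ...
```

With \b, the match has to end between a \w character and a non-\w character, so the trailing apostrophe is left out of the match and stays where it was.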

So, your search regex can be expressed, in free-spacing mode, as :

(?x)
([\w'’-])          # Group 01  FIRST char of a 'word'
([\w'’-])?         # Group 02
([\w'’-])?         # Group 03
([\w'’-])?         # Group 04
([\w'’-])?         # Group 05
([\w'’-])?         # Group 06
([\w'’-])?         # Group 07
([\w'’-])?         # Group 08
([\w'’-])?         # Group 09
([\w'’-])?         # Group 10
([\w'’-])?         # Group 11
([\w'’-])?         # Group 12
([\w'’-])?         # Group 13
([\w'’-])?         # Group 14
([\w'’-])?         # Group 15
([\w'’-])?         # Group 16
([\w'’-])?         # Group 17
([\w'’-])?         # Group 18
([\w'’-])?         # Group 19
([\w'’-])?         # Group 20  CENTRAL char when 'word' with an ODD number of letters
(?(19) ([\w'’-]))  # Group 21
(?(18) ([\w'’-]))  # Group 22
(?(17) ([\w'’-]))  # Group 23
(?(16) ([\w'’-]))  # Group 24
(?(15) ([\w'’-]))  # Group 25
(?(14) ([\w'’-]))  # Group 26
(?(13) ([\w'’-]))  # Group 27
(?(12) ([\w'’-]))  # Group 28
(?(11) ([\w'’-]))  # Group 29
(?(10) ([\w'’-]))  # Group 30
(?(9)  ([\w'’-]))  # Group 31
(?(8)  ([\w'’-]))  # Group 32
(?(7)  ([\w'’-]))  # Group 33
(?(6)  ([\w'’-]))  # Group 34
(?(5)  ([\w'’-]))  # Group 35
(?(4)  ([\w'’-]))  # Group 36
(?(3)  ([\w'’-]))  # Group 37
(?(2)  ([\w'’-]))  # Group 38
       ([\w'’-])   # Group 39  LAST char of a 'word'

And the replacement part is the single line :

$39$38$37$36$35$34$33$32$31$30$29$28$27$26$25$24$23$22$21$20$19$18$17$16$15$14$13$12$11$10$9$8$7$6$5$4$3$2$1

And, we have the general template :

- For words containing 1 WORD char   =>  NOT processed
- For words containing 2 WORD chars  =>  The groups 1                         and 39 are DEFINED
- For words containing 3 WORD chars  =>  The groups 1,          20,           and 39 are DEFINED
- For words containing 4 WORD chars  =>  The groups 1, 2,                  38 and 39 are DEFINED
- For words containing 5 WORD chars  =>  The groups 1, 2,       20,        38 and 39 are DEFINED
- For words containing 6 WORD chars  =>  The groups 1, 2, 3,           37, 38 and 39 are DEFINED
- For words containing 7 WORD chars  =>  The groups 1, 2, 3,    20,    37, 38 and 39 are DEFINED
...
...
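
The template above can be checked with Python's re module, which also supports (?(N)...) conditionals. This is only a sketch: the pattern is the 39-group regex written compactly, without \b, with a trailing $ added so that the whole test word must match. The function name defined_groups is made up, and Python's engine is assumed to backtrack like the Boost engine here:

```python
import re

WORD_CHAR = r"([\w'’-])"
# group 1, optional groups 2..20, partners for groups 19..2, group 39
FIND = (WORD_CHAR + (WORD_CHAR + "?") * 19
        + "".join(f"(?({g}){WORD_CHAR})" for g in range(19, 1, -1))
        + WORD_CHAR)


def defined_groups(word):
    """List the numbers of the groups that captured a character in `word`."""
    m = re.match(FIND + "$", word)
    if m is None:                      # e.g. a single character
        return []
    return [n for n, g in enumerate(m.groups(), start=1) if g is not None]


for word in ["a", "ab", "abc", "abcd", "abcde", "abcdef", "abcdefg"]:
    print(len(word), defined_groups(word))
```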

Again, a very nice and clever solution !

Best Regards,

guy038

Terry R

@guy038 said in Regex: How to turn more words in the line as if they were seen in the mirror?:

Omitting the \b assertions seems more logical as we consider the apostrophe as a word char itself. So the ending ' must become a starting ', after replacement !

Thanks @guy038 for your background research. Whilst removing the \b works, it seemed to me to be illogical to do so, yet the facts seem to prove it. I would not have considered that option.

What I have been doing, though, is some additional research on my solution, to see if it’s possible to include other features. One I was successful with is to also work on single letter words. You may ask why; however, a single letter word (I, A) is often at the start of a sentence, or at least capitalised. I figured that the result might be to not only reverse the letters but also to make ALL letters lowercase.

So the latest version including your removal of \b and incorporating single letter words is (this regex works on a maximum of 25 letter words, more on that later).

Find What:([\w'’-])([\w'’-])?([\w'’-])?([\w'’-])?([\w'’-])?([\w'’-])?([\w'’-])?([\w'’-])?([\w'’-])?([\w'’-])?([\w'’-])?([\w'’-])?([\w'’-])?(?(12)([\w'’-]))(?(11)([\w'’-]))(?(10)([\w'’-]))(?(9)([\w'’-]))(?(8)([\w'’-]))(?(7)([\w'’-]))(?(6)([\w'’-]))(?(5)([\w'’-]))(?(4)([\w'’-]))(?(3)([\w'’-]))(?(2)([\w'’-]))([\w'’-])|(\w)
Replace With:\L${26}${25}${24}${23}${22}${21}${20}${19}${18}${17}${16}${15}${14}${13}${12}${11}${10}${9}${8}${7}${6}${5}${4}${3}${2}${1}
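
For comparison, the intended behaviour (reverse, lowercase, and include single letters) can be approximated in Python. This sketch assumes slice reversal instead of the 26 capture groups and .lower() instead of the lowercase switch in the replacement; mirror_lower is a hypothetical name:

```python
import re

# 2 to 25 'word' characters, or a single \w character (the |(\w) alternative)
PATTERN = re.compile(r"[\w'’-]{2,25}|\w")


def mirror_lower(text):
    """Reverse each word and convert it to lowercase."""
    return PATTERN.sub(lambda m: m.group(0)[::-1].lower(), text)


lowered = mirror_lower("I hear the echo of My Mind")
print(lowered)  # i raeh eht ohce fo ym dnim
```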

The reason for working to a maximum of 25 letters (similar to your 24 letter word max) was to look at the number of steps taken to process and indirectly time to process. I used the OP’s initial example and used the regex101.com website to show stats.

My regex took 20892 steps and a time of 6.1ms. Your regex took (for the same example) 5506 steps and a time of 0.9ms. So my regex takes almost 4 times as many steps to process a file as yours, but more importantly it takes over 6 times longer to process that file.

I’d mentioned some time ago about making regex efficient and whilst for normal use it doesn’t cause any issues not being totally efficient I wondered if in this case the process time might be significant enough for users who are otherwise unfamiliar with potential lag in processing time.

I ran my (revised) regex on the LICENSE.TXT file (for Notepad++) and it completed in about 1 second for just over 5800 words. So the conclusion seems to be that whilst my regex will take 6 times longer than yours to process, the “wait to complete” time will be minimal.

Cheers
Terry

Hellena Crainicu

@Terry-R very good solution.

guy038

Hello, @robin-cruise, @alan-kilborn, @hellena-crainicu, @terry-r and all,

@terry-R, yes, I suppose that a limit of 24/25 characters, for length of English words, seems sensible, anyway !

In this article :

https://en.wikipedia.org/wiki/Longest_word_in_English#Major_dictionaries

it is said :

Ross Eckler has noted that most of the longest English words are not likely to occur in general text, meaning non-technical present-day text seen by casual readers, in which the author did not specifically intend to use an unusually long word. According to Eckler, the longest words likely to be encountered, in general text, are deinstitutionalization and counterrevolutionaries, with 22 letters each.

Note also this curiosity, in your country :

https://en.wikipedia.org/wiki/Longest_word_in_English#Place_names

BR

guy038

Hellena Crainicu

I thought of a much simpler solution, maybe you can help me a little bit.

SEARCH: ([A-Za-z0-9\-\'])+

REPLACE BY: \1

The only inconvenience is that, after the S/R, I get only the last letter of each word. I don’t know where the other letters were lost…

Maybe some of you could update my regex a little bit, so as to find a simple solution.

Of course, I can obtain several letters if I repeat the group in the regex, as:

SEARCH: ([A-Za-z0-9\-\'])+([A-Za-z0-9\-\'])+([A-Za-z0-9\-\'])+([A-Za-z0-9\-\'])+

REPLACE BY: \4\3\2\1

Terry R

@Hellena-Crainicu said in Regex: How to turn more words in the line as if they were seen in the mirror?:

I thought of a much simpler solution, maybe you can help me a little bit.

Your idea is actually just a rehash of the ones provided by @guy038 and myself. You have a lot to learn yet about regex, but don’t worry, this is a good way to learn. By trying something, realising it didn’t work, you at least have opportunity to dissect it and understand yet more about regex.

I use the website rexegg.com as a good source of information. Just be aware that it refers to the many different flavours of regular expressions, not all examples will work in Notepad++. There are also other good sources of regex information available through the FAQ section of the forum.

In your regex the [A-Za-z0-9\-\'] is almost identical to the \w that we used. You also ask where the other letters went, since only the last letter was returned by \1 when your find expression had the + at the end. The reason is that the + was outside of the (), so the capture group only ever holds one character at a time. The expression captured a single character, then as the + was processed it captured a character AGAIN, and in doing so the capture group \1 was overwritten with the latest character. Had you put the + inside of the (), you would get all characters returned, but they would be in the original order.
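
The same overwriting can be seen in Python's re module, which treats a repeated capture group the same way in this case. A minimal sketch, with a made-up sample text:

```python
import re

text = "hello world"

# + outside the group: each repetition overwrites group 1, so only the
# last character of every word survives the replacement
last_only = re.sub(r"([A-Za-z0-9\-\'])+", r"\1", text)
print(last_only)  # o d

# + inside the group: group 1 holds the whole word, in its original order
whole = re.sub(r"([A-Za-z0-9\-\']+)", r"\1", text)
print(whole)      # hello world
```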

At the moment I think @guy038’s and my solutions are as good as it gets within the regex world. Whilst we have shown it is possible with a regular expression to “mirror” (reverse) a word, it does take quite a bit of coding to do so. Solutions in Python (and other languages) work far more efficiently, as they often have higher level functions to take care of most of the hard work. In regex everything has to be built from simple constructs.

Terry