Regex: crossword maker / mix words
- 
Hello. I am wondering whether it is possible to mix words with regular expressions, to make a kind of crossword from the content of a text file.
Suppose I have the words A, B, C, D, etc.
I need to put them in a different order: D, A, C, B, etc.
-
Hello Vasile Caraus,
Quite easy !
Let’s suppose that you consider, for instance, a list of seven consecutive words. To capture each of them in a group, so that their natural order can be changed, we’ll use the following simple regex S/R:
SEARCH :
(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)
REPLACE :
\3 \1 \7 \5 \4 \2 \6
Notes :
- The syntax \h+ represents any non-empty run of horizontal blank characters: Space (\x20), Tabulation (\x09) or No-Break Space (\xA0)
- In the replacement, each \n form is separated from its neighbour by a space
- In the replacement part, if you have more than nine groups, you must replace the \n syntax with the $n syntax, which allows group numbers greater than 9
So, from the example text of seven words below:
A quick test about permutations of words
we now obtain the text:
test A words permutations about quick of
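For readers who want to try the same idea outside the editor, here is a minimal Python sketch of the S/R above. It is only an illustration under stated assumptions: Python's `re` module has no `\h`, so the class `[ \t\xa0]` stands in for it, and the function name `permute_words` is made up for this example.

```python
import re

# Python's re module has no \h, so [ \t\xa0] is used as a stand-in
# for the three horizontal blanks listed in the notes (an assumption).
H = r"[ \t\xa0]+"

# Seven capturing groups, one per word, joined by horizontal blanks.
SEARCH = H.join([r"(\w+)"] * 7)

# Same replacement as in the post: groups 3 1 7 5 4 2 6.
REPLACE = r"\3 \1 \7 \5 \4 \2 \6"


def permute_words(text: str) -> str:
    """Apply the seven-word S/R once to every match in the text."""
    return re.sub(SEARCH, REPLACE, text)


print(permute_words("A quick test about permutations of words"))
# prints: test A words permutations about quick of
```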
Remark :
Applying this kind of permutation repeatedly brings out a cycle: at the end of the cycle, the words are back in their initial configuration!
For instance, let’s suppose the example text above is in a new tab, with the above S/R. Then, each time you click on the Replace All button (or hit the ALT + A shortcut), you get a new sentence and, after ten clicks, the original sentence again: A quick test about permutations of words :-))
Here, below, is the length of the cycle for some replacement configurations:
- REPLACE = \3 \1 \7 \5 \4 \2 \6 => cycle = 10
- REPLACE = \7 \4 \3 \6 \2 \1 \5 => cycle = 6
- REPLACE = \4 \6 \1 \7 \5 \3 \2 => cycle = 6
- REPLACE = \3 \5 \2 \7 \4 \1 \6 => cycle = 7
- REPLACE = \2 \1 \5 \4 \3 \7 \6 => cycle = 2
- REPLACE = \2 \1 \4 \5 \6 \7 \3 => cycle = 10
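The length of such a cycle can be computed without clicking: split the permutation into its independent loops and take the least common multiple of their lengths. The Python sketch below is one way to do that; the function name `cycle_length` is hypothetical and the parsing assumes a replacement made only of `\n` or `$n` forms separated by spaces.

```python
from math import gcd


def cycle_length(replacement: str) -> int:
    r"""Number of Replace All clicks needed to restore the original order.

    `replacement` is a string such as r"\3 \1 \7 \5 \4 \2 \6".
    """
    # perm[i] is the 0-based index of the word that lands in slot i.
    perm = [int(tok.lstrip("\\$")) - 1 for tok in replacement.split()]
    seen = [False] * len(perm)
    result = 1
    for start in range(len(perm)):
        # Walk the loop that contains `start` and measure its length.
        length = 0
        i = start
        while not seen[i]:
            seen[i] = True
            i = perm[i]
            length += 1
        if length:
            # Least common multiple of all loop lengths found so far.
            result = result * length // gcd(result, length)
    return result


print(cycle_length(r"\3 \1 \7 \5 \4 \2 \6"))  # prints: 10
print(cycle_length(r"\2 \1 \5 \4 \3 \7 \6"))  # prints: 2
```

For the first configuration, the words in positions 1, 3, 7, 6, 2 form a loop of length 5 and positions 4, 5 form a loop of length 2, hence 10 clicks.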
Note the two specific configurations, of cycle = 7, below:
- \2 \3 \4 \5 \6 \7 \1
- \7 \1 \2 \3 \4 \5 \6
=> The words seem to move forwards or backwards, coming back in by the other side! Sorry, but as I’m French, I don’t know the right English words to describe this behaviour. Thanks in advance for giving me the right expression!
Best Regards,
guy038
 - 
 - 
Super answer. Thanks.
But what if I want to consider connected little words separated by a hyphen, so that the replacement takes these words as a whole, and not separately?
 - 
\w+(?:-\w+)+
This will select the words separated by hyphens, but I don’t know how to include it in your search regex above.
 - 
Hi, Vasile Caraus,
Sorry for my late reply, but I’m just back at work after a three-week summer holiday :-((
First of all, I would like to point out that you cannot use a regex with the + quantifier applied to a group for your specific case, although it does, indeed, match all the words of a sentence too!
For instance, my previous regex
(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)
cannot be shortened to
(\w+)(?:\h+(\w+))+
Why? Well, just because there would be only two groups:
- The first group, (\w+), which represents the first word of the 7-word sentence
- The second group, (\w+), which represents the last word of the 7-word sentence (the last item of the list (?:\h+(\w+))+ )
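The same behaviour can be observed in Python, whose `re` module also keeps only the last iteration of a repeated group. This is a small sketch, again with `[ \t\xa0]` standing in for `\h` (an assumption, as `re` does not support `\h`), and with a made-up helper name `captured_groups`.

```python
import re

# [ \t\xa0] stands in for \h, which Python's re module does not support.
SHORT = r"(\w+)(?:[ \t\xa0]+(\w+))+"


def captured_groups(text: str) -> tuple:
    """Return the groups captured by the shortened regex, or () if no match."""
    m = re.fullmatch(SHORT, text)
    return m.groups() if m else ()


print(captured_groups("A quick test about permutations of words"))
# prints: ('A', 'words')
```

The five middle words are matched but not remembered, so they cannot be referenced in the replacement.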
So, the solution consists in including all the separator characters, in addition to the previous syntax \h+
Then, let’s suppose that your words may be separated by a blank character, a hyphen or a colon. Given my previous example (a 7-word sentence), the original text could be:
A-quick test-about permutations:of-words
with a space character between the words quick and test, and a tabulation character between the words about and permutations
Now, how would you like the permuted words to be linked together?
- With a space
- With a hyphen
- With a colon
OR do you prefer that the replacement keeps the six different separators at their exact locations?
In the first case (sentence with a single separator between words), use the following S/R:
SEARCH :
(\w+)[\h+:-](\w+)[\h+:-](\w+)[\h+:-](\w+)[\h+:-](\w+)[\h+:-](\w+)[\h+:-](\w+)
and :
REPLACE :
\3 \1 \7 \5 \4 \2 \6
or
REPLACE :
\3-\1-\7-\5-\4-\2-\6
or
REPLACE :
\3:\1:\7:\5:\4:\2:\6
In the second case, we need to remember the different separators, using additional groups, which gives the following S/R:
SEARCH :
(\w+)([\h+:-])(\w+)([\h+:-])(\w+)([\h+:-])(\w+)([\h+:-])(\w+)([\h+:-])(\w+)([\h+:-])(\w+)
REPLACE :
$5$2$1$4$13$6$9$8$7$10$3$12$11
Notes :
- In the replacement, we are using the group form $n, as it accepts values of n > 9
- You certainly noticed that the numbers of the groups referring to words are always odd, and those referring to separators are always even!
- Therefore, if you want other permutations of this 7-word sequence, just change, as you like, the location of the $n forms with an odd number in the replacement regex, WITHOUT changing the $n forms with an even number!
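The separator-preserving S/R can be sketched in Python as follows. Several details are assumptions of this sketch, not part of the original S/R: `[ \t\xa0]` replaces `\h`, the `$n` forms are written `\g<n>` (Python's spelling for group references), and the function names are invented. Note that inside square brackets the `+` is a literal plus sign, so the class matches exactly one separator character.

```python
import re

# One separator character: a horizontal blank, "+", ":" or "-".
# [ \t\xa0] replaces \h (Python's re lacks \h); the "+" is kept
# because, inside brackets, it is just a literal plus sign.
SEP = r"([ \t\xa0+:-])"
SEARCH = SEP.join([r"(\w+)"] * 7)  # 13 groups: odd = words, even = separators

# Odd numbers (words) are permuted, even numbers (separators) stay in place.
ORDER = [5, 2, 1, 4, 13, 6, 9, 8, 7, 10, 3, 12, 11]
REPLACE = "".join(rf"\g<{n}>" for n in ORDER)


def permute_keep_separators(text: str) -> str:
    """Permute the seven words while leaving the six separators untouched."""
    return re.sub(SEARCH, REPLACE, text)


def clicks_until_restored(text: str) -> int:
    """Count how many applications bring the text back to its start."""
    current = permute_keep_separators(text)
    clicks = 1
    while current != text:
        current = permute_keep_separators(current)
        clicks += 1
    return clicks


original = "A-quick test-about\tpermutations:of-words"
print(clicks_until_restored(original))
# prints: 10
```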
So, from the above text, we would obtain, for the chosen permutation, with the last S/R, the consecutive sentences below, in a 10-cycle:
test-A words-permutations about:quick-of
words-test of-about permutations:A-quick
of-words quick-permutations about:test-A
quick-of A-about permutations:words-test
A-quick test-permutations about:of-words
test-A words-about permutations:quick-of
words-test of-permutations about:A-quick
of-words quick-about permutations:test-A
quick-of A-permutations about:words-test
A-quick test-about permutations:of-words
Cheers,
guy038
 - 
 - 
Nice, but there is still a little problem for my language. If I have little words like
"le-a"
the script will change the order into
"a-le"
but this is not a word anymore.
-
And another little thing. If I have a sentence like
"Și pentru valurile lor mândre cine poate seta limite?"
the regular expressions you made will change the order of the words before the last two words, "seta limite", but these two do not change.
-
Hi Vasile Caraus,
Oh! Sorry, I was completely wrong about your problem :-(( I should have read your posts more carefully!
Indeed, you said, for instance:
if I want to consider connected little words separated by a hyphen
and you also gave your regex
\w+(?:-\w+)+
So, on the contrary, I now see that you just wanted to add the hyphen-minus sign to the default list of word characters, which contains the range [0-9_A-Za-z] plus all the accented letters, in uppercase and lowercase!
OK, I have just figured out which kind of regex we need!
Of course, as my previous regex concerns only 7 groups, it’s quite logical that it left out the last two words of your 9-word sentence
"Și pentru valurile lor mândre cine poate seta limite?"
So there are still two problems:
- Firstly, would you like one specific permutation, or must the permutation follow some specific rules?
I mean: let’s consider four sentences, containing 3, 7, 9 and 11 words. As each word needs to belong to a group, previously created, and starting from my previous example of a 7-word sentence, with the particular permutation \3 \1 \7 \5 \4 \2 \6:
  - For a 3-word sentence, do you prefer the permutation \2 \3 \1 or \3 \1 \2, and so on…
  - For a 9-word sentence, do you prefer the permutation \5 \3 \6 \2 \7 \8 \1 \9 \4 or \7 \3 \9 \2 \5 \8 \6 \4 \1 or something else…
  - For an 11-word sentence, do you prefer the permutation $5 $3 $6 $10 $2 $7 $8 $1 $9 $11 $4 or $5 $4 $3 $2 $7 $8 $10 $6 $1 $9 $11 or … ?
Do you see the problem?
- Secondly, do you have an idea of the maximum number of words in the sentences of your text? We need that number to write all the groups in the regex!
 
Therefore, in the meantime, I give you, again, the updated regex S/R, for a sentence of EXACTLY 7 words:
SEARCH :
([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)
REPLACE :
\3 \1 \7 \5 \4 \2 \6
So, from the original text below, with a hyphen inside the word permu-tations:
A quick test about permu-tations of words
and the permutation 3 1 7 5 4 2 6, we obtain the right text below, with the word permu-tations unchanged:
test A words permu-tations about quick of
Notes :
- This long search regex is, simply, the concatenation of the regexes below :
 
([\w-]+)
[^\w\r\n-]+([\w-]+)
[^\w\r\n-]+([\w-]+)
[^\w\r\n-]+([\w-]+)
[^\w\r\n-]+([\w-]+)
[^\w\r\n-]+([\w-]+)
[^\w\r\n-]+([\w-]+)
- The ([\w-]+) syntax represents a single word, made up of default word characters and/or hyphens, surrounded by round brackets to form a group
- The separator [^\w\r\n-]+ represents any run of characters, between the words, that are neither [\w-] characters nor EOL characters
- In the replacement part, I suppose that the separator is the usual space character
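Since the long regex is just a concatenation of identical pieces, it can be generated for any number of words instead of being typed by hand. The Python sketch below shows one possible way; the helper `permute_sentence` and its `order` parameter are hypothetical names, and Python's `\g<n>` form is used for group references because it works for any group number. Python's `\w` may not match exactly the same accented letters as the editor's regex engine, so treat this as an approximation.

```python
import re

WORD = r"([\w-]+)"      # a word: word characters and/or hyphens
SEP = r"[^\w\r\n-]+"    # anything else, except line breaks


def permute_sentence(text: str, order: list) -> str:
    """Reorder runs of len(order) consecutive words, joined by single spaces.

    `order` lists 1-based group numbers, e.g. [3, 1, 7, 5, 4, 2, 6].
    """
    # First word, then (separator + word) for each remaining word.
    search = WORD + (SEP + WORD) * (len(order) - 1)
    replace = " ".join(rf"\g<{n}>" for n in order)
    return re.sub(search, replace, text)


print(permute_sentence("A quick test about permu-tations of words",
                       [3, 1, 7, 5, 4, 2, 6]))
# prints: test A words permu-tations about quick of
```

A sentence with a different number of words then only needs a different `order` list, for instance nine numbers for a 9-word sentence.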
 
See you later,
guy038
 - 
Works perfectly! What can I say? You are great! THANK YOU!!