Regex: crossword maker / mix words
-
Hello. I am wondering whether it is possible to mix words, to make a kind of crossword from the content of a text file, using regular expressions:
Suppose I have the words A, B, C, D, etc.
I need to mix them into a different order: D, A, C, B, etc.
-
Hello Vasile Caraus,
Quite easy !
Let’s suppose that you consider, for instance, a list of seven consecutive words. To capture each of them in its own group, in order to change their natural order, we’ll use the following simple regex S/R :
SEARCH :
(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)
REPLACE :
\3 \1 \7 \5 \4 \2 \6
Notes :
- The syntax \h+ represents any non-null amount of horizontal blank characters : Space (\x20), Tabulation (\x09) or No-Break Space (\xA0)
- Each form \n is separated from its neighbour by a space
- In the replacement part, if you have more than nine groups, you must replace the \n syntax by the $n syntax, which allows groups of order > 9
So, from the example text of seven words, below :
A quick test about permutations of words
we now obtain the text :
test A words permutations about quick of
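For readers who want to try this S/R outside Notepad++, here is a minimal Python sketch of the same idea. It rests on one assumption: Python's `re` module has no `\h` class, so the character class `[ \t\xa0]` (space, tab, no-break space) stands in for it. The names `H`, `SEARCH` and `REPLACE` are only illustrative.

```python
import re

# Python's re module has no \h class, so [ \t\xa0]+ stands in for \h+ here.
H = r"[ \t\xa0]+"

# Seven word groups joined by the blank separator, as in the SEARCH regex above.
SEARCH = H.join([r"(\w+)"] * 7)

# Same group order as the REPLACE part above.
REPLACE = r"\3 \1 \7 \5 \4 \2 \6"

text = "A quick test about permutations of words"
result = re.sub(SEARCH, REPLACE, text)
print(result)  # test A words permutations about quick of
```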
Remark :
This kind of permutation reveals a cycle : after a certain number of replacements, the words come back to their initial configuration !
For instance, paste the example text above in a new tab and use the above S/R. Then, each time you click on the Replace All button ( or hit the ALT + A shortcut ), you get a new sentence and, after ten clicks, the original sentence again : A quick test about permutations of words :-))
Here is, below, the length of the cycle for some replacement configurations :
- REPLACE = \3 \1 \7 \5 \4 \2 \6 => cycle = 10
- REPLACE = \7 \4 \3 \6 \2 \1 \5 => cycle = 6
- REPLACE = \4 \6 \1 \7 \5 \3 \2 => cycle = 6
- REPLACE = \3 \5 \2 \7 \4 \1 \6 => cycle = 7
- REPLACE = \2 \1 \5 \4 \3 \7 \6 => cycle = 2
- REPLACE = \2 \1 \4 \5 \6 \7 \3 => cycle = 10
Note the two specific configurations, of cycle = 7, below :
- \2 \3 \4 \5 \6 \7 \1
- \7 \1 \2 \3 \4 \5 \6
=> The words seem to move forward or backwards, coming back in by the other side ! Sorry, but as I’m French, I don’t know the right English words to describe this behaviour. Thanks, in advance, for giving me the right expression !
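The number of clicks needed to get back to the original sentence can also be computed instead of counted by hand. The sketch below assumes the replacement is written as a list of group numbers, splits it into independent loops of positions, and takes the least common multiple of the loop lengths. The function name `cycle_length` is made up for this illustration.

```python
from functools import reduce
from math import gcd

def cycle_length(replacement):
    # replacement lists the group numbers in the order they appear in the
    # REPLACE part, e.g. [3, 1, 7, 5, 4, 2, 6] for the first S/R above.
    seen = set()
    lengths = []
    for start in range(1, len(replacement) + 1):
        if start in seen:
            continue
        # Follow one loop of positions until it closes on itself.
        length = 0
        pos = start
        while pos not in seen:
            seen.add(pos)
            pos = replacement[pos - 1]
            length += 1
        lengths.append(length)
    # The whole sentence is restored when every loop is restored at once:
    # that is the least common multiple of the loop lengths.
    return reduce(lambda a, b: a * b // gcd(a, b), lengths, 1)

print(cycle_length([3, 1, 7, 5, 4, 2, 6]))  # 10
print(cycle_length([2, 3, 4, 5, 6, 7, 1]))  # 7
```

For \3 \1 \7 \5 \4 \2 \6, the positions split into one loop of five words and one loop of two words, which gives the ten clicks mentioned above.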
Best Regards,
guy038
-
super answer. Thanks.
But what if I want to consider connected little words separated by a hyphen, so that the replacement takes these words as a whole and does not separate them?
-
\w+(?:-\w+)+
This selects the words joined by a hyphen, but I don’t know how to include it in your search regex above.
-
Hi, Vasile Caraus,
Sorry for my late reply, but I’m just back at work after three weeks of summer holidays :-((
First of all, I would like to point out that you cannot use a regex with the + quantifier for your specific case, although it does, indeed, match all the words of a sentence too !
For instance, my previous regex
(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)
cannot be shortened to (\w+)(?:\h+(\w+))+. Why ? Well, just because there would be only two groups :
- The first group (\w+), which represents the first word of the 7-word sentence
- The second group (\w+), which represents the last word of the 7-word sentence ( the last item matched by the repeated list (?:\h+(\w+))+ )
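This behaviour of a repeated group is easy to check. Here is a small Python sketch, again assuming `[ \t\xa0]` as a stand-in for `\h`, which Python's `re` module does not provide: the repeated group only remembers its last iteration, so the five words in the middle are not available to the replacement.

```python
import re

text = "A quick test about permutations of words"

# Shortened form: one group for the first word, one repeated group for the rest.
m = re.match(r"(\w+)(?:[ \t\xa0]+(\w+))+", text)

# The whole sentence is matched...
print(m.group(0))   # A quick test about permutations of words
# ...but only two groups exist, and the second holds the last word only.
print(m.groups())   # ('A', 'words')
```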
So, the solution consists in including all the separator characters, in addition to the previous syntax \h+
Then, let’s suppose that your words may be separated by a blank character, a hyphen or a colon. Given my previous example ( a 7-word sentence ), the original text could be :
A-quick test-about permutations:of-words
with a space character between the words quick and test, and a tabulation character between the words about and permutations
Now, how would you like the permuted words to be linked together ?
- With a space
- With a hyphen
- With a colon
OR, do you prefer that the replacement keeps the six different separators at their exact locations ?
- In the first case ( sentence with a single kind of separator between the permuted words ) use the following S/R :
SEARCH :
(\w+)[\h+:-](\w+)[\h+:-](\w+)[\h+:-](\w+)[\h+:-](\w+)[\h+:-](\w+)[\h+:-](\w+)
and :
REPLACE :
\3 \1 \7 \5 \4 \2 \6
or
REPLACE :
\3-\1-\7-\5-\4-\2-\6
or
REPLACE :
\3:\1:\7:\5:\4:\2:\6
In the second case, we need to remember the different separators, using additional groups, giving the following S/R :
SEARCH :
(\w+)([\h+:-])(\w+)([\h+:-])(\w+)([\h+:-])(\w+)([\h+:-])(\w+)([\h+:-])(\w+)([\h+:-])(\w+)
REPLACE :
$5$2$1$4$13$6$9$8$7$10$3$12$11
Notes :
- We are using, in the replacement, the group form $n, as it accepts values of n > 9
- You certainly noticed that the group numbers referring to words are always odd, and those referring to separators are always even !
- Therefore, if you want other permutations of this 7-word sequence, just move the $n forms with an odd number, as you like, in the replacement regex, WITHOUT changing the $n forms with an even number !
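The odd/even rule can be turned into a small helper that writes the replacement string for any word order. This is only a sketch under a few assumptions: it is written in Python, where the replacement syntax is `\g<n>` rather than `$n`, the class `[ \t\xa0+:-]` stands in for `[\h+:-]` ( Python has no `\h` ), and the function name `replacement_for` is invented for the example.

```python
import re

WORD = r"(\w+)"
# One separator character per group; stands in for ([\h+:-]) above.
SEP = r"([ \t\xa0+:-])"

# 7 word groups (odd numbers) interleaved with 6 separator groups (even numbers).
SEARCH = WORD + (SEP + WORD) * 6

def replacement_for(word_order):
    # Word k lives in group 2k-1; the separator after slot j lives in group 2j.
    parts = []
    for slot, word in enumerate(word_order, start=1):
        parts.append(rf"\g<{2 * word - 1}>")
        if slot < len(word_order):
            # Separators stay at their original location.
            parts.append(rf"\g<{2 * slot}>")
    return "".join(parts)

text = "A-quick test-about\tpermutations:of-words"
result = re.sub(SEARCH, replacement_for([3, 1, 7, 5, 4, 2, 6]), text)
print(result)  # test-A words-permutations<TAB>about:quick-of
```

For the word order 3 1 7 5 4 2 6, the helper produces the groups 5, 2, 1, 4, 13, 6, 9, 8, 7, 10, 3, 12, 11, the same sequence as the REPLACE part above.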
So, from the above text, we would obtain, for the chosen permutation, with the last S/R, the consecutive sentences of a 10-cycle, below :
test-A words-permutations about:quick-of
words-test of-about permutations:A-quick
of-words quick-permutations about:test-A
quick-of A-about permutations:words-test
A-quick test-permutations about:of-words
test-A words-about permutations:quick-of
words-test of-permutations about:A-quick
of-words quick-about permutations:test-A
quick-of A-permutations about:words-test
A-quick test-about permutations:of-words
Cheers,
guy038
-
Nice, but there is still a small problem for my language. If I have little words like
"le-a"
the regex will change the order to
"a-le"
but this is not a word anymore.
-
And another little thing. If I have a sentence like
"Și pentru valurile lor mândre cine poate seta limite?"
the regular expression you made permutes the words before the last two, but the last two words, "seta limite", do not change.
-
Hi Vasile Caraus,
Oh ! Sorry, I was completely wrong about your problem :-(( I should have read your posts more carefully !
Indeed, you said, for instance :
if I want to consider connected little words separated by a hyphen
and you also gave your regex
\w+(?:-\w+)+
So, on the contrary, I see now that you just wanted to add the hyphen-minus sign to the default word character list, which contains the range [0-9_A-Za-z] plus all the accented letters, in uppercase and lowercase ! OK, I have just figured out which kind of regex we need !
Of course, as my previous regex handles only 7 groups, it’s quite logical that it left out the last two words of your 9-word sentence
"Și pentru valurile lor mândre cine poate seta limite?"
So, there are still two problems :
- Firstly, do you want one specific permutation, or must the permutation follow some specific rules ?
I mean : let’s consider four sentences, containing 3, 7, 9 and 11 words. As each word needs to belong to a previously created group, and starting from my previous example of a 7-word sentence, with the particular permutation \3 \1 \7 \5 \4 \2 \6 :
- For a 3-word sentence, do you prefer the permutation \2 \3 \1 or \3 \1 \2 or \2 \3 \1 and so on …
- For a 9-word sentence, do you prefer the permutation \5 \3 \6 \2 \7 \8 \1 \9 \4 or \7 \3 \9 \2 \5 \8 \6 \4 \1 or else…
- For an 11-word sentence, do you prefer the permutation $5 $3 $6 $10 $2 $7 $8 $1 $9 $11 $4 or $5 $4 $3 $2 $7 $8 $10 $6 $1 $9 $11 or … ?
Do you see the problem ?
- Secondly, do you have an idea of the maximum number of words in the sentences of your text ? We need that number to write all the groups in the regex !
Therefore, in the meantime, I give you, again, the updated regex S/R, for a sentence of EXACTLY 7 words :
SEARCH :
([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)
REPLACE :
\3 \1 \7 \5 \4 \2 \6
So, from the original text below, with a hyphen inside the word permutations :
A quick test about permu-tations of words
and the permutation 3 1 7 5 4 2 6, we obtain the right text, with the word permu-tations unchanged, below :
test A words permu-tations about quick of
Notes :
- This long search regex is, simply, the concatenation of the regexes below :
([\w-]+)
[^\w\r\n-]+([\w-]+)
[^\w\r\n-]+([\w-]+)
[^\w\r\n-]+([\w-]+)
[^\w\r\n-]+([\w-]+)
[^\w\r\n-]+([\w-]+)
[^\w\r\n-]+([\w-]+)
- The ([\w-]+) syntax represents a single word, made up of default word characters and/or hyphens, surrounded by round brackets to form a group
- The separator [^\w\r\n-]+ represents any run of characters, between the words, that are neither [\w-] characters nor EOL characters
- In the replacement part, I suppose that the separator is the usual space character
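Since the search regex is just a concatenation of identical pieces, it can be generated for any number of words instead of being typed by hand. The Python sketch below is one possible way to do it, not a Notepad++ feature : the helper names `build_search`, `build_replacement` and `permute_words` are hypothetical, and the replacement uses Python's `\g<n>` syntax, which plays the role of `$n` here.

```python
import re

WORD = r"([\w-]+)"        # a word made of word characters and/or hyphens
SEP = r"[^\w\r\n-]+"      # anything between two words, except EOL characters

def build_search(n_words):
    # First word, then (separator + word) repeated for the remaining words.
    return WORD + (SEP + WORD) * (n_words - 1)

def build_replacement(order):
    # Groups are rewritten in the requested order, joined by a single space.
    return " ".join(rf"\g<{n}>" for n in order)

def permute_words(text, order):
    return re.sub(build_search(len(order)), build_replacement(order), text)

# The 7-word example above, with its hyphenated word kept whole.
print(permute_words("A quick test about permu-tations of words",
                    [3, 1, 7, 5, 4, 2, 6]))
# test A words permu-tations about quick of

# A 9-word sentence needs 9 groups; the order is one of those suggested above.
print(permute_words("Și pentru valurile lor mândre cine poate seta limite?",
                    [5, 3, 6, 2, 7, 8, 1, 9, 4]))
# mândre valurile cine pentru poate seta Și limite lor?
```

Note that the permutation itself still has to be chosen for each sentence length, which is exactly the first problem raised above.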
See you later,
guy038
-
works perfectly ! what can I say? you are grandiose ! THANK YOU !!