Regex: crossword maker / mix words
Hello. I was wondering whether there is any way to mix words with regular expressions, to make a kind of crossword from the content of a text file:
Suppose I have words A, B, C, D, etc.
I need to mix them into a different order: D, A, C, B, etc.
Hello Vasile Caraus,
Quite easy!
Let's suppose, for instance, that you consider a list of seven consecutive words. To capture each of them, in order, in its own group, so that we can change their natural order, we'll use the following simple regex S/R:
SEARCH :
(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)
REPLACE :
\3 \1 \7 \5 \4 \2 \6
Notes :
- The syntax \h+ represents one or more horizontal blank characters: Space (\x20), Tabulation (\x09) or No-Break Space (\xA0).
- Each \n form is separated from its neighbour by a space.
- In the replacement part, if you have more than nine groups, you must replace the \n syntax with the $n syntax, which allows group numbers greater than 9.
So, from the 7-word example text below:
A quick test about permutations of words
we now obtain the text:
test A words permutations about quick of
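To try the same S/R outside Notepad++, here is a minimal Python sketch. It is an assumption-laden translation, not the Notepad++ engine itself: Python's re module has no \h, so the sketch spells it out as the three characters listed in the notes above (space, tab, no-break space).

```python
import re

# Stand-in for Notepad++'s \h (space, tab, no-break space): Python's re has no \h.
H = r"[ \t\xa0]+"
# Seven word groups joined by blank separators, as in the SEARCH regex above.
SEARCH = H.join([r"(\w+)"] * 7)
REPLACE = r"\3 \1 \7 \5 \4 \2 \6"

text = "A quick test about permutations of words"
print(re.sub(SEARCH, REPLACE, text))  # test A words permutations about quick of
```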
Remark :
This kind of permutation produces a cycle: at the end of the cycle, the initial word order comes back!
For instance, put the example text above in a new tab and use the above S/R. Then, each time you click the Replace All button (or hit the ALT + A shortcut), you get a new sentence, and after ten clicks you get back the original sentence: A quick test about permutations of words :-))
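The repeated clicks on Replace All can be simulated with a short loop. This is a hypothetical Python helper (clicks_until_original is not a Notepad++ feature); it applies the same S/R until the text comes back, again using [ \t\xa0]+ as a stand-in for \h.

```python
import re

# Stand-in for Notepad++'s \h (space, tab, no-break space): Python's re has no \h.
H = r"[ \t\xa0]+"
SEARCH = H.join([r"(\w+)"] * 7)
REPLACE = r"\3 \1 \7 \5 \4 \2 \6"

def clicks_until_original(text: str) -> int:
    """Count Replace All passes until the text returns to its starting form."""
    current = re.sub(SEARCH, REPLACE, text)
    clicks = 1
    while current != text:
        current = re.sub(SEARCH, REPLACE, current)
        clicks += 1
    return clicks

print(clicks_until_original("A quick test about permutations of words"))  # 10
```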
Below is the cycle length for some replacement configurations:
- REPLACE = \3 \1 \7 \5 \4 \2 \6 => cycle = 10
- REPLACE = \7 \4 \3 \6 \2 \1 \5 => cycle = 6
- REPLACE = \4 \6 \1 \7 \5 \3 \2 => cycle = 6
- REPLACE = \3 \5 \2 \7 \4 \1 \6 => cycle = 7
- REPLACE = \2 \1 \5 \4 \3 \7 \6 => cycle = 2
- REPLACE = \2 \1 \4 \5 \6 \7 \3 => cycle = 10
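These cycle lengths can also be computed without clicking: the cycle length of a permutation is the least common multiple of the lengths of its cycles. The sketch below (a hypothetical cycle_length helper, Python 3.9+ for math.lcm) reads a replacement string such as \3 \1 \7 \5 \4 \2 \6, where position i receives the word of the group written at position i, and follows each chain i -> p(i) -> ... until it closes.

```python
from math import lcm  # math.lcm needs Python 3.9+

def cycle_length(replace: str) -> int:
    """Number of Replace All passes before the original word order comes back."""
    # p[i-1] is the group number whose word lands at position i.
    p = [int(tok.lstrip("\\$")) for tok in replace.split()]
    seen = set()
    result = 1
    for start in range(1, len(p) + 1):
        if start in seen:
            continue
        # Follow start -> p(start) -> ... until the cycle closes on itself.
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = p[i - 1]
            length += 1
        result = lcm(result, length)
    return result

for r in [r"\3 \1 \7 \5 \4 \2 \6", r"\2 \1 \4 \5 \6 \7 \3"]:
    print(r, "=>", cycle_length(r))
```

For example, \3 \1 \7 \5 \4 \2 \6 splits into the cycles 1 -> 3 -> 7 -> 6 -> 2 -> 1 (length 5) and 4 -> 5 -> 4 (length 2), so its cycle is LCM(5, 2) = 10. Likewise, \2 \1 \4 \5 \6 \7 \3 splits into a 2-cycle (1 <-> 2) and a 5-cycle (3 -> 4 -> 5 -> 6 -> 7 -> 3), which also gives 10.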
Note the two specific configurations, with cycle = 7, below:
- \2 \3 \4 \5 \6 \7 \1
- \7 \1 \2 \3 \4 \5 \6
=> The words seem to move forward or backward, coming back in from the other side! Sorry, but as I'm French, I don't know the right English words to describe this behaviour. Thanks in advance for giving me the right expression!
Best Regards,
guy038
Super answer. Thanks.
But what if I want to treat little words connected by a hyphen as one unit, so that the replacement takes these words as a whole rather than separately?
\w+(?:-\w+)+
This selects the words separated by hyphens, but I don't know how to include it in your search regex above.
Hi, Vasile Caraus,
Sorry for my late reply, but I'm just back to work after a three-week summer holiday :-((
First of all, I would like to point out that, for your specific case, you cannot use a regex with the + quantifier, although it does match all the words of a sentence too! For instance, my previous regex
(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)
cannot be shortened to (\w+)(?:\h+(\w+))+. Why? Simply because there would be only two groups:
- The first group, (\w+), which represents the first word of the 7-word sentence
- The second group, (\w+), which represents the last word of the 7-word sentence (the last item matched by the repeated (?:\h+(\w+))+ part)
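The "only two groups" point is easy to check: a capturing group inside a repeated part keeps only its last iteration. The sketch below shows this in Python's re module, which behaves like Notepad++'s engine on this point (as before, [ \t\xa0]+ stands in for \h).

```python
import re

text = "A quick test about permutations of words"
# Same shape as (\w+)(?:\h+(\w+))+ with \h spelled out.
m = re.fullmatch(r"(\w+)(?:[ \t\xa0]+(\w+))+", text)
print(m.groups())  # ('A', 'words'): group 2 only keeps the last repetition
```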
So, the solution is to include all the separator characters, in addition to the previous \h+ syntax.
Then, let's suppose that your words may be separated by a blank character, a hyphen or a colon. Given my previous example (a 7-word sentence), the original text could be:
A-quick test-about permutations:of-words
with a space character between the words quick and test, and a tabulation character between the words about and permutations.
Now, how would you like the permuted words to be linked together?
- With a space
- With a hyphen
- With a colon
Or would you prefer the replacement to keep the six different separators at their exact locations?
In the first case (a sentence with a single separator type between words), use the following S/R:
SEARCH :
(\w+)[\h:-]+(\w+)[\h:-]+(\w+)[\h:-]+(\w+)[\h:-]+(\w+)[\h:-]+(\w+)[\h:-]+(\w+)
and :
REPLACE :
\3 \1 \7 \5 \4 \2 \6
or
REPLACE :
\3-\1-\7-\5-\4-\2-\6
or
REPLACE :
\3:\1:\7:\5:\4:\2:\6
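Here is a Python sketch of this first case, joining the permuted words with hyphens. Two assumptions: \h is spelled out as [ \t\xa0] because Python has no \h, and the + quantifier is placed after the closing bracket, since inside square brackets + is only a literal plus sign.

```python
import re

# One or more blanks, colons or hyphens between words (\h spelled out for Python).
SEP = r"[ \t\xa0:-]+"
SEARCH = SEP.join([r"(\w+)"] * 7)

# Tab between "about" and "permutations", as in the example text.
text = "A-quick test-about\tpermutations:of-words"
print(re.sub(SEARCH, r"\3-\1-\7-\5-\4-\2-\6", text))
# test-A-words-permutations-about-quick-of
```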
In the second case, we need to remember the different separators with additional groups, which gives the following S/R:
SEARCH :
(\w+)([\h:-]+)(\w+)([\h:-]+)(\w+)([\h:-]+)(\w+)([\h:-]+)(\w+)([\h:-]+)(\w+)([\h:-]+)(\w+)
REPLACE :
$5$2$1$4$13$6$9$8$7$10$3$12$11
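A Python sketch of this second case, under the same assumptions (\h spelled out as [ \t\xa0]). Python writes group references as \g<n>, the counterpart of the $n form used above, so the replacement lists the same group numbers.

```python
import re

SEP = r"([ \t\xa0:-]+)"  # captured, so each separator can be put back in place
SEARCH = SEP.join([r"(\w+)"] * 7)  # words are odd groups, separators even groups
REPLACE = r"\g<5>\g<2>\g<1>\g<4>\g<13>\g<6>\g<9>\g<8>\g<7>\g<10>\g<3>\g<12>\g<11>"

text = "A-quick test-about\tpermutations:of-words"
print(re.sub(SEARCH, REPLACE, text))  # the tab stays between the 4th and 5th words
```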
Notes :
- In the replacement, we use the $n group form, as it accepts values of n > 9.
- You have certainly noticed that the groups referring to words always have odd numbers, and those referring to separators always have even numbers!
- Therefore, if you want other permutations of this 7-word sequence, just move, as you like, the $n forms with an odd number in the replacement regex, WITHOUT changing the $n forms with an even number!
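Following this odd/even rule, the separator-preserving replacement can be generated from a plain word order. A minimal sketch (separator_preserving_replace is a hypothetical helper): word k is group 2k-1, and the separator after position i is group 2i, which never moves.

```python
def separator_preserving_replace(word_order):
    """Build a $n replacement that permutes words but keeps separators in place.

    word_order lists word numbers, e.g. [3, 1, 7, 5, 4, 2, 6] for \\3 \\1 \\7 ...
    """
    parts = []
    for position, word in enumerate(word_order, start=1):
        parts.append(f"${2 * word - 1}")      # odd group: the chosen word
        if position < len(word_order):
            parts.append(f"${2 * position}")  # even group: separator stays put
    return "".join(parts)

print(separator_preserving_replace([3, 1, 7, 5, 4, 2, 6]))
# $5$2$1$4$13$6$9$8$7$10$3$12$11
```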
So, from the above text, with the last S/R and the chosen permutation, we obtain the following consecutive sentences, in a 10-cycle:
test-A words-permutations about:quick-of
words-test of-about permutations:A-quick
of-words quick-permutations about:test-A
quick-of A-about permutations:words-test
A-quick test-permutations about:of-words
test-A words-about permutations:quick-of
words-test of-permutations about:A-quick
of-words quick-about permutations:test-A
quick-of A-permutations about:words-test
A-quick test-about permutations:of-words
Cheers,
guy038
Nice, but there is still a small problem for my language. If I have little words like "le-a", the script changes their order to "a-le", but this is no longer a word.
And another little thing. If I have a phrase like
"Și pentru valurile lor mândre cine poate seta limite?"
(Romanian for "And for their proud waves, who can set limits?")
the regular expressions you made change the words before the last two words, "seta limite", but these two do not change.
Hi Vasile Caraus,
Oh! Sorry, I completely misunderstood your problem :-(( I should have read your posts more carefully!
Indeed, you said, for instance:
if I want to consider connected little words separated by a hyphen
and you also gave the regex
\w+(?:-\w+)+
So, on the contrary, I now see that you just wanted to add the hyphen-minus sign to the default list of word characters, which contains the range
[0-9_A-Za-z]
plus all the accented letters, in uppercase and lowercase! OK, I've just figured out which kind of regex we need!
Of course, as my previous regex handles only 7 groups, it's quite logical that it left out the last two words of your 9-word sentence
"Și pentru valurile lor mândre cine poate seta limite?"
!! So, there are still two problems:
- Firstly, do you want a specific permutation, or must the permutation follow some specific rules?
I mean: let's consider four sentences containing 3, 7, 9 and 11 words. Each word needs to belong to a previously created group, so, starting from my previous example of a 7-word sentence with the particular permutation
\3 \1 \7 \5 \4 \2 \6
:
- For a 3-word sentence, do you prefer the permutation \2 \3 \1 or \3 \1 \2, and so on…
- For a 9-word sentence, do you prefer the permutation \5 \3 \6 \2 \7 \8 \1 \9 \4 or \7 \3 \9 \2 \5 \8 \6 \4 \1, or something else…
- For an 11-word sentence, do you prefer the permutation $5 $3 $6 $10 $2 $7 $8 $1 $9 $11 $4 or $5 $4 $3 $2 $7 $8 $10 $6 $1 $9 $11, or …?
Do you see the problem?
- Secondly, do you have an idea of the maximum number of words in the sentences of your text? We need that number to write all the groups in the regex!
In the meantime, here again is the updated regex S/R, for an EXACT 7-word sentence:
SEARCH :
([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)
REPLACE :
\3 \1 \7 \5 \4 \2 \6
So, from the original text below, with a hyphen inside the word permutations:
A quick test about permu-tations of words
and the permutation \3 \1 \7 \5 \4 \2 \6, we obtain the right text below, with the word permu-tations unchanged:
test A words permu-tations about quick of
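The same S/R can be checked in Python with a minimal sketch. Python's str patterns treat \w as Unicode word characters, like Notepad++ does with accented letters, so [\w-]+ keeps hyphenated words such as permu-tations in one group.

```python
import re

WORD = r"([\w-]+)"    # a word may contain hyphens: permu-tations, le-a
SEP = r"[^\w\r\n-]+"  # anything else between words, except line breaks
SEARCH = WORD + (SEP + WORD) * 6  # exactly 7 words
REPLACE = r"\3 \1 \7 \5 \4 \2 \6"

print(re.sub(SEARCH, REPLACE, "A quick test about permu-tations of words"))
# test A words permu-tations about quick of
```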
Notes :
- This long search regex is simply the concatenation of the regexes below:
([\w-]+)
[^\w\r\n-]+([\w-]+)
[^\w\r\n-]+([\w-]+)
[^\w\r\n-]+([\w-]+)
[^\w\r\n-]+([\w-]+)
[^\w\r\n-]+([\w-]+)
[^\w\r\n-]+([\w-]+)
- The ([\w-]+) syntax represents a single word, made up of default word characters and/or hyphens, surrounded by round brackets to form a group.
- The separator [^\w\r\n-]+ represents any run of characters between the words that differ from [\w-] and from EOL characters.
- In the replacement part, I assume that the separator is the usual space character.
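Since the search regex is just one word group followed by repeated "separator + word group" pieces, it can be generated for any number of words once the maximum is known. A hedged sketch with hypothetical helpers build_search and build_replace; the last part tries the 9-word sentence in Python, using a function as replacement because Python's re.sub does not understand the $n syntax.

```python
import re

def build_search(n: int) -> str:
    """Concatenate one word group and n-1 'separator + word group' pieces."""
    word = r"([\w-]+)"
    sep = r"[^\w\r\n-]+"
    return word + (sep + word) * (n - 1)

def build_replace(order) -> str:
    """Notepad++-style replacement using $n, which also works for n > 9."""
    return " ".join(f"${k}" for k in order)

sentence = "Și pentru valurile lor mândre cine poate seta limite?"
order = [5, 3, 6, 2, 7, 8, 1, 9, 4]  # the permutation \5 \3 \6 \2 \7 \8 \1 \9 \4
permuted = re.sub(build_search(len(order)),
                  lambda m: " ".join(m.group(k) for k in order),
                  sentence)
print(permuted)               # the trailing "?" is not part of the match
print(build_replace(order))   # $5 $3 $6 $2 $7 $8 $1 $9 $4
```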
See you later,
guy038
Works perfectly! What can I say? You are brilliant! THANK YOU!!