Regex: crossword maker / mix words



  • Hello. I am wondering whether it is possible to mix words, to make a kind of crossword from the content of a text file, using regular expressions:

    Suppose I have the words A, B, C, D, etc.
    I need to mix them into a different order: D, A, C, B, etc.



  • Hello Vasile Caraus,

    Quite easy !

    Let’s suppose that you consider, for instance, a list of seven consecutive words. To capture each of them in its own group, so that their natural order can be changed, we’ll use the following simple regex S/R :

    SEARCH : (\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)

    REPLACE : \3 \1 \7 \5 \4 \2 \6

    Notes :

    • The syntax \h+ represents any non-empty run of horizontal blank characters : Space ( \x20 ), Tabulation ( \x09 ) or No-Break Space ( \xA0 )

    • In the replacement, each \n form is separated from its neighbour by a space

    • In the replacement part, if you have more than nine groups, you must replace the \n syntax with the $n syntax, which allows group numbers > 9

    So, from the example text, of seven words, below :

    A quick test about permutations of words
    

    We, now, obtain the text :

    test A words permutations about quick of
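
    For readers who want to try the same S/R outside Notepad++, here is a minimal Python sketch. It assumes Python's re module, which has no \h shorthand, so the hypothetical H constant spells out the three blank characters listed in the notes. All names are illustrative only.

```python
import re

# Python's re has no \h, so list the horizontal blanks explicitly
# (space, tab, no-break space). This constant is an assumption of the sketch.
H = r"[ \t\xa0]+"

# Seven capturing groups, one per word, joined by the blank separator.
SEARCH = H.join([r"(\w+)"] * 7)

# Same order as the REPLACE field above; each reference is followed by a
# space, so \3 cannot be misread as a two-digit group number.
REPLACE = r"\3 \1 \7 \5 \4 \2 \6"

result = re.sub(SEARCH, REPLACE, "A quick test about permutations of words")
print(result)  # test A words permutations about quick of
```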
    

    Remark :

    Applying this kind of permutation repeatedly reveals a cycle : at the end of the cycle, the words are back in their initial configuration !

    For instance, paste the example text above into a new tab and use the S/R above. Then, each time you click on the Replace All button ( or hit the ALT + A shortcut ), you get a new sentence and, after ten clicks, you get the original sentence again : A quick test about permutations of words :-))

    Below is the length of the cycle for some replacement configurations :

    • REPLACE = \3 \1 \7 \5 \4 \2 \6 => cycle = 10

    • REPLACE = \7 \4 \3 \6 \2 \1 \5 => cycle = 6

    • REPLACE = \4 \6 \1 \7 \5 \3 \2 => cycle = 6

    • REPLACE = \3 \5 \2 \7 \4 \1 \6 => cycle = 7

    • REPLACE = \2 \1 \5 \4 \3 \7 \6 => cycle = 2

    • REPLACE = \2 \1 \4 \5 \6 \7 \3 => cycle = 10

    Note the two specific configurations, of cycle = 7, below :

    • \2 \3 \4 \5 \6 \7 \1

    • \7 \1 \2 \3 \4 \5 \6

    => The words seem to move forward or backwards, coming back in at the other side ! Sorry, but as I’m French, I don’t know the right English words to describe this behaviour. Thanks, in advance, for giving me the right expression !
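
    The cycle lengths can also be computed rather than counted by clicking. The sketch below is only an illustration in Python : it relies on the standard fact that a permutation returns to its starting point after a number of applications equal to the least common multiple of the lengths of its disjoint cycles. The function name cycle_length is hypothetical. The two configurations of cycle = 7 above are single cycles, usually called rotations or circular shifts.

```python
from math import gcd

def cycle_length(order):
    """Return how many applications of the replacement restore the text.

    `order` lists the 1-based group numbers in the order they appear in
    the REPLACE field, e.g. [3, 1, 7, 5, 4, 2, 6] for \\3 \\1 \\7 \\5 \\4 \\2 \\6.
    """
    n = len(order)
    seen = [False] * n
    result = 1
    for start in range(n):
        if seen[start]:
            continue
        # Walk one disjoint cycle and measure its length.
        length = 0
        pos = start
        while not seen[pos]:
            seen[pos] = True
            pos = order[pos] - 1
            length += 1
        # Least common multiple of all cycle lengths found so far.
        result = result * length // gcd(result, length)
    return result

print(cycle_length([3, 1, 7, 5, 4, 2, 6]))  # 10
print(cycle_length([2, 3, 4, 5, 6, 7, 1]))  # 7
```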

    Best Regards,

    guy038



  • super answer. Thanks.

    But what if I want to treat short words joined by a hyphen as a single unit, so that the replacement moves them as a whole and does not split them?



  • \w+(?:-\w+)+

    This will select the words joined by a hyphen, but I don’t know how to include it in your search regex above.



  • Hi, Vasile Caraus,

    Sorry for my late reply but I’m just back at work, after a three-week summer holiday :-((

    First of all, I would like to point out that you cannot use a regex that repeats a group with the + quantifier, for your specific case, although it does, indeed, match all the words of a sentence too !

    For instance, my previous regex (\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+) cannot be shortened to (\w+)(?:\h+(\w+))+. Why ? Well, just because there would be only two groups :

    • The first group : (\w+), which represents the first word of the 7-word sentence

    • The second group : (\w+), which represents the last word of the 7-word sentence ( last item of the list (?:\h+(\w+))+ )

    So, the solution is to include all the separator characters, in addition to the previous syntax \h+
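
    The same behaviour can be observed in Python's re module, shown here only as an illustration : a capturing group placed under a + quantifier keeps just the text of its last iteration. The sketch assumes [ \t]+ in place of \h+, because Python's re has no \h shorthand.

```python
import re

text = "A quick test about permutations of words"

# Shortened form: one group for the first word, one repeated group for the rest.
m = re.fullmatch(r"(\w+)(?:[ \t]+(\w+))+", text)

# Only two groups exist, and the repeated one holds its last iteration only.
groups = m.groups()
print(groups)  # ('A', 'words')
```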


    Then, let’s suppose that your words may be separated either by a blank character, a hyphen or a colon. Given my previous example ( a 7-word sentence ), the original text could be :

    A-quick test-about	permutations:of-words
    

    with a space character between the words quick and test, and a tabulation character between the words about and permutations

    Now, how would you like the permuted words to be linked together ?

    • With a space
    • With a hyphen
    • With a colon

    OR, would you prefer that the replacement keeps the six different separators, at their exact location ?


    • In the first case ( sentence with a single separator between words ) use the following S/R :

    SEARCH : (\w+)[\h:-]+(\w+)[\h:-]+(\w+)[\h:-]+(\w+)[\h:-]+(\w+)[\h:-]+(\w+)[\h:-]+(\w+)

    and :

    REPLACE : \3 \1 \7 \5 \4 \2 \6

    or

    REPLACE : \3-\1-\7-\5-\4-\2-\6

    or

    REPLACE : \3:\1:\7:\5:\4:\2:\6


    • In the second case, we need to remember the different separators, using additional groups, giving the following S/R :

    SEARCH : (\w+)([\h:-]+)(\w+)([\h:-]+)(\w+)([\h:-]+)(\w+)([\h:-]+)(\w+)([\h:-]+)(\w+)([\h:-]+)(\w+)

    REPLACE : $5$2$1$4$13$6$9$8$7$10$3$12$11

    Notes :

    • We are using, in the replacement, the group form $n, as it accepts values of n > 9

    • You have certainly noticed that the numbers of the groups referring to words are always odd, and those referring to separators are always even !

    • Therefore, if you want other permutations of this 7-word sequence, just change, as you like, the location of the forms $n with an odd number, in the replacement regex, WITHOUT changing the forms $n with an even number !

    So, from the above text, we would obtain, for the chosen permutation, with the last S/R, the consecutive sentences, in a 10-cycle, below :

    test-A words-permutations	about:quick-of
    
    words-test of-about	permutations:A-quick
    
    of-words quick-permutations	about:test-A
    
    quick-of A-about	permutations:words-test
    
    A-quick test-permutations	about:of-words
    
    test-A words-about	permutations:quick-of
    
    words-test of-permutations	about:A-quick
    
    of-words quick-about	permutations:test-A
    
    quick-of A-permutations	about:words-test
    
    A-quick test-about	permutations:of-words
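
    The ten sentences above can be regenerated with a short Python sketch, given here only as an illustration of the second S/R. It assumes Python's re module : the separator class spells out the blank characters because \h is not available there, and the replacement uses the \g<n> form, which plays the role of $n for group numbers > 9. The names SEP, PATTERN, REPL and history are hypothetical.

```python
import re

# One separator character: space, tab, no-break space, colon or hyphen.
SEP = r"[ \t\xa0:-]"

# Word groups get odd numbers (1, 3, ..., 13), separator groups even numbers.
PATTERN = re.compile(r"(\w+)" + rf"({SEP})(\w+)" * 6)

# Same order as $5$2$1$4$13$6$9$8$7$10$3$12$11: only the odd groups move.
REPL = (r"\g<5>\g<2>\g<1>\g<4>\g<13>\g<6>\g<9>"
        r"\g<8>\g<7>\g<10>\g<3>\g<12>\g<11>")

text = "A-quick test-about\tpermutations:of-words"

history = []
current = text
for _ in range(10):
    current = PATTERN.sub(REPL, current)
    history.append(current)

for sentence in history:
    print(sentence)
```

    The last entry of history is the original sentence again, which matches the 10-cycle listed above.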
    

    Cheers,

    guy038



  • Nice, but there is still a small problem for my language. If I have short words like "le-a", the script will change the order to "a-le", but this is not a word anymore.



  • And another small thing. If I have a phrase like "Și pentru valurile lor mândre cine poate seta limite?", the regular expressions you made will permute the words before the last two; the last two words, "seta limite", do not change.



  • Hi Vasile Caraus,

    Oh ! Sorry, I completely misunderstood your problem :-(( I should have read your posts more carefully !

    Indeed, you said, for instance :

    if I want to consider connected little words separated by a hyphen

    and you also gave your regex \w+(?:-\w+)+

    So, on the contrary, I see, now, that you just wanted to add the hyphen-minus sign to the default word characters list, which contains the range [0-9_A-Za-z] + all the accented letters, in uppercase and lowercase !

    OK, I have just figured out which kind of regex we need !


    Of course, as my previous regex handles only 7 groups, it’s quite logical that it left out the last two words of your 9-word sentence "Și pentru valurile lor mândre cine poate seta limite?" !!

    So, there are still two problems :

    • Firstly, would you like one specific permutation, or must the permutation follow some specific rules ?

    I mean : let’s consider four sentences, containing 3, 7, 9 and 11 words. As each word needs to belong to a group, previously created, and starting from my previous example of a 7-word sentence, with the particular permutation \3 \1 \7 \5 \4 \2 \6 :

      • For a 3-word sentence, do you prefer the permutation \2 \3 \1 or \3 \1 \2, and so on …

      • For a 9-word sentence, do you prefer the permutation \5 \3 \6 \2 \7 \8 \1 \9 \4 or \7 \3 \9 \2 \5 \8 \6 \4 \1 or something else …

      • For an 11-word sentence, do you prefer the permutation $5 $3 $6 $10 $2 $7 $8 $1 $9 $11 $4 or $5 $4 $3 $2 $7 $8 $10 $6 $1 $9 $11 or … ?

    Do you see the problem ?

    • Secondly, do you have an idea of the maximum number of words in the sentences of your text ? We need that number to write all the groups, in the regex !

    Therefore, in the meantime, I give you, again, the updated regex S/R, for a sentence of EXACTLY 7 words :

    SEARCH : ([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)

    REPLACE : \3 \1 \7 \5 \4 \2 \6

    So, from the original text, below, with a hyphen inside the word permutations :

    A quick test about permu-tations of words
    

    And the permutation 3 1 7 5 4 2 6, we obtain the right text, with the word permu-tations unchanged, below :

    test A words permu-tations about quick of
    

    Notes :

    • This long search regex is, simply, the concatenation of the regexes below :

    ([\w-]+)
    [^\w\r\n-]+([\w-]+)
    [^\w\r\n-]+([\w-]+)
    [^\w\r\n-]+([\w-]+)
    [^\w\r\n-]+([\w-]+)
    [^\w\r\n-]+([\w-]+)
    [^\w\r\n-]+([\w-]+)

    • The ([\w-]+) syntax represents a single word, made up of default word characters and/or hyphens, surrounded by round brackets, to form a group

    • The separator [^\w\r\n-]+ represents any run of characters, between the words, that are neither [\w-] characters nor EOL characters

    • In the replacement part, I suppose that the separator is the usual space character
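
    Since the search regex is a plain concatenation, it can be generated for any number of words instead of being typed by hand. The Python sketch below is only an illustration of that idea, not part of the Notepad++ S/R : the helper names build_search, build_replace and permute are hypothetical, and the replacement uses Python's \g<n> form so that group numbers > 9 are unambiguous.

```python
import re

WORD = r"([\w-]+)"      # one word: word characters and/or hyphens
SEP = r"[^\w\r\n-]+"    # anything between words, except EOL characters

def build_search(n_words):
    """Concatenate one WORD group, then (SEP + WORD) for each further word."""
    return WORD + (SEP + WORD) * (n_words - 1)

def build_replace(order):
    """Turn [3, 1, 7, ...] into a replacement with words separated by spaces."""
    return " ".join(rf"\g<{i}>" for i in order)

def permute(text, order):
    """Apply the permutation to each run of len(order) words in text."""
    return re.sub(build_search(len(order)), build_replace(order), text)

result7 = permute("A quick test about permu-tations of words",
                  [3, 1, 7, 5, 4, 2, 6])
print(result7)  # test A words permu-tations about quick of

# One of the 9-word permutations mentioned above, on the 9-word sentence.
result9 = permute("Și pentru valurile lor mândre cine poate seta limite?",
                  [5, 3, 6, 2, 7, 8, 1, 9, 4])
print(result9)
```

    The permutation itself still has to be chosen for each sentence length, which is the first of the two problems raised above.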

    See you later,

    guy038



  • Works perfectly ! What can I say? You are grandiose ! THANK YOU !!

