    Regex: crossword maker / mix words

    • Vasile Caraus

      Hello. I am wondering whether there is any way to mix words, to make a kind of crossword from the content of a text file, using regular expressions:

      Suppose I have words A, B, C, D, etc.
      I need to mix them into a different order: D, A, C, B, etc.

        • guy038

        Hello Vasile Caraus,

        Quite easy!

        Let’s suppose, for instance, that you consider a list of seven consecutive words. To capture each of them, in order, in its own group, so that we can change their natural order, we’ll use the following simple regex S/R (search/replace):

        SEARCH : (\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)

        REPLACE : \3 \1 \7 \5 \4 \2 \6

        Notes :

        • The syntax \h+ represents one or more horizontal blank characters: Space (\x20), Tabulation (\x09) or No-Break Space (\xA0)

        • Each back-reference \n (where n is the group number) is separated from its neighbour by a space

        • In the replacement part, if you have more than nine groups, you must use the $n syntax instead of \n, as it accepts group numbers greater than 9

        So, starting from the seven-word example text below:

        A quick test about permutations of words
        

        we now obtain the text:

        test A words permutations about quick of
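A minimal Python sketch of the same S/R, handy for testing outside Notepad++. Python's re module has no \h escape, so [ \t\xa0] is used here as an assumed stand-in for the three blank characters listed above; the replacement string \3 \1 \7 \5 \4 \2 \6 works unchanged with re.sub.

```python
import re

# Assumption: [ \t\xa0] stands in for Notepad++'s \h (Space, Tab, No-Break Space),
# because Python's re module does not support \h.
BLANKS = r"[ \t\xa0]+"

# Seven (\w+) groups separated by blanks: the same shape as the SEARCH regex above.
SEARCH = BLANKS.join([r"(\w+)"] * 7)
REPLACE = r"\3 \1 \7 \5 \4 \2 \6"

text = "A quick test about permutations of words"
result = re.sub(SEARCH, REPLACE, text)
print(result)  # test A words permutations about quick of
```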
        

        Remark :

        Repeating this kind of permutation reveals a cycle: at the end of the cycle, you get the initial order of the words back!

        For instance, put the example text above in a new tab and use the above S/R. Each time you click the Replace All button (or hit the Alt + A shortcut), you get a new sentence, and after ten clicks you get the original sentence back: A quick test about permutations of words :-))

        Below is the cycle length for some replacement configurations:

        • REPLACE = \3 \1 \7 \5 \4 \2 \6 => cycle = 10

        • REPLACE = \7 \4 \3 \6 \2 \1 \5 => cycle = 6

        • REPLACE = \4 \6 \1 \7 \5 \3 \2 => cycle = 6

        • REPLACE = \3 \5 \2 \7 \4 \1 \6 => cycle = 7

        • REPLACE = \2 \1 \5 \4 \3 \7 \6 => cycle = 2

        • REPLACE = \2 \1 \4 \5 \6 \7 \3 => cycle = 10
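The cycle length is the number of Replace All clicks needed to get the original sentence back, which is the order of the permutation: the least common multiple of the lengths of its cycles. A small Python sketch (the helper names are hypothetical) computes it both ways, by simulating the clicks and from the cycle decomposition; each list holds the group numbers of one REPLACE line above. For \2 \1 \4 \5 \6 \7 \3 it finds a 2-cycle and a 5-cycle, hence 10 clicks.

```python
from math import lcm  # math.lcm with several arguments needs Python 3.9+


def apply_order(words, order):
    """Reorder words like one Replace All: order holds 1-based group numbers."""
    return [words[g - 1] for g in order]


def cycle_by_simulation(order):
    """Count Replace All clicks until the original word order comes back."""
    start = list(range(1, len(order) + 1))
    words, clicks = apply_order(start, order), 1
    while words != start:
        words, clicks = apply_order(words, order), clicks + 1
    return clicks


def cycle_by_lcm(order):
    """Same value: the LCM of the lengths of the permutation's cycles."""
    seen, lengths = set(), []
    for first in range(1, len(order) + 1):
        length, pos = 0, first
        while pos not in seen:  # follow the cycle containing `first`
            seen.add(pos)
            pos = order[pos - 1]
            length += 1
        if length:  # 0 means `first` belonged to an already-visited cycle
            lengths.append(length)
    return lcm(*lengths)


configs = [
    [3, 1, 7, 5, 4, 2, 6],
    [7, 4, 3, 6, 2, 1, 5],
    [4, 6, 1, 7, 5, 3, 2],
    [3, 5, 2, 7, 4, 1, 6],
    [2, 1, 5, 4, 3, 7, 6],
    [2, 1, 4, 5, 6, 7, 3],
]
for order in configs:
    print(order, cycle_by_simulation(order), cycle_by_lcm(order))
# cycles, in order: 10, 6, 6, 7, 2, 10
```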

        Note the two specific configurations below, both with cycle = 7:

        • \2 \3 \4 \5 \6 \7 \1

        • \7 \1 \2 \3 \4 \5 \6

        => The words seem to move forward or backward, coming back in from the other side! Sorry, but as I’m French, I don’t know the right English words to describe this behaviour. Thanks in advance for giving me the right expression!
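In English, this behaviour is usually called a rotation, or circular shift: \2 \3 \4 \5 \6 \7 \1 rotates the words one place to the left and \7 \1 \2 \3 \4 \5 \6 one place to the right, and the word that falls off one end wraps around to the other. A short Python sketch of the same two rotations with list slicing (rotate_left and rotate_right are hypothetical names):

```python
def rotate_left(words):
    # Same effect as REPLACE = \2 \3 \4 \5 \6 \7 \1: the first word moves to the end.
    return words[1:] + words[:1]


def rotate_right(words):
    # Same effect as REPLACE = \7 \1 \2 \3 \4 \5 \6: the last word moves to the front.
    return words[-1:] + words[:-1]


words = "A quick test about permutations of words".split()
print(" ".join(rotate_left(words)))   # quick test about permutations of words A
print(" ".join(rotate_right(words)))  # words A quick test about permutations of

# Seven rotations of a seven-word list bring back the original order (cycle = 7).
current = words
for _ in range(7):
    current = rotate_left(current)
print(current == words)  # True
```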

        Best Regards,

        guy038

        • Vasile Caraus

          Super answer. Thanks.

          But what if I want to treat short words connected by a hyphen as a single unit, so that the replacement moves them as a whole instead of splitting them apart?

          • Vasile Caraus

            \w+(?:-\w+)+

            This selects the words joined by hyphens, but I don’t know how to include it in your search regex above.
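To see what \w+(?:-\w+)+ actually selects, here is a quick Python check on a made-up sample sentence. Because (?:-\w+)+ requires at least one "-word" part, it only matches hyphenated words; with * instead of + that part becomes optional, so plain words match too, each hyphenated word still being taken as one unit.

```python
import re

sample = "A well-known test about state-of-the-art permutations"  # hypothetical sample

# At least one "-word" part is required, so only hyphenated words match.
hyphenated = re.findall(r"\w+(?:-\w+)+", sample)
print(hyphenated)  # ['well-known', 'state-of-the-art']

# With * the hyphenated part is optional, so every word matches as a whole.
all_words = re.findall(r"\w+(?:-\w+)*", sample)
print(all_words)  # ['A', 'well-known', 'test', 'about', 'state-of-the-art', 'permutations']
```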

            • guy038

              Hi, Vasile Caraus,

              Sorry for my late reply, but I’m just back at work after a three-week summer holiday :-((

              First of all, I would like to point out that, for your specific case, you cannot use a regex with a repeated group (the + quantifier), even though it does match all the words of a sentence too!

              For instance, my previous regex (\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+)\h+(\w+) cannot be shortened to (\w+)(?:\h+(\w+))+. Why? Simply because there would be only two groups:

              • The first group, (\w+), which represents the first word of the 7-word sentence

              • The second group, (\w+), which represents the last word of the 7-word sentence (a repeated group only keeps its last match, i.e. the last item matched by (?:\h+(\w+))+ )
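The "only the last repetition is kept" behaviour is easy to check in Python, whose re module behaves the same way here; as before, [ \t\xa0] is an assumed stand-in for \h.

```python
import re

text = "A quick test about permutations of words"

# Same idea as (\w+)(?:\h+(\w+))+ ; [ \t\xa0] replaces \h (assumption).
m = re.fullmatch(r"(\w+)(?:[ \t\xa0]+(\w+))+", text)
print(m.groups())       # ('A', 'words'): group 2 only holds its last repetition
print(len(m.groups()))  # 2
```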

              So, the solution is to keep one explicit group per word and to add all the separator characters to the previous syntax \h+


              Now, let’s suppose that your words may be separated by a blank character, a hyphen or a colon. Based on my previous example (a 7-word sentence), the original text could be:

              A-quick test-about	permutations:of-words
              

              with a space character between the words quick and test, and a tabulation character between the words about and permutations.

              Now, how would you like the permuted words to be joined together?

              • With a space
              • With a hyphen
              • With a colon

              OR, would you prefer the replacement to keep the six different separators at their exact locations?


              • In the first case (a single separator type between words), use the following S/R:

              SEARCH : (\w+)[\h+:-](\w+)[\h+:-](\w+)[\h+:-](\w+)[\h+:-](\w+)[\h+:-](\w+)[\h+:-](\w+)

              and :

              REPLACE : \3 \1 \7 \5 \4 \2 \6

              or

              REPLACE : \3-\1-\7-\5-\4-\2-\6

              or

              REPLACE : \3:\1:\7:\5:\4:\2:\6
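A Python sketch of this first case, applying the three REPLACE variants to the example text. Note that inside a character class the + is a literal plus sign, so [\h+:-] matches exactly ONE character (a horizontal blank, a plus sign, a colon or a hyphen); that is enough here because each separator in the example is a single character. Again, [ \t\xa0] is an assumed stand-in for \h.

```python
import re

ORDER = (3, 1, 7, 5, 4, 2, 6)

# [\h+:-] with \h spelled out as [ \t\xa0] (assumption: Python's re has no \h).
# Inside [...] the "+" is literal, so SEP matches exactly one character.
SEP = r"[ \t\xa0+:-]"
SEARCH = SEP.join([r"(\w+)"] * 7)

text = "A-quick test-about\tpermutations:of-words"
results = []
for joiner in (" ", "-", ":"):
    # Builds \3 \1 \7 ..., \3-\1-\7-... and \3:\1:\7:... like the three REPLACE lines.
    replace = joiner.join("\\" + str(n) for n in ORDER)
    results.append(re.sub(SEARCH, replace, text))

for line in results:
    print(line)
# test A words permutations about quick of
# test-A-words-permutations-about-quick-of
# test:A:words:permutations:about:quick:of
```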


              • In the second case, we need to remember the different separators, using additional groups, which gives the following S/R:

              SEARCH : (\w+)([\h+:-])(\w+)([\h+:-])(\w+)([\h+:-])(\w+)([\h+:-])(\w+)([\h+:-])(\w+)([\h+:-])(\w+)

              REPLACE : $5$2$1$4$13$6$9$8$7$10$3$12$11

              Notes :

              • In the replacement, we use the $n group syntax, as it accepts values of n greater than 9

              • You have certainly noticed that the groups referring to words always have odd numbers, and those referring to separators always have even numbers!

              • Therefore, if you want other permutations of this 7-word sequence, just rearrange, as you like, the $n forms with an odd number in the replacement regex, WITHOUT moving the $n forms with an even number!
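The odd/even rule can be turned into a tiny generator for the replacement string: word k is group 2k-1, and the separator after position i stays group 2i. A minimal Python sketch (the function name is hypothetical) that rebuilds the Notepad++ REPLACE string above from the word order 3 1 7 5 4 2 6:

```python
def separator_preserving_replace(order):
    """Build a Notepad++ REPLACE string such as $5$2$1$4... for the SEARCH above.

    order: 1-based word numbers in the new order, e.g. [3, 1, 7, 5, 4, 2, 6].
    Word k is group 2k-1 (odd); the separator after position i stays group 2i (even).
    """
    parts = []
    for i, word in enumerate(order, start=1):
        parts.append(f"${2 * word - 1}")
        if i < len(order):  # no separator after the last word
            parts.append(f"${2 * i}")
    return "".join(parts)


print(separator_preserving_replace([3, 1, 7, 5, 4, 2, 6]))
# $5$2$1$4$13$6$9$8$7$10$3$12$11
```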

              So, from the above text, with the chosen permutation and the last S/R, we would obtain the successive sentences below, in a 10-step cycle:

              test-A words-permutations	about:quick-of
              
              words-test of-about	permutations:A-quick
              
              of-words quick-permutations	about:test-A
              
              quick-of A-about	permutations:words-test
              
              A-quick test-permutations	about:of-words
              
              test-A words-about	permutations:quick-of
              
              words-test of-permutations	about:A-quick
              
              of-words quick-about	permutations:test-A
              
              quick-of A-permutations	about:words-test
              
              A-quick test-about	permutations:of-words
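These ten sentences can be reproduced in Python by applying the second S/R ten times. Python writes group references in the replacement as \g<n>, its unambiguous form for multi-digit group numbers, so $5$2$1... is rewritten as \g<5>\g<2>\g<1>...; [ \t\xa0] is again an assumed stand-in for \h.

```python
import re

# Captured separator: [\h+:-] with \h spelled out as [ \t\xa0] (assumption).
SEP = r"([ \t\xa0+:-])"
SEARCH = SEP.join([r"(\w+)"] * 7)  # words = odd groups, separators = even groups

# The Notepad++ replacement $5$2$1$4$13$6$9$8$7$10$3$12$11 in Python's \g<n> syntax.
GROUPS = [5, 2, 1, 4, 13, 6, 9, 8, 7, 10, 3, 12, 11]
REPLACE = "".join(f"\\g<{n}>" for n in GROUPS)

text = "A-quick test-about\tpermutations:of-words"
sentences = []
current = text
for _ in range(10):  # ten Replace All clicks
    current = re.sub(SEARCH, REPLACE, current)
    sentences.append(current)

print(repr(sentences[0]))     # 'test-A words-permutations\tabout:quick-of'
print(sentences[-1] == text)  # True: back to the original after 10 clicks
```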
              

              Cheers,

              guy038

              • Vasile Caraus

                Nice, but there is still a little problem for my language. If I have short words like "le-a", the regex changes the order to "a-le", but that is not a word anymore.

                • Vasile Caraus

                  And another little thing. If I have a phrase like "Și pentru valurile lor mândre cine poate seta limite?" (Romanian for "And for their proud waves, who can set limits?"), the regular expressions you made change the words before the last two words, "seta limite", but those two do not change.

                  • guy038

                    Hi Vasile Caraus,

                    Oh! Sorry, I completely misunderstood your problem :-(( I should have read your posts more carefully!

                    Indeed, you said, for instance:

                    if I want to consider connected little words separated by a hyphen

                    and you also gave your regex \w+(?:-\w+)+

                    So, I now see that you simply wanted to add the hyphen-minus sign to the list of default word characters, which contains the range [0-9_A-Za-z] plus all the accented letters, in uppercase and lowercase!

                    OK, I have just figured out which kind of regex we need!


                    Of course, as my previous regex only handles 7 groups, it’s quite logical that it left out the last two words of your 9-word sentence "Și pentru valurile lor mândre cine poate seta limite?"!!

                    So, there are still two problems:

                    • Firstly, do you want one specific permutation, or must the permutation follow some specific rules?

                    I mean: let’s consider four sentences, containing 3, 7, 9 and 11 words. Each word needs to belong to a previously defined group. Starting from my previous example of a 7-word sentence, with the particular permutation \3 \1 \7 \5 \4 \2 \6:

                      • For a 3-word sentence, do you prefer the permutation \2 \3 \1 or \3 \1 \2, and so on…

                      • For a 9-word sentence, do you prefer the permutation \5 \3 \6 \2 \7 \8 \1 \9 \4 or \7 \3 \9 \2 \5 \8 \6 \4 \1 or something else…

                      • For an 11-word sentence, do you prefer the permutation $5 $3 $6 $10 $2 $7 $8 $1 $9 $11 $4 or $5 $4 $3 $2 $7 $8 $10 $6 $1 $9 $11 or …?

                    Do you see the problem?

                    • Secondly, do you have an idea of the maximum number of words per sentence in your text? We need that number to write all the groups in the regex!

                    In the meantime, here again is the updated regex S/R, for an EXACT 7-word sentence:

                    SEARCH : ([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)

                    REPLACE : \3 \1 \7 \5 \4 \2 \6

                    So, from the original text below, with a hyphen inside the word permutations:

                    A quick test about permu-tations of words
                    

                    and the permutation 3 1 7 5 4 2 6, we obtain the correct text below, with the word permu-tations unchanged:

                    test A words permu-tations about quick of
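The updated SEARCH and REPLACE strings use syntax that Python's re module also accepts, so they can be checked there unchanged. In Python 3, \w in a str pattern is Unicode-aware, so accented letters count as word characters, and a hyphenated word such as permu-tations (or le-a) stays in one group:

```python
import re

# The SEARCH and REPLACE strings above, copied as-is.
SEARCH = r"([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)[^\w\r\n-]+([\w-]+)"
REPLACE = r"\3 \1 \7 \5 \4 \2 \6"

text = "A quick test about permu-tations of words"
result = re.sub(SEARCH, REPLACE, text)
print(result)  # test A words permu-tations about quick of
```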
                    

                    Notes :

                    • This long search regex is simply the concatenation of the regexes below:

                    ([\w-]+)
                    [^\w\r\n-]+([\w-]+)
                    [^\w\r\n-]+([\w-]+)
                    [^\w\r\n-]+([\w-]+)
                    [^\w\r\n-]+([\w-]+)
                    [^\w\r\n-]+([\w-]+)
                    [^\w\r\n-]+([\w-]+)

                    • The ([\w-]+) syntax represents a single word, made up of default word characters and/or hyphens, surrounded by round brackets to form a group

                    • The separator [^\w\r\n-]+ represents any run of characters between the words that are neither [\w-] characters nor EOL characters (\r, \n)

                    • In the replacement part, I assume that the separator is the usual space character
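Since the search regex is just one ([\w-]+) group followed by repeated separator + group pieces, it can be generated for any number of words, which answers the "maximum number of words" question mechanically. A Python sketch with hypothetical helpers build_search and build_replace, applied to the 9-word Romanian sentence with one of the 9-word permutations suggested above (the trailing "?" is not part of any word, so it stays in place):

```python
import re


def build_search(n):
    """Hypothetical helper: the SEARCH regex above, generalised to exactly n words."""
    word, sep = r"([\w-]+)", r"[^\w\r\n-]+"
    return word + (sep + word) * (n - 1)


def build_replace(order, python=False):
    """Hypothetical helper: join group numbers with spaces, as $n (Notepad++) or \\g<n> (Python)."""
    fmt = "\\g<{}>" if python else "${}"
    return " ".join(fmt.format(g) for g in order)


sentence = "Și pentru valurile lor mândre cine poate seta limite?"
order = [5, 3, 6, 2, 7, 8, 1, 9, 4]  # one of the 9-word permutations above

print(build_search(9))       # paste into the Notepad++ SEARCH field
print(build_replace(order))  # $5 $3 $6 $2 $7 $8 $1 $9 $4
result = re.sub(build_search(9), build_replace(order, python=True), sentence)
print(result)  # mândre valurile cine pentru poate seta Și limite lor?
```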

                    See you later,

                    guy038

                    • Vasile Caraus

                      Works perfectly! What can I say? You are grandiose! THANK YOU!!

                      The Community of users of the Notepad++ text editor.