Hello, @the-fartman and All,
I realize that I made some syntax errors in the last part of my previous, very old post ( after the P.S. indication )
So, here is the corrected version :
P.S. :
Below is an mRNA sequence, as translated into a protein by a ribosome of a living cell ( I found it somewhere in a regex documentation, where it illustrates the usefulness of the \G regex feature ! ) :
....AUGGGUCGACUGGUUCUCGAAGGUUUCAAAGGUUCAAGGGUCCGGUAUUCAGUCGUCCGCUCUACUGGUACAAAGGGGGUACCACGACUGGUUCUCGAAUAG
If we remove the start codon AUG and everything that precedes it, as well as the stop codon UAG, we obtain the sequence :
GGUCGACUGGUUCUCGAAGGUUUCAAAGGUUCAAGGGUCCGGUAUUCAGUCGUCCGCUCUACUGGUACAAAGGGGGUACCACGACUGGUUCUCGAA
¯¯¯     ¨¨¨       ¯¯¯      ¯¯¯     ¨¨¨  ¨¨¨                    ¯¯¯        ¨¨¨         ¨¨¨
1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
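In that RNA sequence, let's consider the nucleotide triplet GGU, which codes for the amino acid glycine. You can easily see that :
At positions 1, 19, 28 and 64, this triplet is a TRUE codon, because a whole number of triplets precedes it
At positions 9, 36, 41, 75 and 87, this triplet is NOT a codon, because it straddles two adjacent codons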
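These positions can be double-checked outside of the editor. Here is a minimal Python sketch ( the function name classify_ggu is mine, purely for illustration ) : a GGU string found at the 1-based position p is a codon only if p - 1 is a multiple of 3.

```python
# The sequence from the post, start and stop codons already removed.
RNA = "GGUCGACUGGUUCUCGAAGGUUUCAAAGGUUCAAGGGUCCGGUAUUCAGUCGUCCGCUCUACUGGUACAAAGGGGGUACCACGACUGGUUCUCGAA"


def classify_ggu(seq, triplet="GGU"):
    """Return (codon_positions, other_positions), both 1-based."""
    codons, others = [], []
    start = seq.find(triplet)
    while start != -1:
        position = start + 1  # 1-based, like the ruler under the sequence
        # The triplet is a codon when a whole number of triplets precedes it.
        (codons if start % 3 == 0 else others).append(position)
        start = seq.find(triplet, start + 1)
    return codons, others


codons, others = classify_ggu(RNA)
print(codons)  # [1, 19, 28, 64]
print(others)  # [9, 36, 41, 75, 87]
```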
Now :
Place the cursor right before the RNA sequence GGUCGA… ( IMPORTANT )
To understand the effect of the \G feature, try the 3 regexes below, one after the other :
.*?GGU which matches the SHORTEST range of characters, up to and including a string GGU
(\u\u\u)*?GGU which matches the SHORTEST range of TRIPLETS, up to and including a string GGU
\G(\u\u\u)*?GGU which matches the SHORTEST range of CODONS, up to and including a GGU codon
Notes :
The first regex matches any range of characters, whatever its size, followed by the first string GGU
The second regex matches any range of 3*n capital letters, followed by the first string GGU ( n >= 0 )
The third regex matches any range of 3*n capital letters, followed by the first string GGU, which is itself separated from the beginning of the sequence by 3*m capital letters, because \G forces each match to start exactly where the previous one ended. So this GGU sequence is a codon ( n and m >= 0 )
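The behaviour of these 3 regexes can be reproduced, roughly, in Python. This is only a sketch under some assumptions : Python's re module knows neither \u nor \G, so I use [A-Z] in place of \u and I emulate \G with Pattern.match(text, pos), which only accepts a match beginning exactly at pos. The helper find_all is hypothetical : it mimics successive searches, each one starting where the previous match ended, and is not what the editor runs internally.

```python
import re

RNA = "GGUCGACUGGUUCUCGAAGGUUUCAAAGGUUCAAGGGUCCGGUAUUCAGUCGUCCGCUCUACUGGUACAAAGGGGGUACCACGACUGGUUCUCGAA"

# Assumption: [A-Z]{3} stands in for \u\u\u of the original regexes.
LAZY_ANY = r".*?GGU"
LAZY_TRIPLETS = r"(?:[A-Z]{3})*?GGU"


def find_all(pattern, text, anchored=False):
    # Hypothetical helper: repeat the search from the end of the last match
    # and collect the 1-based position of the GGU that closes each match.
    # anchored=True emulates \G: a match must begin exactly at pos.
    regex = re.compile(pattern)
    positions, pos = [], 0
    while pos < len(text):
        m = regex.match(text, pos) if anchored else regex.search(text, pos)
        if m is None:
            break
        positions.append(m.end() - 2)  # 1-based start of the trailing GGU
        pos = m.end()
    return positions


any_chars = find_all(LAZY_ANY, RNA)
triplets = find_all(LAZY_TRIPLETS, RNA)
anchored = find_all(LAZY_TRIPLETS, RNA, anchored=True)

print(any_chars)  # [1, 9, 19, 28, 36, 41, 64, 75, 87]
print(triplets)   # [1, 19, 28, 64, 75, 87]
print(anchored)   # [1, 19, 28, 64]
```

In this emulation, the second pattern ends up on the GGU strings at positions 75 and 87, which are not codons : once no match is possible from position 67, the search is free to slide its starting point forward. With the \G-like anchoring, the search simply stops after the codon at position 64.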
Similarly, try these 3 regexes below :
.*GGU which matches the LONGEST range of characters, up to and including a string GGU
(\u\u\u)*GGU which matches the LONGEST range of TRIPLETS, up to and including a string GGU
\G(\u\u\u)*GGU which matches the LONGEST range of CODONS, up to and including a GGU codon
Notes :
The first regex matches any range of characters, whatever its size, followed by the last string GGU
The second regex matches any range of 3*n capital letters, followed by the last string GGU that such a range can reach ( n >= 0 )
The third regex matches any range of 3*n capital letters, followed by the last string GGU, which is itself separated from the beginning of the sequence by 3*m capital letters, because \G forces each match to start exactly where the previous one ended. So this GGU sequence is a codon ( n and m >= 0 )
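The greedy variants can be sketched the same way in Python, this time reporting the whole matched range rather than the position of GGU. Same assumptions as for any Python emulation of these regexes : [A-Z] replaces \u, Pattern.match(text, pos) stands in for \G, and the helper match_ranges is hypothetical.

```python
import re

RNA = "GGUCGACUGGUUCUCGAAGGUUUCAAAGGUUCAAGGGUCCGGUAUUCAGUCGUCCGCUCUACUGGUACAAAGGGGGUACCACGACUGGUUCUCGAA"

# Assumption: [A-Z]{3} stands in for \u\u\u of the original regexes.
GREEDY_ANY = r".*GGU"
GREEDY_TRIPLETS = r"(?:[A-Z]{3})*GGU"


def match_ranges(pattern, text, anchored=False):
    # Hypothetical helper: repeat the search from the end of the last match
    # and collect each match as a (first, last) pair of 1-based positions.
    # anchored=True emulates \G: a match must begin exactly at pos.
    regex = re.compile(pattern)
    ranges, pos = [], 0
    while pos < len(text):
        m = regex.match(text, pos) if anchored else regex.search(text, pos)
        if m is None:
            break
        ranges.append((m.start() + 1, m.end()))
        pos = m.end()
    return ranges


any_chars = match_ranges(GREEDY_ANY, RNA)
triplets = match_ranges(GREEDY_TRIPLETS, RNA)
anchored = match_ranges(GREEDY_TRIPLETS, RNA, anchored=True)

print(any_chars)  # [(1, 89)]
print(triplets)   # [(1, 66), (69, 89)]
print(anchored)   # [(1, 66)]
```

Here, the first pattern runs up to the very last GGU ( position 87 ), the second one stops at the last GGU lying a multiple of 3 letters from its starting point, and only the anchored version is guaranteed to end on a true codon ( position 64 ).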
For further information on that topic, refer to the links below :
https://en.wikipedia.org/wiki/Codon
https://en.wikipedia.org/wiki/Start_codon
https://en.wikipedia.org/wiki/Stop_codon