Find an occurrence between different lines



  • Hi everyone,
    my code looks something like this:
    t1
    bunch of lines with x y z followed by numbers (from 3 to 3000)
    t3
    lines (from 3 to 3000)
    t6
    lines
    t94
    lines
    word “TURN” in the middle of the lines
    lines
    t3
    lines
    TURN
    t90
    lines
    t2
    lines

    My problem is to find, with a regex, the letter t followed by its number, but only the one just before the section containing the word TURN: in this case t94 and the second t3.
    The number after the letter t is random, and so is the number of times the word TURN occurs.

    Thanks in advance.



  • Hello, Mario,

    Wow! Finding a correct regex for your problem was not very easy, and it will certainly be difficult to explain :-((

    From what you said, I suppose:

    • that all your sections begin with the lowercase letter t, followed by a number, just before an EOL character

    • that the word TURN is always written in capital letters

    And you need to find the nearest section header preceding an exact string TURN. That is to say, in your example, only the two sections t94 and t3.

    The expression nearest previous section means that, between any matched section header and the following string TURN, there is no other section header.

    So, I started with the simple regex (?s-i)t\d+(?=\R.*?TURN)

    In that regex :

    • Firstly, the (?s-i) syntax sets two in-line modifiers: -i makes the search case-sensitive, and s makes the dot regex character stand for absolutely any character, standard or EOL (hence the s, for single line). The advantage of these modifiers is that they take priority over the Match case and . matches newline options of the Find/Replace dialog

    • Then, the part t\d+ is the regex to find, that is to say the section header

    • The final part, (?=\R.*?TURN), is a positive look-ahead, which is NOT part of the final match. It’s just a condition which must be true for the regex engine to report a match. Here, the section header to find must be followed by EOL character(s), then by anything (even across several lines), up to the nearest string TURN

    • Note that the \R form is, roughly, a shorthand for the regex \r\n|[\n\v\f\r\x{2028}\x{2029}]. But, practically, we may consider that it represents any kind of EOL character(s) (\r\n for Windows files, \n for Unix/OSX files, or \r for old Mac files)
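    For readers who want to try these pieces outside Notepad++, here is a minimal sketch in Python. It is a translation under assumptions, not Notepad++ syntax: Python's re module has no \R escape, so the sketch uses a hypothetical EOL sub-pattern built from the expansion quoted above, and it passes re.DOTALL instead of the in-line s modifier (Python matching is case-sensitive by default).

```python
import re

# Hypothetical stand-in for \R, following the expansion quoted in the post.
# \r\n comes first so that a Windows line ending is consumed as one unit.
EOL = r"(?:\r\n|[\n\v\f\r\u2028\u2029])"

# Mixed line endings: Windows (\r\n), Unix (\n) and old Mac (\r).
mixed = "t1\r\nx1 y2 z3\nt3\rx4 y5 z6"
lines = re.split(EOL, mixed)
print(lines)

# Without re.DOTALL, Python's dot stops at \n, so the search cannot cross lines.
dot_default = re.search(r"t1.*z3", mixed)
# With re.DOTALL (the role of the s modifier), the dot also matches EOL characters.
dot_all = re.search(r"t1.*z3", mixed, flags=re.DOTALL)
print(dot_default is None, dot_all is not None)
```

    The later sketches simply use \n in place of \R, which is enough for text with Unix line endings.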

    However, if we place the cursor before the t1 section header of your example, the regex engine matches every section header t1, t3, t6… up to the second t3 header. Only the last two headers, t90 and t2, are NOT followed by the string TURN. So, a good start, but not enough :-((
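    This over-matching can be reproduced with a small Python sketch. The SAMPLE text is hypothetical, only shaped like the example in the first post, and \n plus re.DOTALL stand in for \R and the s modifier, since Python's re module has no \R.

```python
import re

# Hypothetical sample text shaped like the example in the first post.
SAMPLE = """t1
x1 y2 z3
t3
x4 y5 z6
t6
x7 y8 z9
t94
x1 y1 z1
x2 TURN z2
x3 y3 z3
t3
x5 y5 z5
TURN
t90
x6 y6 z6
t2
x7 y7 z7
"""

# First attempt: a header followed by an EOL and anything up to the nearest TURN.
naive = re.compile(r"t\d+(?=\n.*?TURN)", flags=re.DOTALL)

# Every header that has a TURN somewhere after it is reported.
naive_hits = naive.findall(SAMPLE)
print(naive_hits)  # ['t1', 't3', 't6', 't94', 't3']
```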

    So we must add a condition which tells the regex engine that, between the section header to find and the exact string TURN, NO OTHER section header may occur. Therefore, I used the negative look-ahead (?!\Rt\d+\R), which means: NOT a section header surrounded by EOL characters.

    But when must this new negative condition be verified? Well, it needs to be verified at every position after the matched section header, up to the nearest string TURN found further on in the file contents.

    As these successive positions are represented by the dot symbol in my previous regex (?s-i)t\d+(?=\R.*?TURN), we just insert the new condition (?!\Rt\d+\R) before the dot symbol (enclosing the condition AND the dot in a group), giving the final and correct regex below:

    (?s-i)t\d+(?=\R((?!\Rt\d+\R).)*?TURN)

    Important note: the part ((?!\Rt\d+\R).) means that the regex engine verifies, at each position consumed by the dot, that this position does NOT begin a section header (EOL char, letter t, number and EOL char)
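    As a rough check of the final regex, here is a Python sketch under the same assumptions as before: a hypothetical SAMPLE text shaped like the first post, \n in place of \R, and re.DOTALL in place of the s modifier. Because the pattern contains a capturing group, the sketch reads the whole match with group(0) rather than relying on findall.

```python
import re

# Hypothetical sample text shaped like the example in the first post.
SAMPLE = """t1
x1 y2 z3
t3
x4 y5 z6
t6
x7 y8 z9
t94
x1 y1 z1
x2 TURN z2
x3 y3 z3
t3
x5 y5 z5
TURN
t90
x6 y6 z6
t2
x7 y7 z7
"""

# Each character consumed between the header and TURN is first checked by the
# negative look-ahead, so the scan cannot step onto an EOL that starts a header.
final = re.compile(r"t\d+(?=\n((?!\nt\d+\n).)*?TURN)", flags=re.DOTALL)

# group(0) is the whole match (the header only; the look-ahead consumes nothing).
final_hits = [m.group(0) for m in final.finditer(SAMPLE)]
print(final_hits)  # ['t94', 't3']
```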

    Best regards,

    guy038

    P. S. :

    Of course, I’d be pleased if someone else could find a simpler regex to solve Mario’s problem :-)) We all have more to learn about correct regexes!



  • I’m very sorry I didn’t check your answer sooner; thanks for taking the time to write this code, I’m learning a lot from it.
    I’ll try to adapt this to my actual code.
    Meanwhile I’ve got another idea, but I still need to work out the details: use the code .*?TURN (with the cursor at the beginning of the code) to highlight everything up to the word TURN, then cut it and paste it into another page, so I have only one letter t before TURN. At this point I can use the simple code .t.?TURN (with the cursor at the beginning of the new page) to do what I want, then paste the result back into my first code.
    Keep in mind that I use this code inside a macro to make it very fast and user-friendly.

    Thanks again for your time.



  • Mario,

    Sorry, but I’m a bit confused about the way you would like to “slice” your text :-(( From the example text given in your first post, could you tell us what kind of text you would like to obtain?

    Of course, you may prefer to give us another original text, together with the resulting text that you expect!

    Cheers,

    guy038



  • Simply: after selecting with the first code .*?TURN, I cut the selected text to a new tab, so I have only one t before the word TURN.
    Anyway, thanks for your solution; I really need to learn more about look-aheads in regex.



  • Hello Mario,

    To get, from the cursor location, the next block of lines beginning with a header t### and ending at the nearest word TURN, with that exact case, use the regex below:

    (?s-i)t\d+\R((?!\Rt\d+\R).)*?TURN

    Afterwards, you can:

    • Perform a first search

    • Copy that selection to the clipboard (CTRL + C)

    • Switch to a new tab, previously created

    • Paste the clipboard’s contents (CTRL + V)

    • Switch back to the original tab and press the F3 key to start the whole process again
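    Outside Notepad++, the same search-and-collect loop could be sketched in Python as follows. This is only an illustration under assumptions: the SAMPLE text is hypothetical, \n and re.DOTALL replace \R and the s modifier, and the new_tab string merely stands in for the tab that receives the pasted blocks.

```python
import re

# Hypothetical sample text shaped like the example in the first post.
SAMPLE = """t1
x1 y2 z3
t3
x4 y5 z6
t6
x7 y8 z9
t94
x1 y1 z1
x2 TURN z2
x3 y3 z3
t3
x5 y5 z5
TURN
t90
x6 y6 z6
t2
x7 y7 z7
"""

# Same idea as the look-ahead version, but the text up to TURN is now consumed,
# so each match is the whole block from the header to the nearest TURN.
block = re.compile(r"t\d+\n((?!\nt\d+\n).)*?TURN", flags=re.DOTALL)

# finditer resumes after each match, much like repeating the search.
blocks = [m.group(0) for m in block.finditer(SAMPLE)]

# Stand-in for the new tab: the collected blocks, one after the other.
new_tab = "\n".join(blocks)
print(new_tab)
```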

    Cheers,

    guy038

