REGEX again: How can I select/mark these 3 words on different lines



  • Hi, again. I have a problem. I don’t know how to mark/select these words on different lines: Recent, Comments and Tags

    please see this print screen:

    https://snag.gy/ZXNxJR.jpg

    I can easily use the | sign, like Recent|Comments|Tags, but this will select every occurrence of those words in the files, and I want only those 3 on those lines.



  • Hi, vasile,

    Before building any regex, I need some additional information ! Note that I used the original spelling, from your picture !

    May the different cases, below, happen ? I mean : do you want the regex engine to mark, also, the three lines, when these cases, below, happen ?

    					Recente
    			Coments                            Case A ( with DIFFERENT indentation, for the 3 words )
    							Tags
    
    
    
    Recente
    Coments                                        Case B ( With NO indentation )
    Tags
    
    
    				Coments
    				Tags                           Case C ( The three words in a DIFFERENT order )
    				Recente
    

    Cheers,

    guy038



  • Hello guy038. I need to select them as shown in my print screen, because that case repeats in all of my 200 files. So, I believe I need case B and case C. But if you please, for others who want it, you can help with case A also.



  • Vasile,

    Oh, my God ! It was difficult enough to get the right regexes, Indeed !!

    So, in addition, to your case shown in your picture, the three cases A, B and C must be detected by the regex engine !

    After numerous tests, I, finally, thought it was better to split the problem into two parts :

    • Firstly, correctly identify the consecutive lines, which have to be marked, later on

    • Secondly, mark all these lines

    To do so, I needed an additional character, which does NOT exist, yet, in your files. I, personally chose the sharp character #, but any other character may be used !


    So, when ONE of these six blocks of text, with possible indentation, below, occurs, a first regex, will add a # character, at the beginning of the first line of that block

    Recente
    Coments
    Tags
    
    Recente
    Tags
    Coments
    
    Coments
    Recente
    Tags
    
    Coments
    Tags
    Recente
    
    Tags
    Recente
    Coments
    
    
    Tags
    Coments
    Recente
    

    The S/R needed is :

    SEARCH ^\h*Recente\R(?=\h*Coments\R\h*Tags|\h*Tags\R\h*Coments)|^\h*Coments\R(?=\h*Recente\R\h*Tags|\h*Tags\R\h*Recente)|^\h*Tags\R(?=\h*Recente\R\h*Coments|\h*Coments\R\h*Recente)

    REPLACE #$0

    We get the new text, as below :

    #Recente
    Coments
    Tags
    
    #Recente
    Tags
    Coments
    
    #Coments
    Recente
    Tags
    
    #Coments
    Tags
    Recente
    
    #Tags
    Recente
    Coments
    
    
    #Tags
    Coments
    Recente
    

    Notes :

    • This regex, although complicated, is, simply, an alternative between one of the three regexes, below :

      • ^\h*Recente\R(?=\h*Coments\R\h*Tags|\h*Tags\R\h*Coments)

      • ^\h*Coments\R(?=\h*Recente\R\h*Tags|\h*Tags\R\h*Recente)

      • ^\h*Tags\R(?=\h*Recente\R\h*Coments|\h*Coments\R\h*Recente)

    • The first regex looks for the 2 blocks, beginning with the word Recente, the second for the 2 blocks, beginning with Coments and the last one for the 2 blocks, beginning with Tags

    • If some lines, in the files, contain a single word, from the keywords list ( Recente, Coments and Tags ), they are NOT marked with a # character, as the general template is not respected !

    • Remember that the escaped sequence \h represents a single horizontal blank character. So, either :

      • A Space character \x20

      • A Tabulation character \t

      • A No-Break Space character \xa0
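For anyone who wants to try the tagging step outside Notepad++, here is a minimal Python sketch of the same idea. It is not the S/R itself: Python's `re` module has no `\h` or `\R`, so they are replaced by `[ \t]*` and `\r?\n`, on the assumption that only spaces, tabs and LF / CRLF line endings occur. The function names `build_block_pattern` and `tag_blocks` are made up for this illustration.

```python
import re
from itertools import permutations

H = r"[ \t]*"    # stand-in for \h* (assumption: spaces and tabs only)
NL = r"\r?\n"    # stand-in for \R  (assumption: LF or CRLF only)

def build_block_pattern(words=("Recente", "Coments", "Tags")):
    """Match the first line of a block when the two other words follow, in any order."""
    alternatives = [
        rf"^{H}{re.escape(a)}{NL}(?={H}{re.escape(b)}{NL}{H}{re.escape(c)})"
        for a, b, c in permutations(words)
    ]
    return re.compile("|".join(alternatives), re.MULTILINE)

def tag_blocks(text, mark="#"):
    """Prefix the first line of every matching block with `mark` (the REPLACE #$0 step)."""
    return build_block_pattern().sub(lambda m: mark + m.group(0), text)
```

The six permutations give the same six alternatives as the long regex, just written out one by one instead of grouped by first word.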


    Now , we, simply, need to mark all the lines of the blocks, which begin with a # character. To do so, we’ll use, SUCCESSIVELY, the three regexes below :

    • The first regex (?-s)^#.+ will mark the first line of each block
    • The second regex (?-s)^#.+\R\K.+ will mark the second line of each block
    • The third regex (?-s)^#(.+\R){2}\K.+ will mark the third line of each block

    Notes :

    • The \K syntax discards everything matched so far from the reported match. So, only the part of the regex located after the \K form is marked, that is to say .+, which stands for all the standard characters of the current line
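As an illustration only: Python's `re` module has no `\K`, so the usual workaround is to capture each line in its own group and read the groups back. The sketch below assumes the first line of each block already starts with `#`; the helper name `marked_lines` is hypothetical, and `[^\r\n]` stands in for `.` under `(?-s)`.

```python
import re

def marked_lines(text, mark="#"):
    """Return (line1, line2, line3) for every block whose first line starts with `mark`."""
    line = r"([^\r\n]+)"   # one non-empty line, captured
    nl = r"\r?\n"          # stand-in for \R (assumption: LF or CRLF only)
    pattern = re.compile(rf"^{re.escape(mark)}{line}{nl}{line}{nl}{line}", re.MULTILINE)
    return [m.groups() for m in pattern.finditer(text)]
```

Each tuple holds what the three Mark All passes would highlight, except that the leading `#` is left out of the first element.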

    Finally, we have to get rid of the dummy # character, with the obvious S/R :

    SEARCH #

    REPLACE Empty

    Now, it’s up to you to decide how to treat all these marked lines ;-))

    Cheers,

    guy038



  • @guy038 said:

    ^\h*Recente\R(?=\h*Coments\R\h*Tags|\h*Tags\R\h*Coments)
    ^\h*Coments\R(?=\h*Recente\R\h*Tags|\h*Tags\R\h*Recente)
    ^\h*Tags\R(?=\h*Recente\R\h*Coments|\h*Coments\R\h*Recente)

    Hello guy038. I still face the old problem. In my Notepad++, none of your regexes here works. I believe my Notepad++ (the latest version) doesn’t like the combination \R\h or \K. I tried a different editor (Sublime Text) and none of the regexes you made here works there either.



  • Hi, Vasile,

    Let’s begin with simple things !

    • Copy the text, below, in a new tab :

      Recente
      Coments
      Tags

      Recente
      Tags
      Coments

      Coments
      Recente
      Tags

      Coments
      Tags
      Recente

      Tags
      Recente
      Coments

      Tags
      Coments
      Recente

    • Copy the complete regex, below, into the Find what field of the Find dialog

    ^\h*Recente\R(?=\h*Coments\R\h*Tags|\h*Tags\R\h*Coments)|^\h*Coments\R(?=\h*Recente\R\h*Tags|\h*Tags\R\h*Recente)|^\h*Tags\R(?=\h*Recente\R\h*Coments|\h*Coments\R\h*Recente)

    • Then , any click on the Find Next button should select the first line of each “three-lines” block

    Do you get such behaviour ?

    Just for info, my current version is N++ v7.2

    Cheers,

    guy038



  • I copied the last text, and this REGEX works: it selects the first line. So this is good.

    But this selects only the first line. I need to select, practically, those 3 words on different lines (3 lines, one right after another). And each word is not at the beginning of its line, but somewhere in the middle, because of the tabs.

    Broadly, I want to make one regex that starts at those 3 words and removes everything after them, and another regex that deletes everything before them. The problem is that those 3 words are on different consecutive lines (lines 13, 14 and 15). Practically a group. I don’t want to select the same words elsewhere in the files, only that group, because I will use FIND/REPLACE ALL, so I don’t want to replace anything else.

    I have the regex to remove everything after/before a word or words on a single line. But now I have to select a group of words on different consecutive lines, and delete everything after/before them.



  • And please excuse me for the title. It should have been “How can I select/mark 3 words on 3 consecutive lines”. And each of the words is somewhere in the middle of its line, after tabs or spaces.



  • Vasile,

    I did understand that you wanted to mark the three consecutive lines of each block of lines, and not only the first one !

    But, just follow my first post , on that topic :

    To verify, copy, as before, my previous text in a new tab

    • FIRSTLY, perform the S/R, below :

    SEARCH ^\h*Recente\R(?=\h*Coments\R\h*Tags|\h*Tags\R\h*Coments)|^\h*Coments\R(?=\h*Recente\R\h*Tags|\h*Tags\R\h*Recente)|^\h*Tags\R(?=\h*Recente\R\h*Coments|\h*Coments\R\h*Recente)

    REPLACE #$0

    => A sharp character, #, should be inserted at the beginning of the FIRST line of each block


    • SECONDLY, perform the three regexes , below, ONE AFTER ANOTHER, in the Mark dialog, by clicking on the Mark All button

    SEARCH (?-s)^#.+

    SEARCH (?-s)^#.+\R\K.+

    SEARCH (?-s)^#(.+\R){2}\K.+

    => At the end, each line, of each block, should be marked


    But, from what you said in your last post, I suppose, for instance, that, given the example text, below :

    		Text to be deleted  Recente    Text to be deleted
    				bla bla			Coments			123456789
    	ABCDEF	Tags bla bla bla that's the End
    

    you would expect the resulting text :

    Recente
    Coments
    Tags
    

    Is my assumption correct ? If so, give me half an hour more, and I’ll post you the final regexes

    Best Regards

    guy038



  • Vasile,

    Although you have not replied to my question, yet, I suppose that I was right and here are the new regexes, to achieve such a result !

    As said before, the first regex is UNCHANGED. Remember that it just marks any of the six blocks, wanted, with a special character, # ( which must not be used, yet, in your files ! ) So this first S/R is :

    SEARCH ^\h*Recente\R(?=\h*Coments\R\h*Tags|\h*Tags\R\h*Coments)|^\h*Coments\R(?=\h*Recente\R\h*Tags|\h*Tags\R\h*Recente)|^\h*Tags\R(?=\h*Recente\R\h*Coments|\h*Coments\R\h*Recente)

    REPLACE #$0

    => After this global replacement, a sharp character, #, should have been inserted, at the beginning of the FIRST line of each block !


    Now, perform, SUCCESSIVELY, the three S/R, below, by clicking on the Replace All button

    SEARCH (?-s)^#(?:.+\R){2}\K.*(Recente|Coments|Tags).*

    REPLACE \1

    => The third line, of each block, should have been modified


    SEARCH (?-s)^#.+\R\K.*(Recente|Coments|Tags).*

    REPLACE \1

    => The second line, of each block, should have been modified


    SEARCH (?-s)^#.*(Recente|Coments|Tags).*

    REPLACE \1

    => Finally, the first line, of each block, should have been modified

    Et voilà !

    IMPORTANT :

    • These three S/R must be run, in that EXACT order, to be sure that the temporary mark character, #, will be deleted during the last S/R, only !!

    • Don’t use the Replace button for these three S/R, ONLY the Replace All one !!
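The order of these three S/R can be sketched in Python, as a rough equivalent rather than the Notepad++ regexes themselves. Assumptions: the first line of each block already carries the `#` tag, `[^\r\n]` stands in for `.` under `(?-s)`, `\r?\n` stands in for `\R`, and, since Python's `re` has no `\K`, the part before it is captured and written back. The name `trim_tagged_blocks` is made up for this example.

```python
import re

KEY = r"(Recente|Coments|Tags)"
CH = r"[^\r\n]"   # stand-in for . under (?-s)
NL = r"\r?\n"     # stand-in for \R (assumption: LF or CRLF only)

def trim_tagged_blocks(text):
    """Reduce each line of a '#'-tagged block to its keyword, third line first."""
    # Step 1: third line. Group 1 (tag + first two lines) is written back,
    # which plays the role of \K in the original regex.
    text = re.sub(rf"(^#(?:{CH}+{NL}){{2}}){CH}*{KEY}{CH}*", r"\1\2", text, flags=re.M)
    # Step 2: second line, same trick with only the first line kept in group 1.
    text = re.sub(rf"(^#{CH}+{NL}){CH}*{KEY}{CH}*", r"\1\2", text, flags=re.M)
    # Step 3: first line. The '#' is not written back, so the tag disappears here.
    text = re.sub(rf"^#{CH}*{KEY}{CH}*", r"\1", text, flags=re.M)
    return text
```

Running step 3 first would remove the `#` that steps 1 and 2 rely on, which is why the order matters.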

    Cheers,

    guy038



  • Hello guy038, I wasn’t here, sorry.
    So, I tested all your new regexes. First, the long regex, with replacement #$0, works very well.

    The little problem is with the next 3 SUCCESSIVE regexes, run one by one. After Search and Replace with \1, I get the success message “6 occurrences were replaced.”

    The problem is that, in fact, nothing has changed…is like nothing happen…

    The last of the 3 successive regexes, (?-s)^#.*(Recente|Coments|Tags).*, with replacement \1, removes the # sign which I had added with the first long regex.



  • Oh, it WORKS ! I used another example. So, excuse me. All your regexes work just fine, guy038. You practically move all 3 words to the beginning of the lines.

    Thanks a lot



  • And I return to my old problem. Now, with your help, I managed to bring all 3 words to the start of the lines. The question was how I can mark/delete something before or after those 3 words. For example:

    text bla Coments
    text bla bla Tags

    Recente
    Coments
    Tags

    text bla Recente
    text bla bla Tags

    So, you see that the words “Recente, Coments and Tags” are repeated. So, I managed to solve my first problem: HOW TO DELETE EVERYTHING BEFORE THOSE 3 words on the 3 successive lines, including those 3 lines:

    Search
    ((?s)((^.*)^Recente|^Coments|^Tags))(.*$)
    Replace by:
    Leave empty



  • Now, to delete everything after those 3 words (and 3 lines), including those 3 lines:

    Search:
    ^.*(?s)^Recente|^Coments|^Tags.*[\s\S]
    Replace by:
    leave empty



  • So, you see, my friend, where I wanted to get. The only problem was that all 3 words were somewhere in the middle of the lines, and I didn’t know how to select those SUCCESSIVE lines.

    Thanks a lot.

    But if you know another 2 regexes that do this directly, deleting before and after, without first moving the 3 words to the beginning of the lines, please let me know.



  • Vasile,

    I’m just back on our forum and, after reading your post, where you said :

    The problem is that, in fact, nothing has changed…is like nothing happen…

    I quickly guessed why it did NOT work :-) Ah ! Again, I forgot to add this VERY IMPORTANT fact :

    If your pattern regex contains one or more look-behinds AND/OR any \K form, you must, EXCLUSIVELY use the Replace All button to perform a global replacement.

    DON’T use the Replace button, for a step by step replacement : it does NOTHING !!

    So, for performing, correctly, my three previous regexes :

    (?-s)^#(?:.+\R){2}\K.*(Recente|Coments|Tags).*

    (?-s)^#.+\R\K.*(Recente|Coments|Tags).*

    (?-s)^#.*(Recente|Coments|Tags).*

    with replacement \1

    • Move the caret, before the text to change

    • Perform the three S/R, in that order, by clicking, EXCLUSIVELY, on the Replace All button ( Not the Replace button !! )

    It should be OK

    Of course, don’t forget to delete the extra # character, at the end

    Cheers,

    guy038

    So, I updated my previous post !



  • WORKS !!!



  • A different, nice regex looks like this. First, check . matches newline

    Search:
    .*\n([ \t]+Recente\s+Coments\s+Tags).*
    REPLACE By:
    $1 or \1

    Don’t forget to check . matches newline

    This will select all 3 words, even if they are separated by tabs or spaces, and will delete the rest of the document.
    But how can I use this regex to remove only what is before those 3 words, and another regex to remove only what is after those 3 words?



  • Or, without checking . matches newline

    Search:
    (?s).*\n([ \t]+Recente\s+Coments\s+Tags).*(.*$)
    REPLACE By:
    $1 or \1



  • Hello, I found the solutions:

    So, to delete everything before those 3 words, including the 3 lines with the words:

    Search:
    (?s).*\n([ \t]+Recente\s+Coments\s+Tags)
    Replace By:
    Leave Empty

    To delete everything after those 3 words (the replacement $1 keeps the 3 lines themselves; leave it empty to delete them as well):
    Search:
    (?s)([ \t]+Recente\s+Coments\s+Tags)(.*$)
    Replace by:
    $1
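A quick way to check these two searches outside the editor is the Python sketch below. The patterns are copied as they are, since `(?s)`, `\s` and `[ \t]` behave the same way in Python's `re` for this case. The function names are hypothetical, and `count=1` is an assumption that each file contains a single such block.

```python
import re

BLOCK = r"[ \t]+Recente\s+Coments\s+Tags"

def delete_before_and_block(text):
    """Remove everything up to and including the 3-line block."""
    return re.sub(rf"(?s).*\n({BLOCK})", "", text, count=1)

def delete_after_block(text):
    """Remove everything after the block; the block itself is kept via group 1."""
    return re.sub(rf"(?s)({BLOCK})(.*$)", r"\1", text, count=1)
```

Note that the first search needs a line break before the block, so it finds nothing when the block starts on the very first line of the file.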

