Regex: Select everything (the whole line) to the dot but not more dots



  • I have these 2 cases:

    1. I love you.
    
    2. I love you...
    

    I want to use a regex to select only the first line, up to the dot, but not the second line, which has more dots.

    I made these regexes, but they are not very good: ^(.+?\.)\K{0,3}$ or ^(.+?\.)(?!{1,1})$ or ^.*?!\.{1,3}$

    Maybe someone can help me.



  • Hello, @vasile-caraus and All,

    Simply, use this regex :

    SEARCH [^.\r\n]+\.(?!\.)

    Notes :

    • First, the part [^.\r\n]+ looks for a non-empty run of characters, each different from a dot and from the CR and LF line-break characters

    • Then, the part \. tries to match a literal dot char

    • But ONLY IF it is not followed with a second dot char, due to the negative look-ahead (?!\.)
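    To try the pattern outside the editor, here is a minimal sketch using Python's standard re module. This is an assumption on my part: Python's engine is not the editor's engine, but it handles the negated class and the negative look-ahead the same way here. The sample text assumes the two lines carry no leading 1. / 2. numbers.

```python
import re

# Guy's pattern: a run of characters that are not a dot, CR or LF,
# then a dot that is NOT followed by another dot.
PATTERN = re.compile(r"[^.\r\n]+\.(?!\.)")

# Sample data (assumed): the two lines, without the 1. / 2. numbering.
text = "I love you.\nI love you..."

matches = PATTERN.findall(text)
print(matches)  # ['I love you.']
```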

    Best Regards,

    guy038



  • thanks @guy038

    And another case: suppose I have these 2 lines, which also have the numbers 1. and 2. at the beginning:

    1. I love you.

    2. I love you...

    In this case, your regex [^.\r\n]+\.(?!\.) also selects part of the second line, up to its first dot: the 2. But what if I don’t want to select anything at all on this second line, which also contains a single dot after the 2?



  • To future readers:

    It was nice to see that the original poster was willing to give what he tried. That is to be lauded.

    I am replying only so that other people will learn. I will explain what that individual did wrong in each case to help you future readers learn (and with the very weak hope that the original questioner will also try to learn from this example), and will explain my thought process when coming up with a working solution.

    ^(.+?\.)\K{0,3}$ will not work because {0,3} has nothing to modify – the find dialog will tell you that it was an invalid regex. See the red text at the bottom:

    [Screenshot: the Find dialog, with the invalid regular expression message in red at the bottom]

    I believe what was intended was ^(.+?\.)\K\.{0,3}$, which says: find one or more of any character (non-greedy), followed by a literal dot; then reset the match; then find 0 to 3 literal dots, followed by the end of the line. But that still matches the last two dots of the ... ending the second line. This is because the regex is saying you want 0 to 3 dots after the first-found dot, which is not what the original poster described in text.
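    The "last two dots" result can be reproduced with a small Python sketch. Python's standard re module does not support \K, so this swaps in a different technique: a capture group stands in for the part \K would keep, while the first group holds what \K would discard. Python's re and "\n" line endings in the sample are assumptions.

```python
import re

# \K emulation (assumption): group 1 = what \K would discard,
# group 2 = what \K would leave as the visible match.
PATTERN = re.compile(r"^(.+?\.)(\.{0,3})$", re.MULTILINE)

text = "I love you.\nI love you..."

kept = [m.group(2) for m in PATTERN.finditer(text)]
print(kept)  # ['', '..']  -> empty match on line 1, last two dots on line 2
```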

    ^(.+?\.)(?!{1,1})$ is once again an invalid regular expression, again because {1,1} says “from 1 to 1 of the preceding token”, and no preceding token is specified, so it’s invalid. Even with a character there to quantify: why would you want to specify a quantity from 1 to 1? Just use the character. Assuming again that a literal dot was supposed to be there with quantity 1, that should have been written ^(.+?\.)(?!\.{1,1})$, or more simply ^(.+?\.)(?!\.)$, because you don’t need a quantifier if it’s always exactly one. But that still won’t work, because the regex says “one or more of any character, followed by a literal dot, not followed by another dot, then end-of-line”, and on the second line the third dot obviously satisfies that, so the whole line still matches.
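    Both points can be checked with a short Python sketch (assuming Python's standard re module as a stand-in for the editor's engine; the error message wording differs between engines): the pattern as typed fails to compile, and the corrected form still matches both whole lines.

```python
import re

text = "I love you.\nI love you..."

# As typed in the question: {1,1} has nothing to repeat, so compiling fails.
try:
    re.compile(r"^(.+?\.)(?!{1,1})$")
    compiled = True
except re.error:
    compiled = False

# Corrected form: on the second line the lazy group keeps growing until it
# ends at the third dot, which is followed by end-of-line, not by a dot.
fixed = re.compile(r"^(.+?\.)(?!\.)$", re.MULTILINE)
lines = fixed.findall(text)  # findall returns group 1 of each match

print(compiled, lines)  # False ['I love you.', 'I love you...']
```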

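    ^.*?!\.{1,3}$ was the only valid regex (i.e., the Find dialog didn’t complain that it was invalid). It looks for 0 or more characters (non-greedy), then a literal !, then 1 to 3 dots, so it matches neither of your sample lines, since neither contains a !. And even without the !, accepting 1 to 3 dots contradicts your description, because you said you didn’t want to match if there was more than one dot.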
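    A quick Python check of that third attempt (a sketch, assuming Python's standard re module and "\n" line endings): as typed it finds nothing, and with the ! removed it accepts both lines.

```python
import re

text = "I love you.\nI love you..."

# The ? makes .* lazy; the ! is then a literal character that must appear
# right before the final 1 to 3 dots.
as_typed = re.findall(r"^.*?!\.{1,3}$", text, re.MULTILINE)

# Without the "!", "1 to 3 dots" accepts both lines.
without_bang = re.findall(r"^.*?\.{1,3}$", text, re.MULTILINE)

print(as_typed)      # []
print(without_bang)  # ['I love you.', 'I love you...']
```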

    I am going to assume that you want to select from the beginning of the line through the end of the line for any line that ends in exactly one dot. In text, I might say, “match from beginning of line, then zero or more characters, followed by a dot that isn’t preceded by another dot, followed by the end of a line.”

    • “match from beginning of line” = ^
    • “then zero or more characters” = .*
    • “followed by a dot that isn’t preceded by another dot” = (?<!\.)\. – I used the negative lookbehind to guarantee the character before isn’t a dot
    • “followed by the end of a line” = $
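    The four pieces listed above can be joined and tested in a few lines of Python. This is a sketch assuming Python's standard re module (with re.MULTILINE so ^ and $ work per line) and "\n" line endings in the sample data.

```python
import re

# The four pieces from the list above, joined in order.
pieces = [
    r"^",          # match from beginning of line
    r".*",         # then zero or more characters
    r"(?<!\.)\.",  # a dot that isn't preceded by another dot
    r"$",          # followed by the end of a line
]
PATTERN = re.compile("".join(pieces), re.MULTILINE)

# Sample data as in the question, numbers included.
text = "1. I love you.\n2. I love you..."

print(PATTERN.pattern)        # ^.*(?<!\.)\.$
print(PATTERN.findall(text))  # ['1. I love you.']
```

    With Windows "\r\n" endings, Python's . would also consume the "\r" and $ would only sit before "\n", so the text would need normalizing first; the editor's own Find dialog does not have that problem.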

    Put it all together, and you have ^.*(?<!\.)\.$. This finds only one match in your example data:

    [Screenshot: a single match, the whole first line 1. I love you.]

    This selects the whole line, which I interpreted your description to mean.

    Guy’s solution is a bit different, because it is based on a different interpretation of the description: his finds the 1. as the first match, the I love you. (including the space before the I) as a second match, and the 2. as a third match. So his regex gives 3 matches, compared to 1 for mine. This shows that the original problem statement was unclear, because two reasonable people came up with two very different interpretations of the requirements.
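    The difference in match counts can be seen side by side with this Python sketch (assuming Python's standard re module and "\n" line endings).

```python
import re

text = "1. I love you.\n2. I love you..."

guy = re.findall(r"[^.\r\n]+\.(?!\.)", text)
peter = re.findall(r"^.*(?<!\.)\.$", text, re.MULTILINE)

print(guy)    # ['1.', ' I love you.', '2.']
print(peter)  # ['1. I love you.']
```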

    If you want to do regex search and replace, you have to think about the problem in little tiny steps, and be able to describe those steps to yourself, or to others, in little tiny detail. If you are unwilling to do this, you will never be good at regex, and we will get tired of helping you. Please learn from these thought processes.



  • Hi, @vasile-caraus, @peterjones and All,

    I did notice that my previous search regex also matched the 2. beginning the second line! But I thought that you wanted to match any sentence ending with a period!

    But, of course, @peterjones’s regex is the right one! Just a slight modification: I would use ^.+(?<!\.)\.$ because you probably don’t want to match a line containing only one dot!
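    The effect of switching .* to .+ shows up on a line that holds nothing but a dot. A minimal Python sketch (assuming Python's standard re module; the lone "." line is hypothetical test data added for illustration):

```python
import re

# Hypothetical sample: a normal line, a line with a single dot, a line ending in "...".
text = "1. I love you.\n.\n2. I love you..."

star = re.findall(r"^.*(?<!\.)\.$", text, re.MULTILINE)
plus = re.findall(r"^.+(?<!\.)\.$", text, re.MULTILINE)

print(star)  # ['1. I love you.', '.']
print(plus)  # ['1. I love you.']
```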

    But what about cases like the ones below?

    3. I love you. I love you...
    
    4. I love you... I love you.
    

    So, just test my regex [^.\r\n]+\.(?!\.), Peter’s ^.*(?<!\.)\.$, and also this new one, (?!\d)[^.\r\n]+\.(?!\.), against the text below, and see the differences!

    1. I love you.
    
    2. I love you...
    
    3. I love you. I love you...
    
    4. I love you... I love you.
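    For readers without the editor at hand, this Python sketch runs the three regexes against those four lines (assuming Python's standard re module and "\n" line endings). The expected lists in the comments come from tracing each pattern by hand.

```python
import re

text = (
    "1. I love you.\n"
    "2. I love you...\n"
    "3. I love you. I love you...\n"
    "4. I love you... I love you."
)

patterns = {
    "guy": re.compile(r"[^.\r\n]+\.(?!\.)"),
    "peter": re.compile(r"^.*(?<!\.)\.$", re.MULTILINE),
    "no_digit": re.compile(r"(?!\d)[^.\r\n]+\.(?!\.)"),
}

results = {name: p.findall(text) for name, p in patterns.items()}
for name, found in results.items():
    print(name, found)

# guy      -> ['1.', ' I love you.', '2.', '3.', ' I love you.', '4.', ' I love you.']
# peter    -> ['1. I love you.', '4. I love you... I love you.']
# no_digit -> [' I love you.', ' I love you.', ' I love you.']
```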
    

    BR

    guy038

    @vasile-caraus,

    Like you, I sometimes create a regex which returns the fatal message “Find: Invalid regular expression”! But unlike you, I learned, little by little, the basic features of regexes and then some advanced features, all with the help of regex tutorials! So I can quickly spot the part with wrong syntax or missing characters and rebuild my regex correctly!

    Indeed! For example, using a quantifier range such as {0,3} without anything before it to quantify should rather be considered as a noob error!

    After a fair practice of simple regular expressions, you will not reproduce this kind of wrong syntax ! Regex knowledge is like everything else, like a Lego game: you first join two simple pieces together, then build a wall out of several pieces, then a house out of walls, and so on!



  • @guy038 said in Regex: Select everything (the whole line) to the dot but not more dots:

    should rather be considered as a noob error

    After a fair practice of simple regular expressions, you will not reproduce this kind of wrong syntax !

    Some number of individuals are stuck in this “noob” state, forever.
    Sad but true.



  • thank you all !

