RegEx: Anchor for beginning of file



  • Is there a RegEx symbol for the beginning of the file? Some places on the Web say it’s “\A” or “\`”, but that doesn’t work. The Wiki page [1] doesn’t mention beginning or end of file, but says that “\A” and “\`” match the start of the matching string. Is there a way to make the whole file the matched string?

    What I want to do is find all files that start with “<?php”. Is there a way to do that?

    Thank you.

    [1] http://docs.notepad-plus-plus.org/index.php/Regular_Expressions



  • I originally started with the info at the link you show, but I’ve since moved on to a better website. It explains things in more detail, has better examples and, I think, takes a more logical approach:
    http://www.rexegg.com/regex-quickstart.html
    Of course, the Wiki page is only meant as a primer, so what it does is good, but I quickly got beyond its ability to show and guide.

    On the website I linked, look for the DOTALL concept. It might allow you to select (match) the entire content of the file.

    I started thinking that looking for <?php should be simple, but realised that most regexes are designed to succeed, so they keep looking until the expression matches. What I think you need to research is a way to make the regex fail as soon as it moves one character into the file. Again, the website I listed has examples of how a regex can be made to try and fail. So the concept, in plain English, would be: look for the string <?php with absolutely no characters before it, not even spaces or CR/LF.

    I will continue trying to come up with a regex, but thought you’d maybe like to know what I have considered thus far.

    Terry



  • I’ve been playing with this and my solution is
    Find: (?<!\x0A)^\x3C\x3Fphp

    The problem I had was that < and ? are special characters, and although I tried escaping them with a backslash, it didn’t work. The \R caused the same problem. In the end I used the hex codes for these characters.

    So (?<!\x0A) means “not preceded by a line feed”. This may need amending in your environment, depending on the line endings used: I assumed a CR followed by a LF, and \x0A is the LF only.
    Then we want a ^, start of line
    Then we want the characters <?php

    You mentioned wanting to select the entire file contents if the characters you seek are at the immediate start, so to add to my regex we would then have
    Find: (?<!\x0A)^\x3C\x3Fphp(.*\R*)*

    This is only half the battle. If you are looking across multiple files using some automatic process, this regex needs to be incorporated into that somehow. Unfortunately, that is beyond my abilities at present.
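    For anyone who wants to experiment outside Notepad++, here is a minimal sketch of the same idea in Python’s re module. This is an assumption-laden translation, not Notepad++’s Boost engine: in Python, \R is not available, and with re.MULTILINE the ^ anchor matches only at position 0 and right after a "\n", so the look-behind leaves just the very start of the text. The whole-file variant uses re.DOTALL (the concept mentioned earlier) instead of the nested (.*\R*)* group. The names TERRY_START, TERRY_WHOLE and match_positions are hypothetical.

```python
import re

# "^" at a line start that is NOT preceded by a line feed. With re.MULTILINE,
# Python's "^" matches at position 0 and after each "\n", so the negative
# look-behind rejects every line start except the very first one.
TERRY_START = re.compile(r"(?<!\n)^<\?php", re.MULTILINE)

# Whole-file variant: re.DOTALL lets "." also match line breaks, so ".*"
# runs to the end of the text in one go.
TERRY_WHOLE = re.compile(r"(?<!\n)^<\?php.*", re.MULTILINE | re.DOTALL)


def match_positions(pattern: re.Pattern, text: str) -> list:
    """Return the start offset of every match of pattern in text."""
    return [m.start() for m in pattern.finditer(text)]


sample = "<?php\necho 1;\n<?php\n"
print(match_positions(TERRY_START, sample))     # [0]
print(match_positions(TERRY_START, "x\n<?php"))  # []
print(TERRY_WHOLE.match(sample).group(0) == sample)  # True
```

    The second <?php in the sample is ignored because it follows a "\n", which is exactly the behaviour the look-behind is meant to give.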

    Terry



  • Hello, @ennexmb, @terry-r and All,

    Indeed, in some cases, the \A assertion, which stands for the beginning of the file, may not match the location right before the very first character of the current file when it is followed by certain regexes.

    However, in your case, @ennexmb, as you’re searching for the string <?php, made up of literal characters only, it should be OK!

    So if you scan a bunch of files, using the Find or the Find in Files dialog, the regex \A<\?php should identify any [ PHP ] file whose first line begins with the <?php string ;-))

    Just note that the question mark must be escaped, as \?, to be interpreted as a literal character!
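    If the file scan ever has to happen outside Notepad++, a minimal Python sketch of the same check could look like this. It uses Python’s re module rather than Notepad++’s Boost engine, assumes the files are UTF-8 (a leading BOM is dropped via "utf-8-sig"), and the helper names starts_with_php_tag and find_php_files are hypothetical.

```python
import re
from pathlib import Path

# \A pins the search to the very start of the text; "?" is escaped so it is
# a literal question mark, exactly as in the Notepad++ regex \A<\?php
PHP_START = re.compile(r"\A<\?php")


def starts_with_php_tag(text: str) -> bool:
    """Return True if the text begins with the literal string <?php."""
    # search() may scan the text, but \A only allows a match at position 0
    return PHP_START.search(text) is not None


def find_php_files(root: str, glob: str = "*") -> list:
    """Return the files under root (recursively) whose content starts with <?php."""
    hits = []
    for path in Path(root).rglob(glob):
        if path.is_file():
            # "utf-8-sig" drops a leading UTF-8 BOM; errors="ignore" keeps
            # undecodable (e.g. binary) files from raising an exception
            text = path.read_text(encoding="utf-8-sig", errors="ignore")
            if starts_with_php_tag(text):
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    print(starts_with_php_tag("<?php echo 'hi';"))    # True
    print(starts_with_php_tag("\n<?php echo 'hi';"))  # False
    print(starts_with_php_tag("<html><?php ?>"))      # False
```

    Calling find_php_files("some/project/folder") would then list the matching files, much like the Find in Files result list.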

    Best Regards,

    guy038



  • Oh geez, my stupidity!

    I just ran the search again using \A<\?php as suggested by guy038, and it does indeed work. I’m guessing that when I tried it before, I neglected to escape the ?. Duh!

    Terry, thanks for your valiant efforts. I tried your search string as well and it also works. That’s some pretty hardcore hacking, which would have been a great solution had the \A actually not been working.

    And I’ve also learned from this not to trust the Notepad++ Wiki as documentation. I wanted to use that instead of a reference like the one suggested by Terry because it’s specific to Notepad++. Since there are different flavors of RegEx, I wanted a doc specific to the flavor in Notepad++. But it doesn’t help much if it’s wrong!

    Thanks Terry and guy for your help.



  • While playing around to find the absolute start of file in the current version of Notepad++ (7.8.6), I’ve found that ^\A works beautifully.

    Also, I don’t know if it’s documented anywhere, but another item I’ve found in Notepad++ is that \Z functions as ‘end of a blank line’ rather than the absolute end of file, whereas \z (lower case Z) acts to only detect the absolute end of the file.



  • @Dennis-Hughes

    (7.8.6), I’ve found that ^\A works beautifully.

    Have you seen the recent discussions with @guy038 and @Alan-Kilborn? There might be more tests to run described in there. (I cannot remember which topic, and forum searches don’t work for punctuation; one of them should be able to link you to the more detailed discussion of the \A issues.)

    Also, I don’t know if it’s documented anywhere

    https://npp-user-manual.org/docs/searching/#anchors

    is that \Z functions as ‘end of a blank line’ rather than the absolute end of file, whereas \z (lower case Z) acts to only detect the absolute end of the file.

    Not quite.

    You are right that \z only matches absolute end-of-file, or the end of the string if ☑ In selection is enabled.

    However, \Z does not match ‘end of a blank line’. As the documentation says, it matches \z preceded by 0 or more blank lines. You can see this by pasting the below file into Notepad++ and searching for \Z: it will match in two locations: at the end of line H (which is \z preceded by a single newline), and the beginning of the empty line after (called I), which is \z without any newlines.

    If your description were right, it would have matched on B, D, E, G, and I; but it does not.

    A. Not blank
    
    C. Two (B) was blank
    
    
    F. Four and five (D,E) were blank
    
    H. Next line (I) will be blank and end of file
    
    





  • Hi, @dennis-hughes @peterjones, @alan-kilborn, @terry-r and All,

    Peter, when I first pasted your sample text, I stupidly forgot the EOL characters of line H and, obviously, got just one match with the \Z syntax!

    So, to anyone trying to reproduce @peterjones’s test: beware that line H must end with a line break, so that an empty line I, without any EOL character, is created!
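    The same test can be sketched in Python, under two stated assumptions: the sample uses Unix (LF) line endings, and Notepad++’s \Z is emulated with the hypothetical look-ahead (?=[\r\n]*\Z), following the user manual’s “\z with an optional sequence of newlines before it”. Note that Python’s own \Z already means the absolute end of the string, i.e. what Notepad++ calls \z. SAMPLE, ABSOLUTE_END, NPP_BIG_Z and positions are illustrative names only.

```python
import re

# Peter's sample with LF line endings. Line H ends with "\n", so an empty
# line I (with no EOL of its own) follows it, as warned above.
SAMPLE = (
    "A. Not blank\n"
    "\n"
    "C. Two (B) was blank\n"
    "\n"
    "\n"
    "F. Four and five (D,E) were blank\n"
    "\n"
    "H. Next line (I) will be blank and end of file\n"
)

# In Python, \Z is the absolute end of the string (Notepad++'s \z).
ABSOLUTE_END = re.compile(r"\Z")

# Emulation of Notepad++'s \Z: absolute end preceded by any run of line
# breaks, written as a zero-width look-ahead.
NPP_BIG_Z = re.compile(r"(?=[\r\n]*\Z)")


def positions(pattern: re.Pattern, text: str) -> list:
    """Return the offset of every (zero-width) match of pattern in text."""
    return [m.start() for m in pattern.finditer(text)]


print(positions(ABSOLUTE_END, SAMPLE) == [len(SAMPLE)])                # True
print(positions(NPP_BIG_Z, SAMPLE) == [len(SAMPLE) - 1, len(SAMPLE)])  # True
```

    The two emulated \Z matches are the end of line H and the empty line I; the blank lines B, D, E and G are never matched. With CRLF endings this simple look-ahead would also stop between the CR and the LF, which is why the sketch sticks to LF.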

    @dennis-hughes :

    In short, any of these three syntaxes:    (?<!\n|\r|\f)^    ,    ^\A    ,    \A^

    • is a workaround for the buggy, though legitimate, \A syntax!

    • works correctly, provided that none of the scanned files begins with true empty lines

    For the special cases, simply refer to @alan-kilborn’s links in the previous post.
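    As a rough illustration of what the three workarounds are meant to match, here is a Python sketch. Python’s re module is not Notepad++’s Boost engine and does not have the \A quirk discussed here, so this only shows the intended behaviour (a single match at the very start of the text, even when the text begins with empty lines); WORKAROUNDS and start_positions are hypothetical names.

```python
import re

# The three workaround patterns from the post above.
WORKAROUNDS = {
    "lookbehind": r"(?<!\n|\r|\f)^",
    "caret_then_A": r"^\A",
    "A_then_caret": r"\A^",
}


def start_positions(pattern: str, text: str) -> list:
    """Return the offsets where pattern matches in text."""
    # re.MULTILINE makes "^" match at every line start, as in Notepad++
    return [m.start() for m in re.finditer(pattern, text, re.MULTILINE)]


text = "first line\r\nsecond line\nthird line\n"
for name, pattern in WORKAROUNDS.items():
    print(name, start_positions(pattern, text))  # each prints [0]
```

    In the look-behind variant, ^ alone would match at every line start; the (?<!\n|\r|\f) part rejects every line start that follows a line break, which leaves only offset 0.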

    BR

    guy038



  • Hi, All,

    Do you know how I could keep some spacing between the three regexes:

    (?<!\n|\r|\f)^    ,    ^\A    ,    \A^

    which are usually collapsed to:

    (?<!\n|\r|\f)^ , ^\A , \A^

    in the code text of my previous post?


    Well, the trick is to use No-Break Space characters (\xA0) instead of the usual Space characters (\x20) ;-))

    So, use the Alt + 160 input method, on the numeric keypad, to insert a No-Break Space character at the current cursor location!

    Cheers,

    guy038

