RegEx: Anchor for beginning of file



  • Is there a RegEx symbol for the beginning of the file? Some places on the Web say it’s “\A” or “`”, but that doesn’t work. The Wiki page [1] doesn’t mention beginning or end of file, but says that " \A, ’ " matches the start of the matching string. Is there a way to make the whole file be the matched string?

    What I want to do is find all files that start with “<?php”. Is there a way to do that?

    Thank you.

    [1] http://docs.notepad-plus-plus.org/index.php/Regular_Expressions



  • I originally started with the info in the link you show, but I’ve since moved on to a better website. This one explains things in more detail, has better examples and, I think, takes a more logical approach.
    http://www.rexegg.com/regex-quickstart.html
    Of course your website is only meant as a primer, so what it does is good, but I quickly got beyond its ability to show and guide.

    In the website link I provided look for the DOTALL concept. This might allow you to select (match) the entire content of the file.

    I started out thinking that looking for the <?php should be simple, but realised that most regexes are designed to succeed, so the engine keeps trying further along the text until the expression matches. What I think you need to research is a way to make it fail as soon as it moves 1 character into the file. Again, the website I listed has examples of how a regex can be made to try and fail. So the concept would be (in plain English): look for the string <?php where there are absolutely no characters before it, not even spaces or CR/LF.

    I will continue trying to come up with a regex, but thought you’d maybe like to know what I have considered thus far.

    Terry



  • I’ve been playing with this and my solution is
    Find: (?<!\x0A)^\x3C\x3Fphp

    The problem I had was that the < and ? are special characters, and although I tried escaping them it didn’t work. The \R caused the same problem. In the end I used the hex codes for these characters.

    So (?<!\x0A) is a negative lookbehind: the match must not be immediately preceded by a line feed. This may need amending in your environment depending on the line endings used. I assumed each line ends with a CR followed by a LF; \x0A is the LF only.
    Then we want a ^, start of line
    Then we want the characters <?php

    You mentioned wanting to select the entire file contents if the characters you seek are at the immediate start, so to add to my regex we would then have
    Find: (?<!\x0A)^\x3C\x3Fphp(.*\R*)*

    This is only half the battle. I think that if you are looking across multiple files with some automated process, this regex needs to be incorporated into it somehow. Unfortunately that is outside my abilities at present.

    Terry
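
For the "automatic process" part, here is a minimal sketch in Python of how the lookbehind pattern above might be applied across a folder of files. It is not a Notepad++ feature: the function name `files_starting_with_php` and the 64-byte read are assumptions made for illustration, and Python's `re` module is a different regex engine from the one in Notepad++, so line-ending handling may differ in edge cases.

```python
import re
from pathlib import Path

# The pattern from the post above, compiled for bytes. re.MULTILINE lets ^
# match after any line feed; the negative lookbehind then rejects every such
# position, so only offset 0 of the file can match.
START_OF_FILE_PHP = re.compile(rb"(?<!\x0A)^\x3C\x3Fphp", re.MULTILINE)


def files_starting_with_php(root):
    """Return paths under `root` whose very first bytes are <?php."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            # A match can only begin at offset 0, so a short prefix is enough.
            head = fh.read(64)
        if START_OF_FILE_PHP.search(head):
            hits.append(path)
    return hits


if __name__ == "__main__":
    for hit in files_starting_with_php("."):
        print(hit)
```

Note that a file saved with a UTF-8 byte order mark has three bytes ahead of the <?php, so this sketch would not report it.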



  • Hello, @ennexmb, @terry-r and All,

    Indeed, in some cases, the \A assertion, standing for the beginning of file, when followed with a specific regex, may not match the location right before the very first character of the current file.

    However, in your case, @ennexmb, as you’re searching for a string <?php, made up of literal characters only, it should be OK !

    So if you scan a bunch of files, using the Find or the Find in Files dialog, the regex \A<\?php should identify any [ PHP ] file whose first line begins with the <?php string ;-))

    Just note that the question mark must be escaped, as \?, to be interpreted as a literal char !

    Best Regards,

    guy038
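
To see why the escape matters, and how \A differs from a plain ^, here is a small sketch using Python's `re` module. Python is only a stand-in here: it is not the regex engine Notepad++ uses, but for these simple constructs it treats \A, ^ and ? in the same way. The names `START_OF_TEXT`, `START_OF_LINE`, `UNESCAPED` and `starts_with_php` are made up for the example.

```python
import re

# \A anchors at the very start of the text and nowhere else.
START_OF_TEXT = re.compile(r"\A<\?php")

# With MULTILINE, ^ also matches at the start of every later line.
START_OF_LINE = re.compile(r"^<\?php", re.MULTILINE)

# Unescaped, the ? is a quantifier: it makes the "<" optional,
# so this pattern looks for "<php" or "php", never "<?php".
UNESCAPED = re.compile(r"\A<?php")


def starts_with_php(text):
    """True only if the text begins with the literal string <?php."""
    return START_OF_TEXT.search(text) is not None


if __name__ == "__main__":
    samples = ["<?php echo 1;", " <?php echo 1;", "<html>\n<?php echo 1; ?>"]
    for sample in samples:
        print(
            repr(sample),
            starts_with_php(sample),
            START_OF_LINE.search(sample) is not None,
            UNESCAPED.search(sample) is not None,
        )
```

Only the first sample starts with <?php, so only it satisfies the \A pattern. The third sample still matches the ^ version, because the <?php sits at the start of its second line. The unescaped pattern matches none of the samples, which is consistent with a search that "doesn't work" when the backslash before the ? is left out.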



  • Oh geez, my stupidity!

    I just ran the search again using \A<\?php as suggested by guy038 and it does indeed work. I’m guessing that when I tried it before I neglected to escape the ?. Duh!

    Terry, thanks for your valiant efforts. I tried your search string as well and it also works. That’s some pretty hardcore hacking, which would have been a great solution had the \A actually not been working.

    And I’ve also learned from this not to trust the Notepad++ Wiki as documentation. I wanted to use that instead of a reference like the one suggested by Terry because it’s specific to Notepad++. Since there are different flavors of RegEx, I wanted a doc specific to the flavor in Notepad++. But it doesn’t help much if it’s wrong!

    Thanks Terry and guy for your help.

