RegEx: Anchor for beginning of file
-
Is there a RegEx symbol for the beginning of the file? Some places on the Web say it’s “\A” or “`”, but that doesn’t work. The Wiki page [1] doesn’t mention beginning or end of file, but says that " \A, ’ " matches the start of the matching string. Is there a way to make the whole file be the matched string?
What I want to do is find all files that start with “<?php”. Is there a way to do that?
Thank you.
[1] http://docs.notepad-plus-plus.org/index.php/Regular_Expressions
-
I originally started with the info in the link you show, but I’ve since moved on to a better website. It explains things in more detail, has better examples and, I think, takes a more logical approach.
http://www.rexegg.com/regex-quickstart.html
Of course, your website is only meant as a primer, so what it does is good, but I quickly got beyond what it could show me. In the website I linked, look for the DOTALL concept. This might allow you to select (match) the entire content of the file.
I started out thinking that looking for <?php should be simple, but then realised that most regexes are designed to succeed, so they keep looking until the expression matches. What I think you need to research is a way to make it fail as soon as it moves 1 character into the file. Again, the website I listed has examples of how a regex can be made to try and fail. So the concept would be (in plain English): look for the string <?php where there are absolutely no characters before it, not even spaces or CR/LF.
I will continue trying to come up with a regex, but thought you’d maybe like to know what I have considered thus far.
Terry
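For anyone who wants to see the DOTALL idea in action, here is a minimal sketch in Python. Python's re module is only a stand-in: it is not the engine Notepad++ uses, so treat the behaviour as indicative rather than identical. The sample text and the name match_from_start are made up for the example.

```python
import re

# Hypothetical sample standing in for a file's contents.
SAMPLE = "first line\nsecond line\nthird line"


def match_from_start(text: str, dotall: bool) -> str:
    """Return what 'first.*' matches at the start of text, with or without DOTALL."""
    flags = re.DOTALL if dotall else 0
    found = re.match(r"first.*", text, flags)
    return found.group(0) if found else ""


if __name__ == "__main__":
    # Without DOTALL, '.' stops at the first line break.
    print(repr(match_from_start(SAMPLE, dotall=False)))
    # With DOTALL, '.' also matches line breaks, so '.*' runs to the end of the text.
    print(repr(match_from_start(SAMPLE, dotall=True)))
```

With DOTALL switched on, a single .* can cover the rest of the file, which is the "select the entire content" effect described above.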
-
I’ve been playing with this and my solution is
Find:(?<!\x0A)^\x3C\x3Fphp
The problem I had was that the < and ? are special characters, and although I tried using the delimiter on them it didn’t work. Also, the \R caused the same problem. In the end I used the hex codes for these characters.
So (?<!\x0A) means I do not want a line feed immediately before the match. This may need amending in your environment depending on the line endings used. I assumed a CR followed by a LF; \x0A is the LF only.
Then we want a ^, start of line.
Then we want the characters <?php.
You mentioned wanting to select the entire file contents if the characters you seek are at the immediate start, so to add to my regex we would then have
Find:(?<!\x0A)^\x3C\x3Fphp(.*\R*)*
This is only half the battle. Now I think if you are looking across multiple files using some automatic process, this regex needs incorporating into that somehow. Unfortunately that is outside my abilities at present.
Terry
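One possible way to handle the "multiple files" half outside Notepad++ is a short script. The sketch below uses Python's re module as a stand-in engine (an assumption, since Notepad++ uses a different regex library) and reuses the first Find expression from this post unchanged. The function name files_starting_with_php and the choice of UTF-8 decoding are hypothetical, not anything Notepad++ provides.

```python
import re
from pathlib import Path

# The Find expression from the post above. MULTILINE lets '^' match at every
# line start, so the lookbehind is what pins the match to the first line.
START_OF_FILE_PHP = re.compile(r"(?<!\x0A)^\x3C\x3Fphp", re.MULTILINE)


def files_starting_with_php(root: str) -> list[str]:
    """Return sorted paths under root whose content begins with '<?php'."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # newline="" keeps CR and LF characters exactly as stored on disk.
        # Assumption: files are close enough to UTF-8; bad bytes are replaced.
        with open(path, encoding="utf-8", errors="replace", newline="") as handle:
            text = handle.read()
        if START_OF_FILE_PHP.search(text):
            hits.append(str(path))
    return sorted(hits)


if __name__ == "__main__":
    # Scan the current directory tree and list the matching files.
    for name in files_starting_with_php("."):
        print(name)
```

A file with even one blank line or space before <?php is skipped, which matches the "absolutely no characters before" requirement.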
-
Hello, @ennexmb, @terry-r and All,
Indeed, in some cases, the \A assertion, standing for the beginning of file, when followed by a specific regex, may not match the location right before the very first character of the current file.
However, in your case, @ennexmb, as you’re searching for the string <?php, made up of literal characters only, it should be OK!
So if you scan a bunch of files, using the Find or the Find in Files dialog, the regex \A<\?php should identify any [ PHP ] file whose first line begins with the <?php string ;-))
Just note that the question mark must be escaped, as \?, to be interpreted as a literal char!
Best Regards,
guy038
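To illustrate the note about escaping the question mark, here is a small sketch using Python's re module. This is an assumption-laden stand-in, not the Notepad++ engine, though both treat an unescaped ? as "the previous item is optional". The helper name starts_with and the sample strings are invented for the example.

```python
import re

ESCAPED = re.compile(r"\A<\?php")   # '\?' is a literal question mark
UNESCAPED = re.compile(r"\A<?php")  # '<?' means "an optional '<'", then 'php'


def starts_with(pattern: re.Pattern, text: str) -> bool:
    """True if pattern matches at the very beginning of text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    samples = ["<?php echo 1;", "\n<?php echo 1;", "php is fun", "<php"]
    for text in samples:
        # Columns: sample text, result with '\?', result with bare '?'.
        print(repr(text), starts_with(ESCAPED, text), starts_with(UNESCAPED, text))
```

In this sketch the unescaped form never matches a real <?php opening tag, because after the optional < it expects the letter p, not a question mark.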
-
Oh geez, my stupidity!
I just ran the search again using \A<\?php as suggested by guy038 and it does indeed work. I’m guessing that when I tried it before I neglected to escape the ?. Duh!
Terry, thanks for your valiant efforts. I tried your search string as well and it also works. That’s some pretty hardcore hacking, which would have been a great solution had the \A actually not been working.
And I’ve also learned from this not to trust the Notepad++ Wiki as documentation. I wanted to use that instead of a reference like the one suggested by Terry because it’s specific to Notepad++. Since there are different flavors of RegEx, I wanted a doc specific to the flavor in Notepad++. But it doesn’t help much if it’s wrong!
Thanks Terry and guy for your help.