Community

    RegEx: Anchor for beginning of file

    Help wanted
    regex
    10 Posts 6 Posters 8.8k Views
    • EnnexMB

      Is there a RegEx symbol for the beginning of the file? Some places on the Web say it’s “\A” or “\`”, but that doesn’t work. The Wiki page [1] doesn’t mention beginning or end of file, but says that “\A” and “\`” match the start of the matching string. Is there a way to make the whole file be the matched string?

      What I want to do is find all files that start with “<?php”. Is there a way to do that?

      Thank you.

      [1] http://docs.notepad-plus-plus.org/index.php/Regular_Expressions
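      For a batch check outside Notepad++, here is a minimal Python sketch. It assumes Python's `re` module (not the Boost engine Notepad++ uses), and `files_starting_with_php` is a hypothetical helper name. It reads only the first five bytes of each file and tests them against `\A<\?php`; a file that starts with a UTF-8 BOM, whitespace or a blank line is deliberately not reported.

```python
import re
from pathlib import Path

# \A anchors at the absolute start of the input; "\?" makes the "?" literal.
PHP_START = re.compile(rb"\A<\?php")


def files_starting_with_php(root):
    """Return sorted paths under `root` whose first bytes are b"<?php".

    Hypothetical helper for illustration only: files beginning with a BOM,
    whitespace or an empty line are not reported.
    """
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            with path.open("rb") as f:
                head = f.read(5)  # len(b"<?php") == 5
            if PHP_START.match(head):
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    for p in files_starting_with_php("."):
        print(p)
```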

      • Terry R

        I originally started with the info in the link you show, but I’ve since moved on to a better website. This one explains in more detail, has better examples, and I think takes a more logical approach.
        http://www.rexegg.com/regex-quickstart.html
        Of course, your link is only meant as a primer, and for that it’s good, but I quickly outgrew its ability to show and guide.

        On the website I linked, look for the DOTALL concept. This might allow you to select (match) the entire content of the file.

        I started out thinking that looking for <?php should be simple, but realised that most regexes are designed to succeed: the engine keeps trying further positions until the expression matches. What I think you need to research is a way to make it fail as soon as it moves one character into the file. Again, the website I listed has examples of how a regex can be made to try and fail. So the concept, in plain English, would be: look for the string <?php where there are absolutely no characters before it, not even spaces or CR/LF.
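        The "fail as soon as it moves one character in" idea is what an absolute-start anchor provides. A small illustration, assuming Python's `re` purely for demonstration (the sample text is made up): an unanchored search retries at every position, while `\A` can only succeed at index 0.

```python
import re

text = "\n<?php echo 'hi';"  # made-up sample with a stray leading newline

# search() retries at every position, so it finds "<?php" at index 1.
anywhere = re.search(r"<\?php", text)

# \A can only succeed at index 0, so every later attempt fails at once.
at_start = re.search(r"\A<\?php", text)

# match() is implicitly anchored at index 0, much like \A.
implicit = re.match(r"<\?php", text)

print(anywhere.start())  # 1
print(at_start)          # None
print(implicit)          # None
```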

        I will continue trying to come up with a regex, but thought you’d maybe like to know what I have considered thus far.

        Terry

        • Terry R

          I’ve been playing with this and my solution is
          Find: (?<!\x0A)^\x3C\x3Fphp

          The problem I had was that < and ? are special characters, and although I tried escaping them with a backslash, it didn’t work. \R caused the same problem. In the end I used the hex codes for these characters.

          So (?<!\x0A) is a negative lookbehind: the match must not be preceded by a line feed. This may need amending depending on the line endings your files use. I assumed Windows line endings, i.e. a CR followed by an LF; \x0A is the LF only.
          Then we want a ^, start of line
          Then we want the characters <?php
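          Terry's pattern can be cross-checked against a plain `\A` anchor, as a rough sketch. Assumptions: Python's `re` with `re.MULTILINE`, where `^` matches at the start of the string and after each `\n` only, so any handling of lone CRs in Notepad++ is not reproduced; the sample strings are invented.

```python
import re

# \x3C is "<", \x3F is "?", and \x0A is the LF character.
terry = re.compile(r"(?<!\x0A)^\x3C\x3Fphp", re.MULTILINE)
plain = re.compile(r"\A<\?php")

samples = ["<?php x", "x\n<?php", "\n<?php", "<?php\n<?php"]

results = {}
for s in samples:
    # Record every start offset at which each pattern matches.
    results[s] = (
        [m.start() for m in terry.finditer(s)],
        [m.start() for m in plain.finditer(s)],
    )
    print(repr(s), results[s])
```

Both columns should agree for every sample: the lookbehind rejects each `^` that follows a line feed, which leaves only offset 0.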

          You mentioned wanting to select the entire file contents if the characters you seek are at the immediate start, so to add to my regex we would then have
          Find: (?<!\x0A)^\x3C\x3Fphp(.*\R*)*
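          For grabbing the whole file, the DOTALL idea mentioned earlier is usually simpler than `(.*\R*)*`, and it avoids a nested quantifier that can make backtracking engines slow. A sketch in Python (an assumption for illustration; Python's `re` has no `\R`, and the sample text is invented):

```python
import re

source = "<?php\necho 'a';\necho 'b';\n"

# re.DOTALL lets "." match newlines too, so ".*" runs to the end of the text.
whole = re.compile(r"\A<\?php.*", re.DOTALL)

m = whole.match(source)
print(m.group(0) == source)       # True: the entire text is the match
print(whole.match("x" + source))  # None: "<?php" is not at the very start
```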

          This is only half the battle. If you are searching across multiple files with some automated process, this regex needs to be incorporated into that somehow. Unfortunately, that is outside my abilities at present.

          Terry

          • guy038

            Hello, @ennexmb, @terry-r and All,

            Indeed, in some cases the \A assertion, which stands for the beginning of the file, may fail to match the location right before the very first character of the current file, depending on the regex that follows it.

            However, in your case, @ennexmb, as you’re searching for the string <?php, made up of literal characters only, it should be OK!

            So if you scan a bunch of files with the Find or the Find in Files dialog, the regex \A<\?php should identify any [ PHP ] file whose first line begins with the <?php string ;-))

            Just note that the question mark must be escaped, as \?, to be interpreted as a literal char!
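            Why the backslash matters, shown with Python's `re` as a stand-in (an unescaped `?` means "previous item optional" in Perl-style regex flavours generally, but this snippet only demonstrates Python). `re.escape()` is Python's helper for quoting a literal string; the escaping output shown assumes Python 3.7 or later.

```python
import re

line = "<?php echo 1;"

# Unescaped, "?" makes the "<" optional: the pattern means "<php" or "php".
unescaped = re.compile(r"\A<?php")
# Escaped, "\?" is a literal question mark.
escaped = re.compile(r"\A<\?php")

print(unescaped.match(line))                 # None
print(unescaped.match("php x").group(0))     # php
print(escaped.match(line) is not None)       # True

# re.escape() adds the backslashes needed to match a literal string.
print(re.escape("<?php"))                    # <\?php
```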

            Best Regards,

            guy038

            • EnnexMB

              Oh geez, my stupidity!

              I just ran the search again using \A<\?php as suggested by guy038 and it does indeed work. I’m guessing that when I tried it before I neglected to escape the ?. Duh!

              Terry, thanks for your valiant efforts. I tried your search string as well and it also works. That’s some pretty hardcore hacking, which would have been a great solution had the \A actually not been working.

              And I’ve also learned from this not to trust the Notepad++ Wiki as documentation. I wanted to use that instead of a reference like the one suggested by Terry because it’s specific to Notepad++. Since there are different flavors of RegEx, I wanted a doc specific to the flavor in Notepad++. But it doesn’t help much if it’s wrong!

              Thanks Terry and guy for your help.

              • Dennis Hughes

                While playing around to find the absolute start of file in the current version of Notepad++ (7.8.6), I’ve found that ^\A works beautifully.

                Also, I don’t know if it’s documented anywhere, but another item I’ve found in Notepad++ is that \Z functions as ‘end of a blank line’ rather than the absolute end of file, whereas \z (lower case Z) acts to only detect the absolute end of the file.

                • PeterJones

                  @Dennis-Hughes

                  (7.8.6), I’ve found that ^\A works beautifully.

                  Have you seen the recent discussions with @guy038 and @Alan-Kilborn? They describe more tests you might want to run. (I cannot remember which topic, and the forum search doesn’t handle punctuation. One of them should be able to link you to the more detailed discussion of the \A issues.)

                  Also, I don’t know if it’s documented anywhere

                  https://npp-user-manual.org/docs/searching/#anchors

                  is that \Z functions as ‘end of a blank line’ rather than the absolute end of file, whereas \z (lower case Z) acts to only detect the absolute end of the file.

                  Not quite.

                  You are right that \z only matches absolute end-of-file, or the end of the string if ☑ In selection is enabled.

                  However, \Z does not match ‘end of a blank line’. As the documentation says, it matches \z preceded by 0 or more blank lines. You can see this by pasting the file below into Notepad++ and searching for \Z. It matches in two locations: at the end of line H (which is \z preceded by a single newline), and at the beginning of the empty line after it (called I), which is \z without any newlines.

                  If your description were right, it would have matched on B, D, E, G, and I; but it does not.

                  A. Not blank
                  
                  C. Two (B) was blank
                  
                  
                  F. Four and five (D,E) were blank
                  
                  H. Next line (I) will be blank and end of file
                  
                  
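                  The same description (\z preceded by zero or more line breaks) can be emulated in Python, where `\Z` already means the absolute end of the string, the role `\z` plays in Notepad++. A sketch, assuming LF-only line endings and the sample above typed with a final line break after line H; the lookahead pattern is an illustration, not Notepad++'s implementation.

```python
import re

sample = (
    "A. Not blank\n"
    "\n"
    "C. Two (B) was blank\n"
    "\n"
    "\n"
    "F. Four and five (D,E) were blank\n"
    "\n"
    "H. Next line (I) will be blank and end of file\n"
)

# Zero-width lookahead: only line feeds remain before the absolute end.
emulated_Z = re.compile(r"(?=\n*\Z)")
hits = [m.start() for m in emulated_Z.finditer(sample)]

# Python's own \Z matches only at the absolute end of the string.
python_Z = [m.start() for m in re.finditer(r"\Z", sample)]

print(hits == [len(sample) - 1, len(sample)])  # True: end of H, start of I
print(python_Z == [len(sample)])               # True
```

Blank lines B, D, E and G are followed by more text, so the lookahead fails there, which matches the two-location result described above.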

                  • Alan Kilborn

                    @PeterJones said

                    Have you seen the recent discussions with @guy038 and @Alan-Kilborn? T

                    One of these threads may be what @PeterJones is referring to:

                    • https://community.notepad-plus-plus.org/topic/11987/add-line-of-text-to-beginning-of-multiple-files/
                    • https://community.notepad-plus-plus.org/topic/19033/first-line-replacement

                    These links may also contain related info:

                    • https://community.notepad-plus-plus.org/topic/16104/how-to-find-and-highlight-a-specific-occurance-of-a-symbol
                    • https://community.notepad-plus-plus.org/topic/18240/how-to-replace-or-delete-text-between-numbered-lines
                    • guy038
                      last edited by guy038

                      Hi, @dennis-hughes @peterjones, @alan-kilborn, @terry-r and All,

                      Peter, when I first pasted your sample text, I stupidly forgot the EOL chars of line H, and of course I got just one match with the \Z syntax!

                      So, to anyone trying to reproduce @peterjones’s test: beware that line H must end with a line break, in order to create an empty line I without any EOL char!

                      @dennis-hughes :

                      To be short, any of these three syntaxes (?<!\n|\r|\f)^    ,    ^\A    ,    \A^

                      • is a workaround for the buggy \A form!

                      • works correctly, provided that none of the scanned files begins with truly empty lines

                      For those particular cases, simply refer to @alan-kilborn’s links in the previous post
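                      As a rough cross-check, the three workaround forms can be compared in Python's `re`. This is an assumption-laden sketch: Python treats only `\n` as a line break for `^` under `re.MULTILINE`, so neither the Notepad++-specific `\A` bug nor the empty-first-line caveat is reproduced, and the sample text is invented.

```python
import re

text = "<?php\nsecond line\r\nthird\n"

patterns = [r"(?<!\n|\r|\f)^", r"^\A", r"\A^"]

results = {}
for p in patterns:
    # MULTILINE makes "^" match at every line start; each form should
    # still report only offset 0, the very start of the text.
    results[p] = [m.start() for m in re.finditer(p, text, re.MULTILINE)]
    print(p, results[p])

# For contrast: a bare "^" matches at the start of every line.
bare = [m.start() for m in re.finditer(r"^", text, re.MULTILINE)]
print("^", bare)
```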

                      BR

                      guy038

                      • guy038
                        last edited by guy038

                        Hi, All,

                        Do you know how I managed to keep some spacing between the three regexes:

                        (?<!\n|\r|\f)^    ,    ^\A    ,    \A^

                        which would usually be collapsed to:

                        (?<!\n|\r|\f)^ , ^\A , \A^

                        in the inline code of my previous post?


                        Well, the trick is to use No-Break Space character(s) ( \xA0 ) instead of the usual Space chars ( \x20 ) ;-))

                        So, use the Alt + 160 input method on the numeric keypad to insert a No-Break Space character at the current cursor location!

                        Cheers,

                        guy038

                        The Community of users of the Notepad++ text editor.
                        Powered by NodeBB | Contributors