    Search line without ending tag

    • Alex Mesch

      Hello.

      I’m very bad at regex. Can someone tell me how to find lines with the following problem?
      I have an XML document with a lot of information for a tax system. Some files were generated with errors: the closing tag ends up on the next line, and I need to find these cases.
      One more complication: there can be spaces or tabs before the opening tag.
      I need to find cases like <TypePost>Main, where the closing tag goes to the next line:

            <Post>Manager</Post>
            <subdivision><Marketing</subdivision>
            <TypePost>Main
            </TypePost>
      
      • Ekopalypse @Alex Mesch

        @Alex-Mesch

        \R matches a line ending, and \h matches horizontal whitespace, that is, spaces or tabs.
        With this in mind, you might consider
        find what:<TypePost>Main\R\h+
        replace with: <TypePost>Main
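
For readers who want to try the same join outside Notepad++, here is a minimal Python sketch. It assumes Python's standard `re` module, which does not understand `\R` or `\h`, so both are spelled out by hand and cover only the common cases (CRLF, LF, CR; space and tab). The function name `join_typepost` is hypothetical.

```python
import re

# Python's re has no \R or \h, so both are written out explicitly.
EOL = r"(?:\r\n|\n|\r)"   # stand-in for \R (common line endings only)
HSPACE = r"[ \t]"         # stand-in for \h (space or tab only)

PATTERN = re.compile(r"<TypePost>Main" + EOL + HSPACE + r"+")


def join_typepost(text):
    """Pull a closing tag that slipped onto the next line back up."""
    return PATTERN.sub("<TypePost>Main", text)


sample = "      <Post>Manager</Post>\n      <TypePost>Main\n      </TypePost>\n"
print(join_typepost(sample))
```

As in the original pattern, `+` requires at least one space or tab before the closing tag, so a closing tag that starts at column 0 is left alone.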

        • Alex Mesch @Ekopalypse

          @Ekopalypse Thank you, it works.
          But I’m sorry, I forgot one thing.
          What if it can be any tag and any text?
          Something like this, where I need to find the UsersC and UUID lines, and lines with other tag names:

          <UsersC>21
          </UsersC>
          <UUID>be9a1528-9a/6/0/-4917-8857-12896a7693de</UUID>
          <Date>2020-01-20</Date>
          <UUID>7f8e38ab-ceba-45c5-ab34-834b61bad840
          </UUID>
          
          • Ekopalypse @Alex Mesch

            @Alex-Mesch

            if your data is consistent, then something like this
            find what:<(\w+>)(.*)\R\h*(</\1)
            replace with:<\1\2\3
            might do it. The leading < in the replacement restores the one that the match consumes outside group 1.

            So we are looking for

            • a tag <(\w+>) (a less-than sign, followed by one or more word characters, followed by a greater-than sign)
            • followed by any text (.*)
            • followed by a line ending \R
            • followed by optional horizontal whitespace \h*
            • followed by the start of a closing tag </ and then what was captured from the opening tag \1 -> (</\1)
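
The steps above can be sketched in Python for anyone who wants to list or repair the affected elements in a script. This is only an approximation under stated assumptions: Python's `re` lacks `\R` and `\h`, so they are written out, and `[^\r\n]*` is used in place of `.*` because Python's `.` also matches a carriage return. The names `find_split_tags` and `join_split_tags` are hypothetical.

```python
import re

EOL = r"(?:\r\n|\n|\r)"  # stand-in for \R (common line endings only)

# [^\r\n]* replaces .* so that a "\r" from a CRLF ending is never captured as text.
SPLIT_TAG = re.compile(r"<(\w+>)([^\r\n]*)" + EOL + r"[ \t]*(</\1)")


def find_split_tags(text):
    """Return the names of tags whose closing tag sits on the next line."""
    # group(1) is "Name>", so drop the trailing ">".
    return [m.group(1)[:-1] for m in SPLIT_TAG.finditer(text)]


def join_split_tags(text):
    """Rejoin each split element; the literal "<" restores the one the match consumed."""
    return SPLIT_TAG.sub(r"<\1\2\3", text)


sample = (
    "<UsersC>21\n"
    "</UsersC>\n"
    "<UUID>aaa</UUID>\n"
    "<Date>2020-01-20</Date>\n"
    "<UUID>7f8e38ab\n"
    "  </UUID>\n"
)
print(find_split_tags(sample))
print(join_split_tags(sample))
```

Elements that are already on one line, such as the `<Date>` element in the sample, do not match and are left untouched.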
            • Alex Mesch @Ekopalypse

              @Ekopalypse Amazing, it works very well!
              Thank you very much, and thank you for the description of the process.

              • guy038

                Hello, @alex-mesch, @ekopalypse and All,

                A second possibility, derived from @ekopalypse’s solution, would be :

                • Open the Replace dialog ( Ctrl + H )

                • SEARCH <(\w+)>.*\K\R\h*(?=</\1>)

                • REPLACE Leave EMPTY

                • Now, choose one of the following :

                  • Tick the Wrap around option if you want to process the S/R on the whole file, from beginning to end

                  • Untick the Wrap around option to process the S/R from the current location to the end of the file

                  • Make a normal selection of text first and then tick the In selection option

                • Select the Regular expression search mode

                • Click only on the Replace All button, whatever your choice !

                Notes :

                • Because of the \K syntax in this regex, the search process works correctly, but the “step by step” replacement, with the Replace button, is not functional :-(

                • The search regex looks for a line-break, possibly followed by some blank characters ( tabs and/or spaces ), ONLY IF :

                  • It is preceded by <, then a tag name \w+, stored as group 1 because it is enclosed in parentheses, then > and any subsequent character(s) .*, possibly none, up to the line-break

                  • It is followed by the matching closing tag </...>, thanks to the positive look-ahead structure (?=</\1>) and the \1 syntax, which represents the tag name

                • As the replacement zone is empty, the EOL and any blank characters are simply deleted !

                Best Regards,

                guy038
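
The look-ahead variant in the post above can be approximated in Python as a rough sketch. Python's standard `re` module has no `\K`, so this version assumes a different technique: everything `\K` would have excluded from the match is captured in group 1 and written back by the replacement. `\R` and `\h` are again spelled out by hand, and the name `remove_stray_breaks` is hypothetical.

```python
import re

EOL = r"(?:\r\n|\n|\r)"  # stand-in for \R (common line endings only)

# Group 1 holds the part that \K would have kept out of the match;
# group 2 is the tag name, reused by the look-ahead.
PATTERN = re.compile(r"(<(\w+)>[^\r\n]*)" + EOL + r"[ \t]*(?=</\2>)")


def remove_stray_breaks(text):
    """Delete the line break and indentation in front of a matching closing tag."""
    return PATTERN.sub(r"\1", text)


sample = "\t<UsersC>21\r\n\t</UsersC>\r\n\t<Date>2020-01-20</Date>\r\n"
print(repr(remove_stray_breaks(sample)))
```

Because the closing tag is only checked by the look-ahead and never consumed, it stays in place and only the line break and indentation are removed.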

                • Alex Mesch @guy038

                  @guy038 Thanks!
                  Tomorrow I will study how it works. It is very hard for my brain.

                  • guy038

                    Hi, @alex-mesch,

                    To begin with, click on the link, below :

                    https://community.notepad-plus-plus.org/topic/15765/faq-desk-where-to-find-regex-documentation

                    Cheers,

                    guy038
