Search line without ending tag



  • Hello.

    I’m very bad at regex. Can someone tell me how I can find lines with the following problem?
    I have an XML document with a lot of info for a tax system. Some files were generated with errors: the closing tag goes to the next line, and I need to find these cases.
    One more problem: there can be spaces or tabs before the opening tag.
    I need to find cases like <TypePost>Main, where the closing tag goes to the next line:

          <Post>Manager</Post>
          <subdivision><Marketing</subdivision>
          <TypePost>Main
          </TypePost>


  • @Alex-Mesch

    \R denotes a line ending and \h stands for a horizontal whitespace character, i.e. a space or a tab.
    With this in mind, you might consider
    find what:<TypePost>Main\R\h+
    replace with: <TypePost>Main
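
For anyone who wants to try the same idea outside Notepad++, here is a minimal Python sketch. Python's standard re module has no \R or \h, so both are spelled out by hand; treating `(?:\r\n|\n|\r)` and `[ \t]` as stand-ins is an assumption that should hold for ordinary text files. The sample text is made up from the example in the question.

```python
import re

# Assumed stand-ins for the Notepad++ (Boost) classes \R and \h.
EOL = r"(?:\r\n|\n|\r)"   # one line ending
HSPACE = r"[ \t]"         # one space or tab

# Same shape as: <TypePost>Main\R\h+
pattern = re.compile(r"<TypePost>Main" + EOL + HSPACE + r"+")

sample = (
    "      <Post>Manager</Post>\n"
    "      <TypePost>Main\n"
    "      </TypePost>\n"
)

# Put the opening tag and its text back, dropping the line break and indent.
fixed = pattern.sub("<TypePost>Main", sample)
print(fixed)
```

Note that \h+ requires at least one space or tab after the line break, so a closing tag that starts at the very beginning of the next line would not be matched by this version.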



  • @Ekopalypse TY, it’s working.
    But, I’m sorry, I forgot one thing.
    What if it can be any tag and any text?
    Something like this: I need to find the UsersC and UUID lines, and others with different names.

    <UsersC>21
    </UsersC>
    <UUID>be9a1528-9a/6/0/-4917-8857-12896a7693de</UUID>
    <Date>2020-01-20</Date>
    <UUID>7f8e38ab-ceba-45c5-ab34-834b61bad840
    </UUID>
    


  • @Alex-Mesch

    if your data is consistent then something like this
    find what:<(\w+>)(.*)\R\h*(</\1)
    replace with:<\1\2\3
    might do it.

    So we are looking for

    • a tag <(\w+>) (a less-than sign followed by one or more word characters, followed by a greater-than sign)
    • followed by any text (.*)
    • followed by an end-of-line sequence \R
    • followed by optional horizontal whitespace \h*
    • followed by the start of a closing tag </ followed by what was found in the starting tag \1 -> (</\1)
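
The pieces listed above can be sketched in Python to both report and repair the split tags. This is only an illustration under some assumptions: \R and \h are written out by hand because Python's re module does not have them, `[^\r\n]*` is used in place of `.*` so the text never runs past the line, and the function names are hypothetical. Since the leading < sits outside group 1, the replacement writes it back explicitly.

```python
import re

# Assumed stand-ins: (?:\r\n|\n|\r) for \R, [ \t] for \h.
SPLIT_TAG = re.compile(r"<(\w+>)([^\r\n]*)(?:\r\n|\n|\r)[ \t]*(</\1)")

def find_split_tags(text):
    """Return (line_number, tag_name) for tags whose closing tag starts the next line."""
    hits = []
    for m in SPLIT_TAG.finditer(text):
        # Line numbers are counted from \n, so bare \r line endings are not handled here.
        line_no = text.count("\n", 0, m.start()) + 1
        hits.append((line_no, m.group(1)[:-1]))  # strip the trailing '>'
    return hits

def join_split_tags(text):
    """Pull the closing tag back onto the line of its opening tag."""
    return SPLIT_TAG.sub(r"<\1\2\3", text)

sample = (
    "<UsersC>21\n"
    "</UsersC>\n"
    "<Date>2020-01-20</Date>\n"
    "<UUID>7f8e38ab-ceba-45c5-ab34-834b61bad840\n"
    "\t</UUID>\n"
)

print(find_split_tags(sample))
print(join_split_tags(sample))
```

Lines where the tag is already closed on the same line, such as the Date line, are left alone because the backreference \1 cannot match the start of the following line.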


  • @Ekopalypse amazing, it’s working very well))
    Thank you very much, and thank you for describing the process.



  • Hello, @alex-mesch, @ekopalypse and All,

    A second possibility, derived from @ekopalypse’s solution, would be :

    • Open the Replace dialog ( Ctrl + H )

    • SEARCH <(\w+)>.*\K\R\h*(?=</\1>)

    • REPLACE Leave EMPTY

    • Now, choose one of the following :

      • Tick the Wrap around option if you want to run the S/R on the whole file, from beginning to end

      • Untick the Wrap around option to run the S/R from the current location to the end of the file

      • Make a normal selection of text first and then tick the In selection option

    • Select the Regular expression search mode

    • Click on the Replace All button only, whatever your choice !

    Notes :

    • Due to the \K syntax, inside this regex, the search process works correctly, but the “step by step” replacement, with the Replace button, is not functional :-(

    • The search regex looks for a line-break, possibly followed by some blank characters ( tabulation and/or space ), ONLY IF :

      • It is preceded by <, then a tag name \w+, stored as group 1 because it is enclosed in parentheses, then > and any subsequent character(s) .*, even none, up to the line-break

      • It is followed by the matching closing tag </...>, due to the positive look-ahead structure (?=</\1>) and the \1 syntax, which represents the tag name

    • As the replacement zone is empty, the EOL, and the possible blank chars, are simply deleted !
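
For comparison, here is a rough Python equivalent of the idea in these notes. Python's standard re module supports neither \K nor variable-length look-behind, so this sketch swaps in a different technique: it captures everything before the line break in group 1 and writes it back, while the look-ahead is kept as is. The \R and \h classes are spelled out by hand, and the function name is hypothetical.

```python
import re

# Group 1: opening tag plus the text after it (what \K would have discarded).
# Group 2: the tag name, reused in the look-ahead for the closing tag.
PATTERN = re.compile(r"(<(\w+)>[^\r\n]*)(?:\r\n|\n|\r)[ \t]*(?=</\2>)")

def remove_break_before_closing_tag(text):
    # Only the line break and the indent are dropped; group 1 is re-inserted.
    return PATTERN.sub(r"\1", text)

sample = "<TypePost>Main\r\n\t  </TypePost>\r\n<Post>Manager</Post>\r\n"
result = remove_break_before_closing_tag(sample)
print(repr(result))
```

Because the closing tag is only checked by the look-ahead and never consumed, it stays in place untouched, just as in the Notepad++ version.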

    Best Regards,

    guy038



  • @guy038 thx)
    Tomorrow I will study how it works) Very hard for my brain)



