    How to remove characters () and text inside

    • Drew

      Hello. Please tell me how to remove text enclosed in parentheses (), together with the parentheses themselves, but only INSIDE THE TAGS.
      For example, we have: <tag>Text 1 (Text 2)</tag>
      I need to get:
      <tag>Text 1</tag>

      I found the expression \(.*\) (for regular expression mode).
      But it replaces matches throughout the whole document.
      I only need to delete (Text 2) inside the <tag> tag.
      Thank you.

      • asvc

        A bit of a brute-force approach, but will do the trick:

        1. Split into 3 groups:
          (\<\w*\>.*)\(.*\)(\<\/\w*\>)
        2. Join two groups:
          \1\2
        • Drew

          @asvc said in How to remove characters () and text inside:

          A bit of a brute-force approach, but will do the trick:

          Split into 3 groups:
          (<\w*>.)(.)(</\w*>)
          Join two groups:
          \1\2

          Sorry, but I didn’t understand.
          Should I find something like:
          (<tag>.)(.)(</tag>)
          And replace with:
          <tag>\1\2</tag>

          I didn’t understand what to write to find and replace.

          • PeterJones

            @Drew said in How to remove characters () and text inside:

            I didn’t understand what to write to find and replace.

            Literally the text that @asvc wrote. Well, almost. The plaintext-highlighted string in #1 was intended as the FIND, and the plaintext-highlighted string in #2 was the REPLACE. What was implied, but not said, was that you had to use regular expression mode.

            However, the FIND expression had a bug, in that \< does not mean a literal < in Notepad++'s regular expressions: it means “anchor to the beginning of a word”, and similarly for \> meaning “end of a word”, so it did not match your example text. As @guy038 has tried to teach me (and I occasionally remember), don’t over-escape your regex, either. A less-escaped regex which I tested on your example text is (<\w*>.*)\(.*\)(</\w*>) . With this regex, and the text

            <tag>Text 1 (Text 2)</tag>
            <another>Text 1 (Text 2)</another>
            <tag>Text 1 (Text 2)</tag>
            

            it will transform to

            <tag>Text 1 </tag>
            <another>Text 1 </another>
            <tag>Text 1 </tag>
            

            Note: @asvc made it extremely generic, in that it will match any tag – and my fix above will, too. If, instead, you really want it to only remove the (...) from a specific tag (we’ll assume tag), then use a FIND of (<tag>.*)\(.*\)(</tag>), and the same REPLACE from #2. With this regex and the example text I showed, it will transform to

            <tag>Text 1 </tag>
            <another>Text 1 (Text 2)</another>
            <tag>Text 1 </tag>
            

            showing that it only changed the contents of <tag>, rather than any tag.
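For anyone who wants to try the two FIND expressions outside the editor, here is a minimal sketch in Python. This is an assumption for illustration only: Python's `re` module is not the regex engine Notepad++ uses, and the names `ANY_TAG`, `ONLY_TAG`, and `strip_parens` are made up for this example. It relies on `.` not matching newlines, which corresponds to leaving ". matches newline" unchecked in Notepad++.

```python
import re

# The example text from the post above.
sample = (
    "<tag>Text 1 (Text 2)</tag>\n"
    "<another>Text 1 (Text 2)</another>\n"
    "<tag>Text 1 (Text 2)</tag>\n"
)

# Generic FIND: matches inside any tag name.
ANY_TAG = re.compile(r"(<\w*>.*)\(.*\)(</\w*>)")
# Specific FIND: matches only inside <tag>...</tag>.
ONLY_TAG = re.compile(r"(<tag>.*)\(.*\)(</tag>)")


def strip_parens(pattern, text):
    # REPLACE with \1\2: keep both captured groups, drop the (...) between them.
    return pattern.sub(r"\1\2", text)


generic_result = strip_parens(ANY_TAG, sample)
specific_result = strip_parens(ONLY_TAG, sample)

print(generic_result)
print(specific_result)
```

As in the results shown above, the space before the opening parenthesis is kept, so the output is `<tag>Text 1 </tag>` with a trailing space inside the tag.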

            If your example data is not representative of your actual data, then these regular expressions will likely not work for you. Regexes often need to be changed depending on the context of the other nearby characters.
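One concrete case where the data changes the result: if a tag contains more than one parenthesized group, the greedy `.*` in the first capture group swallows everything up to the last `(`, so only the last group is removed. The sketch below shows this in Python (again an assumption for illustration, not the Notepad++ engine). The second half uses a replacement callback, which is a Python feature and not something the plain Find/Replace dialog offers; `TAG_BODY`, `PARENS`, and `remove_all_parens` are hypothetical names.

```python
import re

GREEDY = re.compile(r"(<tag>.*)\(.*\)(</tag>)")

# Hypothetical input with two parenthesized groups in one tag.
two_groups = "<tag>Text 1 (Text 2) Text 3 (Text 4)</tag>"

# The first .* is greedy, so only the LAST (...) is removed.
greedy_result = GREEDY.sub(r"\1\2", two_groups)
print(greedy_result)

# Alternative sketch: capture the tag body, then strip every
# non-nested (...) inside it with a second pattern.
TAG_BODY = re.compile(r"(<tag>)(.*?)(</tag>)")
PARENS = re.compile(r"\([^()]*\)")


def remove_all_parens(text):
    def clean(match):
        body = PARENS.sub("", match.group(2))
        return match.group(1) + body + match.group(3)

    return TAG_BODY.sub(clean, text)


all_result = remove_all_parens(two_groups)
print(all_result)
```

In Notepad++ itself, a rough equivalent would be to run Replace All repeatedly until nothing more is replaced, since each pass removes one group per tag.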

            • asvc @PeterJones

              @PeterJones said in How to remove characters () and text inside:

              \< does not mean a literal < in Notepad++'s regular expressions

              That is interesting! I was under the impression that it is more or less standard PCRE. However, reading the link you have provided:

              \< ⇒ This matches the start of a word using Scintilla’s definitions of words.
              \> ⇒ This matches the end of a word using Scintilla’s definition of words.

              Good to know.
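For comparison, here is a small sketch of how another engine treats the same escapes. It uses Python's `re` module purely as an assumed point of comparison: there, a backslash before punctuation just escapes the character, so `\<` is a literal `<`, and there are no dedicated start-of-word or end-of-word anchors. The `\b` plus lookaround patterns below only approximate them, and they use Python's notion of a word character rather than Scintilla's.

```python
import re

text = "<tag>Text 1</tag>"

# In Python's re, \< and \> are escaped literals, so the over-escaped
# pattern still finds the opening tag (unlike in Notepad++).
literal_hits = re.findall(r"\<\w*\>", text)
print(literal_hits)

# Approximate "start of word": a word boundary followed by a word character.
word_starts = [m.start() for m in re.finditer(r"\b(?=\w)", text)]
# Approximate "end of word": a word boundary preceded by a word character.
word_ends = [m.start() for m in re.finditer(r"\b(?<=\w)", text)]

print(word_starts)
print(word_ends)
```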

              • Alan Kilborn @asvc

                @asvc said in How to remove characters () and text inside:

                Good to know.

                Yes, but be aware that there may be a bug with it in certain circumstances.
                See HERE.
                My extra notes on that indicate the bug is in the N++ regex engine and is not specific to Pythonscript (which is maybe what the link leads one to believe).
                Anyway, if you choose to use \< and \>, YMMV. :-)

                • Drew @PeterJones

                  @PeterJones

                  Thank you so much! Yes it works.
                  I chose to find: (<tag>.*)\(.*\)(</tag>)
                  And replace with: \1\2 or $1$2
                  With the Regular expression search mode checked.
