    How to remove characters () and text inside

    • Drew
      Drew last edited by

      Hello. Please tell me how to remove the part of the text enclosed in brackets (), together with the brackets themselves, but only INSIDE THE TAGS.
      For example, we have: <tag>Text 1 (Text 2)</tag>
      I need to have:
      <tag>Text 1</tag>

      I found the expression \(.*\) (using regular expressions).
      But this replaces every match in the whole document.
      I only need to delete the (Text 2) part inside the <tag> tag.
      Thank you.

      • asvc
        asvc last edited by

        A bit of a brute-force approach, but will do the trick:

        1. Split into 3 groups:
          (\<\w*\>.*)\(.*\)(\<\/\w*\>)
        2. Join two groups:
          \1\2
        • Drew
          Drew last edited by

          @asvc said in How to remove characters () and text inside:

          A bit of a brute-force approach, but will do the trick:

          Split into 3 groups:
          (\<\w*\>.*)\(.*\)(\<\/\w*\>)
          Join two groups:
          \1\2

          Sorry, but I didn’t understand.
          Should I find something like:
          (<tag>.*)\(.*\)(</tag>)
          And replace with:
          <tag>\1\2</tag>

          I didn’t understand what to write to find and replace.

          • PeterJones
            PeterJones last edited by

            @Drew said in How to remove characters () and text inside:

            I didn’t understand what to write to find and replace.

            Literally the text that @asvc wrote. Well, almost. The plaintext-highlighted string in #1 was intended as the FIND, and the plaintext-highlighted string in #2 was the REPLACE. What was implied, but not said, was that you had to use regular expression mode.

            However, the FIND expression had a bug, in that \< does not mean a literal < in Notepad++'s regular expressions: it means “anchor to the beginning of a word”, and similarly for \> meaning “end of a word”, so it did not match your example text. As @guy038 has tried to teach me (and I occasionally remember), don’t over-escape your regex, either. A less-escaped regex which I tested on your example text is (<\w*>.*)\(.*\)(</\w*>) . With this regex, and the text

            <tag>Text 1 (Text 2)</tag>
            <another>Text 1 (Text 2)</another>
            <tag>Text 1 (Text 2)</tag>
            

            it will transform to

            <tag>Text 1 </tag>
            <another>Text 1 </another>
            <tag>Text 1 </tag>
            

            Note: @asvc made it extremely generic, in that it will match any tag – and my fix above will, too. If, instead, you really want it to only remove the (...) from a specific tag (we’ll assume tag), then use a FIND of (<tag>.*)\(.*\)(</tag>), and the same REPLACE from #2. With this regex and the example text I showed, it will transform to

            <tag>Text 1 </tag>
            <another>Text 1 (Text 2)</another>
            <tag>Text 1 </tag>
            

            showing that it only changed the contents of <tag>, rather than any tag.
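For anyone who wants to try the same substitution outside Notepad++, here is a minimal Python sketch of the two FIND expressions and the \1\2 REPLACE. It assumes Python's re module, which is a different engine from the one in Notepad++, so treat it as an illustration of the grouping idea rather than a reproduction of the editor's behaviour. The names SAMPLE, ANY_TAG, ONLY_TAG and strip_parens are made up for this example.

```python
import re

# Sample text copied from the example in this post.
SAMPLE = """<tag>Text 1 (Text 2)</tag>
<another>Text 1 (Text 2)</another>
<tag>Text 1 (Text 2)</tag>
"""

# Generic FIND: matches inside any tag name.
ANY_TAG = re.compile(r"(<\w*>.*)\(.*\)(</\w*>)")
# Specific FIND: matches only inside <tag> ... </tag>.
ONLY_TAG = re.compile(r"(<tag>.*)\(.*\)(</tag>)")


def strip_parens(pattern, text):
    """Keep groups 1 and 2 and drop the parenthesised part between them."""
    return pattern.sub(r"\1\2", text)


if __name__ == "__main__":
    print(strip_parens(ANY_TAG, SAMPLE))
    print(strip_parens(ONLY_TAG, SAMPLE))
```

In Python, "." does not match a newline by default, so each line is handled on its own. In Notepad++ the equivalent behaviour depends on the ". matches newline" checkbox.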

            If your example data is not representative of your actual data, then these regular expressions will likely not work for you. Regexes often need to be changed depending on the context of the other nearby characters.
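To make that caveat concrete, here is a small Python sketch (again assuming Python's re module, not the Notepad++ engine) that runs the <tag>-specific FIND on two inputs that are NOT from this thread. Both inputs are hypothetical. The second function is a different technique from the one above, offered only as one possible adjustment: it removes every parenthesised run inside <tag>, along with the whitespace in front of it.

```python
import re

# The <tag>-specific FIND from this post.
FIND = re.compile(r"(<tag>.*)\(.*\)(</tag>)")

# Hypothetical inputs, invented for illustration.
TRAILING_TEXT = "<tag>Text 1 (Text 2) more</tag>"
TWO_GROUPS = "<tag>A (x) B (y)</tag>"


def apply_find_replace(line):
    """Apply the FIND with a \\1\\2 replacement."""
    return FIND.sub(r"\1\2", line)


def strip_all_parens_in_tag(line):
    """Alternative sketch: drop every (...) run found inside <tag>...</tag>."""
    inner = re.compile(r"\s*\([^()]*\)")
    return re.sub(
        r"(<tag>)(.*?)(</tag>)",
        lambda m: m.group(1) + inner.sub("", m.group(2)) + m.group(3),
        line,
    )


if __name__ == "__main__":
    # No match: the ")" is not directly followed by "</tag>".
    print(apply_find_replace(TRAILING_TEXT))
    # Only the last (...) is removed, because the first ".*" is greedy.
    print(apply_find_replace(TWO_GROUPS))
    print(strip_all_parens_in_tag(TRAILING_TEXT))
    print(strip_all_parens_in_tag(TWO_GROUPS))
```

The point is only that the original FIND expects the closing parenthesis to sit right before the closing tag, and expects one parenthesised part per tag. Data that breaks either expectation needs a different expression.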

            • asvc
              asvc @PeterJones last edited by

              @PeterJones said in How to remove characters () and text inside:

              \< does not mean a literal < in Notepad++'s regular expressions

              That is interesting! I was under the impression that it is more or less standard PCRE; however, reading the link you provided:

              \< ⇒ This matches the start of a word using Scintilla’s definitions of words.
              \> ⇒ This matches the end of a word using Scintilla’s definition of words.

              Good to know.
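As a point of comparison, here is a short Python sketch showing why the over-escaped pattern can look fine in other engines. It assumes Python's re module, where a backslash in front of a non-alphanumeric character such as < or > simply makes that character literal, so both spellings behave the same there. That is not how Notepad++ treats \< and \>, as quoted above. The names OVER_ESCAPED, LESS_ESCAPED, LINE and replace_with are invented for this example.

```python
import re

# The original FIND, with \< \> and \/ escaped.
OVER_ESCAPED = r"(\<\w*\>.*)\(.*\)(\<\/\w*\>)"
# The less-escaped FIND suggested later in the thread.
LESS_ESCAPED = r"(<\w*>.*)\(.*\)(</\w*>)"

LINE = "<tag>Text 1 (Text 2)</tag>"


def replace_with(pattern, text):
    """Apply the pattern with a \\1\\2 replacement."""
    return re.sub(pattern, r"\1\2", text)


if __name__ == "__main__":
    # In Python's re, both patterns match the line and give the same result.
    print(replace_with(OVER_ESCAPED, LINE))
    print(replace_with(LESS_ESCAPED, LINE))
```

So a pattern tested in another tool is not guaranteed to carry over to Notepad++ unchanged; the less-escaped form is the safer one to copy.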

              • Alan Kilborn
                Alan Kilborn @asvc last edited by

                @asvc said in How to remove characters () and text inside:

                Good to know.

                Yes, but be aware that there may be a bug with it in certain circumstances.
                See HERE.
                My extra notes on that indicate the bug is in the N++ regex engine and is not specific to Pythonscript (which is maybe what the link leads one to believe).
                Anyway, if you choose to use \< and \>, YMMV. :-)

                • Drew
                  Drew @PeterJones last edited by Drew

                  @PeterJones

                  Thank you so much! Yes, it works.
                  I chose to find: (<tag>.*)\(.*\)(</tag>)
                  And replace with: \1\2 or $1$2
                  With the "Regular expression" search mode checked.
