REGEX - Select only this part from text

  • V
    Vasile Caraus
    last edited by Oct 5, 2016, 11:22 AM

    Hello. I want to match only this part of the text:

    word
    word
    this_part
    this_part
    this_part
    this_part
    word
    word

    Any idea?

    • J
      John Doge
      last edited by Oct 6, 2016, 10:51 AM

      Hi, it’s not entirely clear what the actual data is, but (this_part\r?\n)+ works on your example.
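As a rough check of the pattern above, here is a minimal sketch using Python's `re` module rather than Notepad++'s own regex engine. The sample string is an assumption rebuilt from the example in the first post, with plain `\n` line endings.

```python
import re

# Sample text rebuilt from the first post (assumed to use "\n" line endings).
text = "word\nword\nthis_part\nthis_part\nthis_part\nthis_part\nword\nword\n"

# Same pattern as in the post: one or more consecutive "this_part" lines.
pattern = re.compile(r"(this_part\r?\n)+")

match = pattern.search(text)
matched = match.group(0) if match else ""
print(repr(matched))
```

`group(0)` is the whole run of lines; the capture group on its own only holds the last repetition.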

      • V
        Vasile Caraus
        last edited by Oct 6, 2016, 8:32 PM

        Thanks John, it works. But what if I have this instead:

        word
        word
        this_part_1
        this_part_2
        this_part_3
        this_part_4
        word
        word

        • J
          John Doge
          last edited by Oct 7, 2016, 9:40 AM

          If actual numbers are expected it’s (this_part_\d+\r?\n)+
          For this_part followed by anything it’s (this_part.*\r?\n)+ with “. matches newline” switched off.
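The two patterns above can be compared in a short sketch, again using Python's `re` module as a stand-in for Notepad++. Leaving out `re.DOTALL` plays the role of “. matches newline” switched off; the last search turns it on to show why that matters. The sample string is an assumption rebuilt from the post above.

```python
import re

# Sample text rebuilt from the post above (assumed "\n" line endings).
text = ("word\nword\n"
        "this_part_1\nthis_part_2\nthis_part_3\nthis_part_4\n"
        "word\nword\n")

# Digits only after "this_part_".
numbered_run = re.search(r"(this_part_\d+\r?\n)+", text).group(0)

# Anything after "this_part"; without re.DOTALL, "." stops at the line end.
anything_run = re.search(r"(this_part.*\r?\n)+", text).group(0)

# With re.DOTALL the greedy ".*" runs past the line end and swallows
# the trailing "word" lines as well.
dotall_run = re.search(r"(this_part.*\r?\n)+", text, flags=re.DOTALL).group(0)

print(repr(numbered_run))
print(repr(anything_run))
print(repr(dotall_run))
```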

          • V
            Vasile Caraus
            last edited by Oct 11, 2016, 1:02 PM

            :) OK, but “this_part” can be anything, such as “bla bla”. Anyway, what would the regex be in this case?

            1.this is you
            2.Home alone
            3.My name is Prince.
            4.Mom goes home.
            5.I make cookies.
            6.Love is my name.
            7.sincerity and weekness
            8.word and emotions

            • P
              PeterJones
              last edited by Oct 11, 2016, 1:39 PM

              Your requirements appear to keep changing, from our perspective. You know exactly what you mean, but you keep describing it in different ways, which produce very different answers.

              Here is my current interpretation of your requirements. For a file with arbitrary "blah"s that you want to ignore, using an arbitrary phrase “START” as the first line of the match and “FINISH” as the last line of the match (inclusive), having arbitrary data like:

              blah
              halb
              START
              some more here
              FINISH
              o-bla-dee
              o-bla-dah
              

              I was able to make that work as

              • Find what : .*(START.*FINISH).*
              • Replace with : $1
              • ☑ Regular Expression ☑ . matches newline
              • Click [ Replace All ]

              Given the input file above, my output file looked like

              START
              some more here
              FINISH
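The find/replace settings above can be approximated outside the editor. This is a minimal sketch in Python's `re` module, not Notepad++ itself: `re.DOTALL` stands in for the “. matches newline” checkbox and `\1` for `$1`. The sample string is an assumption rebuilt from the input shown above.

```python
import re

# Input rebuilt from the example file above (assumed "\n" line endings).
text = "blah\nhalb\nSTART\nsome more here\nFINISH\no-bla-dee\no-bla-dah\n"

# re.DOTALL lets "." cross line ends, so the leading and trailing ".*"
# consume everything outside the START...FINISH block.
result = re.sub(r".*(START.*FINISH).*", r"\1", text, flags=re.DOTALL)

print(result)
```

Because the leading `.*` is greedy, a file with several START lines would keep only the block beginning at the last one.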
              

              But this is all just a guess as to what you want. If you really want something more like “everything between each START and FINISH, inclusive, no matter how many times START and FINISH occur”, I know it can be done… but my regex-fu isn’t strong enough. Ask @guy038, who is a regex guru extraordinaire.

              blah
              blady
              START
              some more here
              FINISH
              bladeee
              blah
              START
              again with the keeping
              FINISH
              again with the ignoring
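For the repeated START/FINISH case, one common approach is a non-greedy `.*?` between the two markers. The sketch below uses Python's `re` module and is only an illustration under that assumption; it is not the solution guy038 posted, and the sample string is rebuilt from the data above.

```python
import re

# Input rebuilt from the example above (assumed "\n" line endings).
text = ("blah\nblady\n"
        "START\nsome more here\nFINISH\n"
        "bladeee\nblah\n"
        "START\nagain with the keeping\nFINISH\n"
        "again with the ignoring\n")

# ".*?" is non-greedy, so each match stops at the nearest FINISH
# instead of stretching to the last one in the file.
blocks = re.findall(r"START.*?FINISH", text, flags=re.DOTALL)

kept = "\n".join(blocks)
print(kept)
```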
              

              A hint when asking questions: if you don’t want people to interpret your example data literally – especially when asking for help with regular expressions – you need to plainly indicate it’s not literal, and you need to explain what can and cannot be in your non-literal data. Starting with asking for “this_part” for every line of the example data, then only tweaking your requirement to “this_part_#” strongly implied that your data-to-include all followed a very simple pattern.

              You should probably also answer some other questions: do you want to ignore leading and/or trailing space on the START and FINISH matching lines (should both START and ____START____ match, where each ‘_’ stands for a space)? Is your phrase partial (i.e., should both “FINISH” and “FINISH NOW” match – and should the “NOW” be part of the match, or should it be discarded)? Give as many details about what should and should not match as you can.
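Each of those questions changes the pattern. The sketch below shows three possible readings in Python's `re` module; the names `strict`, `lenient` and `partial` are made up for illustration, and none of them is known to be what the original poster needs.

```python
import re

# Four candidate marker lines (hypothetical test data).
lines = "START\n   START   \nFINISH\nFINISH NOW\n"

# re.MULTILINE makes "^" and "$" match at every line start and end.
strict = re.compile(r"^START$", re.MULTILINE)                # exact line only
lenient = re.compile(r"^[ \t]*START[ \t]*$", re.MULTILINE)   # allow padding
partial = re.compile(r"^FINISH.*$", re.MULTILINE)            # keep the rest of the line

strict_hits = strict.findall(lines)
lenient_hits = lenient.findall(lines)
partial_hits = partial.findall(lines)

print(strict_hits, lenient_hits, partial_hits)
```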

              • V
                Vasile Caraus
                last edited by Oct 11, 2016, 5:46 PM

                Peter, that looks GREAT! Your first solution is very good!

                guy038 already wrote the second solution in one of my first posts.

                THANK YOU both!

                The Community of users of the Notepad++ text editor.