Community

    ReGex help removing data

    7 Posts 4 Posters 150 Views
    • Dev Petty

      Hidetoshi Furuzawa
      Born: 1912, California
      Gardener
      Topaz, Tule Lake Apr 1942–19 Feb 1946
      Released to Sebastopol, California
      

      Here is how I would like that data to look (“after” data):

      Hidetoshi Furuzawa
      
      

      *each entry begins with Born: and ends with California. I need everything between, other than the names, removed

      • Terry R @Dev Petty

        @Dev-Petty

        A question about the “before” data. I see California is shown twice, and in both cases at the end of a line. Since that word also appears on lines in between, using it as the end marker is ambiguous.

        So is it always the “Born” line and exactly 3 lines after?
        Or is there a blank line between “records”?

        If you could show multiple before “records” in their entirety it might help in the solution.

        Terry

        • PeterJones @Dev Petty

          @Dev-Petty ,

          each entry begins with Born: and ends with California

          FIND = (?s)Born:.*?California
          REPLACE = <leave empty>
          SEARCH MODE = Regular Expression
          REPLACE ALL

          If there are too many newlines after the replacement, you could add a \R before the Born: or after the California in the regex.

          ----

          Useful References

          • Notepad++ Online User Manual: Searching/Regex
          • FAQ: Where to find other regular expressions (regex) documentation
          • guy038

            Hello, @dev-petty and All,

            So, I suppose that this regex should work nicely :

            • SEARCH (?s-i)^Born:.+?California\R

            • REPLACE Leave EMPTY

            • Check the Regular expression search mode

            Best Regards,

            guy038

            • PeterJones @Dev Petty

              @Dev-Petty ,

              When @Terry-R wrote “I see California is shown twice,” I realized I hadn’t noticed that. My answer (and, I think, @guy038’s) would stop at that first California, rather than going to the end.
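
To illustrate the early stop, here is a minimal sketch outside Notepad++. It uses Python's re module, not the Boost engine that Notepad++ uses, so treat it as an approximation. The sample text is the record from the first post, and LF line endings are an assumption.

```python
import re

# Sample record copied from the first post (LF line endings assumed).
record = (
    "Hidetoshi Furuzawa\n"
    "Born: 1912, California\n"
    "Gardener\n"
    "Topaz, Tule Lake Apr 1942–19 Feb 1946\n"
    "Released to Sebastopol, California\n"
)

# Python spelling of FIND = (?s)Born:.*?California
lazy = re.compile(r"(?s)Born:.*?California")

# The lazy quantifier ends the match at the first "California",
# which is already on the Born: line.
first_match = lazy.search(record).group(0)
after_replace = lazy.sub("", record)

print(repr(first_match))    # 'Born: 1912, California'
print(repr(after_replace))  # the Gardener, Topaz and Released lines survive
```

So with this data, only the Born: line is emptied and the three lines after it are left in place.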

              Like @Terry-R , I think it would be helpful if you showed a few more examples in the same set of data, so that we could see variations in things like the number of lines, or whether the replacement can ever stop on the same line that has Born: or whether it always has to end on a subsequent line to Born:.

              • Terry R @PeterJones

                @PeterJones said in ReGex help removing data:

                FIND = (?s)Born:.*?California

                and @guy038, I think both of your regexes match only the first line (the Born line). That was the reason for my questions.

                Terry

                • guy038

                  Hi, @dev-petty, @peterjones, @terry-r and All,

                  Yes, I answered too quickly, without testing in N++. My bad!

                  So one correct syntax could be :

                  • SEARCH (?s-i)(?-s:^Born:.+).+?California\R

                  • REPLACE Leave EMPTY

                  • Check the Regular expression search mode


                  What does this regex mean, apart from the literal strings Born: and California?

                  • The first part, (?s-i), consists of initial modifiers that apply to the whole regex:

                    • The (?s) syntax means that any . in the regex may match any single character, including the line-break characters \r and \n.

                    • The (?-i) syntax means that the search is case-sensitive. Thus it will find the words Born and California but not born and california or BORN and CALIFORNIA. If a case-insensitive search is needed, just use the (?si) syntax.

                  • The second part is (?-s:^Born:.+), a non-capturing group (?:.........) (a group whose contents we do not need later in the search or the replacement) with the -s modifier, which applies to this group only. Thus, this part looks for the word Born, with that exact case, at the beginning of a line ^, followed by a colon, itself followed by any standard character ., repeated +, up to the very end of the current line, because without (?s) the . stops at the line-break.

                  • The third part is .+?, which matches the smallest ? range of any characters ., including \r and \n, repeated +, until …

                  • The fourth part is California\R, which matches the word California, with this exact case, followed by \R, which stands for any kind of line-break ( \r\n for Windows files, \n for Unix files or \r for Mac files ).

                  • As the replacement field is empty, the entire 4 matched lines are simply deleted!
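
For anyone who wants to check this outside Notepad++, here is a rough Python translation of the regex above. It is a sketch under assumptions, not the N++ engine itself: Python's re module has no \R, so (?:\r\n|\n|\r) stands in for it; re.MULTILINE is added so that ^ matches at the start of each line; and (?-i) is dropped because Python is case-sensitive by default. The helper name strip_records is hypothetical, and the sample text is the record from the first post with LF line endings assumed.

```python
import re

# Assumed Python equivalent of (?s-i)(?-s:^Born:.+).+?California\R
PATTERN = re.compile(
    r"(?-s:^Born:.+)"      # the whole Born: line; . cannot cross a line feed here
    r".+?"                 # smallest run of any characters, line breaks included
    r"California"          # literal end marker
    r"(?:\r\n|\n|\r)",     # stand-in for \R: the line break that follows
    re.MULTILINE | re.DOTALL,
)


def strip_records(text: str) -> str:
    """Delete each block from a Born: line through the next later line ending in California."""
    return PATTERN.sub("", text)


# Sample record copied from the first post (LF line endings assumed).
record = (
    "Hidetoshi Furuzawa\n"
    "Born: 1912, California\n"
    "Gardener\n"
    "Topaz, Tule Lake Apr 1942–19 Feb 1946\n"
    "Released to Sebastopol, California\n"
)

print(repr(strip_records(record)))  # 'Hidetoshi Furuzawa\n'
```

Because the first group swallows the rest of the Born: line, the California on that line can no longer end the match, so the match runs on to the California at the end of the Released line.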

                  BR

                  guy038

                  The Community of users of the Notepad++ text editor.