ReGex help removing data

  • D
    Dev Petty
    last edited by Apr 1, 2025, 6:23 PM

    Hidetoshi Furuzawa
    Born: 1912, California
    Gardener
    Topaz, Tule Lake Apr 1942–19 Feb 1946
    Released to Sebastopol, California
    

    Here is how I would like that data to look (“after” data):

    Hidetoshi Furuzawa
    
    

    * Each entry begins with Born: and ends with California. I need everything in between, other than the names, removed.

    • T
      Terry R @Dev Petty
      last edited by Terry R Apr 1, 2025, 6:35 PM

      @Dev-Petty

      A question about the before data. I see California is shown twice, and in both cases at the end of a line. Since that word appears more than once, and other lines in between might contain it too, using it as the end marker is a problem.

      So is it always the “Born” line and exactly 3 lines after?
      Or is there a blank line between “records”?

      If you could show multiple before “records” in their entirety it might help in the solution.

      Terry

      • P
        PeterJones @Dev Petty
        last edited by Apr 1, 2025, 6:35 PM

        @Dev-Petty ,

        each entry begins with Born: and ends with California

        FIND = (?s)Born:.*?California
        REPLACE = <leave empty>
        SEARCH MODE = Regular Expression
        REPLACE ALL

        If there are too many newlines after the replacement, you could add a \R before the Born: or after the California in the regex.

        ----

        Useful References

        • Notepad++ Online User Manual: Searching/Regex
        • FAQ: Where to find other regular expressions (regex) documentation
        • G
          guy038
          last edited by guy038 Apr 1, 2025, 6:38 PM Apr 1, 2025, 6:35 PM

          Hello, @dev-petty and All,

          So, I suppose that this regex should work nicely :

          • SEARCH (?s-i)^Born:.+?California\R

          • REPLACE Leave EMPTY

          • Check the Regular expression search mode

          Best Regards,

          guy038

          • P
            PeterJones @Dev Petty
            last edited by Apr 1, 2025, 6:38 PM

            @Dev-Petty ,

            When @Terry-R wrote “I see California is shown twice,” I realized I hadn’t noticed that. My answer (and, I think, @guy038’s) would stop at that first California, rather than going to the end.
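
To see the problem outside Notepad++, here is a minimal Python sketch. It assumes a record looks exactly like the one sample in the original post, and it uses `re.DOTALL` as a stand-in for `(?s)`. Python's `re` module is not the regex engine Notepad++ uses, so treat this only as an illustration of how the lazy `.*?` behaves.

```python
import re

# The single sample record from the original post.
record = (
    "Hidetoshi Furuzawa\n"
    "Born: 1912, California\n"
    "Gardener\n"
    "Topaz, Tule Lake Apr 1942–19 Feb 1946\n"
    "Released to Sebastopol, California\n"
)

# re.DOTALL plays the role of (?s): "." may also match line breaks.
lazy = re.compile(r"Born:.*?California", re.DOTALL)

# The lazy quantifier stops at the FIRST "California", which is on the Born: line.
matched_text = lazy.search(record).group(0)
print(repr(matched_text))  # 'Born: 1912, California'

# After replacing with nothing, the three remaining lines are still there.
remaining = lazy.sub("", record)
print(remaining)
```

Only the Born: line is removed; the occupation, camp and release lines survive the replacement.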

            Like @Terry-R , I think it would be helpful if you showed a few more examples from the same set of data, so that we could see variations in things like the number of lines, and whether the replacement can ever stop on the same line that has Born: or always has to end on a line after it.

            • T
              Terry R @PeterJones
              last edited by Apr 1, 2025, 6:38 PM

              @PeterJones said in ReGex help removing data:

              FIND = (?s)Born:.*?California

              and @guy038 , I think both of your regexes only pick up the first line (Born). That was the reason for my questions.

              Terry

              • G
                guy038
                last edited by guy038 Apr 2, 2025, 6:18 AM Apr 1, 2025, 6:55 PM

                Hi, @dev-petty, @peterjones, @terry-r and All,

                Yes, I was too quick, answering directly without testing in N++. My bad!

                So one correct syntax could be :

                • SEARCH (?s-i)(?-s:^Born:.+).+?California\R

                • REPLACE Leave EMPTY

                • Check the Regular expression search mode


                What does this regex mean, apart from the literal strings Born: and California?

                • The first part, (?s-i), sets initial modifiers which apply to the whole regex:

                  • The (?s) syntax means that any . regex char, found in the regex, may represent any single character, including the line-break characters \r and \n.

                  • The (?-i) syntax means that the search is case-sensitive. Thus it will find the words Born and California, but not the words born and california or BORN and CALIFORNIA. If a case-insensitive search is needed, just use the (?si) syntax.

                • The second part is (?-s:^Born:.+), a non-capturing group (a group whose contents we do not need later on, in the search or the replacement), of the form (?:.........), with the -s modifier which applies to this group only. Thus, this part looks for the word Born, with that exact case, at the beginning of a line (^), followed by a colon, itself followed by any standard character (.), repeated (+) up to the very end of the current line, as it stops at the line break.

                • The third part is .+?, which represents the shortest (?) range of any character (.), including \r and \n, repeated (+), until …

                • The fourth part is California\R, which represents the word California, with this exact case, followed by \R, which stands for any kind of line break (\r\n for Windows files, \n for Unix files or \r for old Mac files).

                • In the replacement, as its zone is empty, the entire 4 lines matched are simply deleted!
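
For anyone who wants to check the four parts above outside Notepad++, here is a rough Python equivalent. It is only a sketch under a few assumptions: every record has the same layout as the one sample in the original post (the sample is simply repeated twice here), `re.DOTALL` stands in for `(?s)`, `re.MULTILINE` makes `^` match at each line start as it does in Notepad++, and `\R` is spelled out as `(?:\r\n|\n|\r)` because Python's `re` module has no `\R`. Python's default case-sensitive matching already corresponds to `(?-i)`.

```python
import re

# The single sample record from the original post.
record = (
    "Hidetoshi Furuzawa\n"
    "Born: 1912, California\n"
    "Gardener\n"
    "Topaz, Tule Lake Apr 1942–19 Feb 1946\n"
    "Released to Sebastopol, California\n"
)

pattern = re.compile(
    r"(?-s:^Born:.+)"      # whole Born: line; "." cannot cross a line break here
    r".+?"                 # shortest run of anything, line breaks included
    r"California"          # the next California after the Born: line
    r"(?:\r\n|\n|\r)",     # explicit line break, standing in for \R
    re.DOTALL | re.MULTILINE,
)

# Two copies of the sample record, to show that each record is handled separately.
cleaned = pattern.sub("", record * 2)
print(repr(cleaned))  # 'Hidetoshi Furuzawa\nHidetoshi Furuzawa\n'
```

Because the first part consumes the entire Born: line, the California on that line can no longer end the match, so the lazy `.+?` has to run on to the California of the "Released to" line.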

                BR


                guy038
