ReGex help removing data
-
Hidetoshi Furuzawa
Born: 1912, California
Gardener
Topaz, Tule Lake Apr 1942–19 Feb 1946
Released to Sebastopol, California
Here is how I would like that data to look (“after” data):
Hidetoshi Furuzawa
*Each entry begins with Born: and ends with California. I need everything in between removed, so that only the names are left.
-
A question about the “before” data. I see California is shown twice, and in both cases at the end of a line. If other lines in between can also contain that word, matching up to “California” becomes a problem.
So is it always the “Born” line and exactly 3 lines after?
Or is there a blank line between “records”? If you could show multiple “before” records in their entirety, it might help in finding the solution.
Terry
-
each entry begins with Born: and ends with California
FIND =
(?s)Born:.*?California
REPLACE = <leave empty>
SEARCH MODE = Regular Expression
REPLACE ALL
If there are too many newlines after the replacement, you could add a \R before the Born: or after the California in the regex.
-
Hello, @dev-petty and All,
So, I suppose that this regex should work nicely :
- SEARCH (?s-i)^Born:.+?California\R
- REPLACE Leave EMPTY
- Check the Regular expression search mode
Best Regards,
guy038
-
When @Terry-R wrote “I see California is shown twice,” I realized I hadn’t noticed that. My answer (and, I think, @guy038’s) would stop at that first California, rather than going to the end.
Like @Terry-R, I think it would be helpful if you showed a few more examples in the same set of data, so that we could see variations in things like the number of lines, or whether the replacement can ever stop on the same line that has Born: or whether it always has to end on a subsequent line to Born:.
-
@PeterJones said in ReGex help removing data:
FIND = (?s)Born:.*?California
and @guy038, I think both of your regexes only pick up the first line (the Born line). That was the reason for my questions.
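This observation can be checked outside Notepad++ with a minimal Python sketch. It rests on two assumptions: the record is laid out on five lines as shown (the line breaks are a guess based on the thread), and Python's `re` module stands in for Notepad++'s regex engine. Python has no `\R`, so the line break is spelled out as `(?:\r\n|\n|\r)`, and the `m` flag is added so that `^` matches at the start of each line.

```python
import re

# Assumed layout of one "before" record (the line breaks are a guess).
record = (
    "Hidetoshi Furuzawa\n"
    "Born: 1912, California\n"
    "Gardener\n"
    "Topaz, Tule Lake Apr 1942–19 Feb 1946\n"
    "Released to Sebastopol, California\n"
)

# Python's re has no \R, so the line break is written out explicitly.
EOL = r"(?:\r\n|\n|\r)"

# Rough Python equivalents of the first two suggestions in this thread.
first_try = re.compile(r"(?s)Born:.*?California")
second_try = re.compile(r"(?ms)^Born:.+?California" + EOL)

# The lazy quantifier stops at the FIRST "California",
# which already sits on the Born line itself.
print(repr(first_try.search(record).group()))
print(repr(second_try.search(record).group()))

# So only the Born line is removed; the other three lines survive.
leftover = second_try.sub("", record)
print(leftover)
```

Under these assumptions, both patterns match only the Born line, which is the behaviour described above.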
Terry
-
Hi, @dev-petty, @peterjones, @terry-r and All,
Yes, I was too quick, answering directly without testing in N++. My bad!
So one correct syntax could be :
- SEARCH (?s-i)(?-s:^Born:.+).+?California\R
- REPLACE Leave EMPTY
- Check the Regular expression search mode
What does this regex mean, apart from the literal strings Born: and California?
- The first part (?s-i) consists of initial modifiers which apply to the whole regex:
  - The (?s) syntax means that any . regex char found in the regex may represent any single character, including the line-break characters \r and/or \n.
  - The (?-i) syntax means that the search is done in a case-sensitive way (so not insensitive!). Thus it will find the words Born and California but not the words born and california or BORN and CALIFORNIA. If an insensitive search is needed, just use the (?si) syntax.
- The second part is (?-s:^Born:.+), which is a non-capturing group (?:.........) (a group whose contents we do not need later on, in search and/or replacement!) with the -s modifier, which applies to this group only. Thus, this part looks for the word Born, with that exact case, at the beginning of a line ^, followed by a colon, itself followed by any standard character ., repeated +, up to the very end of the current line, as it stops at the line-break.
- The third part is .+?, which represents the smallest ? range of any character ., including \r and \n, repeated +, until …
- The fourth part is California\R, which represents the word California, with this exact case, followed by \R, which stands for any kind of line-break (\r\n for Windows files, \n for Unix files or \r for Mac files).
In replacement, as its zone is empty, the entire 4 lines matched are simply deleted!
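The explanation above can be tried out with a minimal Python sketch. It is not the Notepad++ engine, so a few things are adapted and should be read as assumptions: Python's `re` has no `\R` (an explicit alternation is used instead), it needs the `m` flag for `^` to match at each line start, and it is case-sensitive by default, which plays the role of `(?-i)`. The first record's line layout is a guess from the thread, and the second record is entirely made up for illustration.

```python
import re

# Python's re has no \R; this alternation stands in for it.
EOL = r"(?:\r\n|\n|\r)"

# (?ms)           "s" lets "." cross line breaks, "m" lets "^" match at line starts
# (?-s:^Born:.+)  the Born line only: inside this group "." stops at "\n"
# .+?California   the shortest run up to a later "California" ...
# EOL             ... that is directly followed by a line break
pattern = re.compile(r"(?ms)(?-s:^Born:.+).+?California" + EOL)

before = (
    # Layout assumed from the thread.
    "Hidetoshi Furuzawa\n"
    "Born: 1912, California\n"
    "Gardener\n"
    "Topaz, Tule Lake Apr 1942–19 Feb 1946\n"
    "Released to Sebastopol, California\n"
    # Hypothetical second record, invented for this example.
    "Jane Doe\n"
    "Born: 1920, Oregon\n"
    "Teacher\n"
    "Topaz Sep 1942–1 Jan 1945\n"
    "Released to Fresno, California\n"
)

# Equivalent of "Replace All" with an empty replacement.
after = pattern.sub("", before)
print(after)
```

With this sample, only the two name lines remain. A last record that does not end with a line break would not match, because the pattern requires a line break after the final California.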
BR
guy038
-