Changing text between square brackets

Patrick Cain

This question likely has a simple answer, but I know very little about Notepad. I have a number of large documents in .txt. I need to replace text (specifically names FIRST, LAST) appearing between square brackets, but no other text appearing between square brackets. As indicated, the names show as [FIRST, LAST] and sometimes [FIRST, LAST MI.] The text I wish to leave untouched appears uniformly as [WORD]. Thanks in advance for assisting a neophyte.

Scott Sumner

@Patrick-Cain

You don’t say what your replacement looks like, but as far as finding the matches you say you’re looking for, search this way:

Find what: \[\w+, \w+( \w\.)?\]
Search mode: Regular expression

The basic interpretation of this is:

Find a [
Followed by one or more “word” characters (which are A-Z, a-z, 0-9, and _) – you don’t need 0-9 and _ but likely no harm in it
Followed by a comma and space
Followed by one or more “word” chars
Followed by an optional space+word-char+period
Followed by a ]
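
To try the pattern outside Notepad++, here is a minimal sketch in Python. Python's re module is not the Boost engine that Notepad++ uses, but this pattern relies only on syntax the two share. The sample string and the function name are made up for illustration.

```python
import re

# Same pattern as in the "Find what" box above.
NAME_PATTERN = re.compile(r"\[\w+, \w+( \w\.)?\]")

def find_bracketed_names(text):
    """Return every [FIRST, LAST] or [FIRST, LAST M.] span found in text."""
    return [m.group(0) for m in NAME_PATTERN.finditer(text)]

# Hypothetical sample line; [WORD] has no comma, so it is never matched.
sample = "See [SMITH, JOHN] and [DOE, JANE Q.] but leave [WORD] alone."
print(find_bracketed_names(sample))
```

The [WORD] entries are skipped because the pattern requires a comma and a space after the first run of word characters.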

Scott Sumner

So, like all things, this can get a bit complicated…

I was experimenting with named-groups with this for my own “fun” (trying to make regexes more readable) and came up with this possible replacement example:

Find what: (?-i)\[(?<first>[A-Z][a-z]+),(?<flsep>\x20|\r\n)(?<last>[A-Z][a-z]+)(?:(?<lmisep>\x20|\r\n)(?<mi>[A-Z])\.)?\]
Replace with: $+{last},$+{flsep}$+{first}(?{mi}$+{lmisep}$+{mi})
Search mode: Regular expression

This will take [First, Last] or [First, Last M.] and convert it to Last, First or Last, First M respectively (the brackets and the period after the initial are dropped). It also works when a Windows line-ending (\r\n) takes the place of either space inside the brackets; the line-ending is kept in the output.

The key point for my “fun” was that a regex-grouping in the “find” part can be named via this syntax: (?<my_name>...) and can be used in the ‘replace’ part via $+{my_name} or tested-for in this manner: (?{my_name}...) (this “test-for” feature is used in the earlier replace-with expression to see if the optional middle-initial exists…and if so, what to insert into the replacement text if it does).
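
The same named-group idea can be sketched in Python, with some differences that are assumptions of this translation and not part of the Notepad++ recipe. Python spells a named group (?P<name>...), its replacement templates have no conditional, so a small function stands in for the (?{mi}...) test, and \r?\n is used so that a bare \n is accepted too. The names PATTERN, swap and swap_names are hypothetical.

```python
import re

# Python writes named groups as (?P<name>...). No (?-i) is needed because
# matching is case-sensitive unless re.IGNORECASE is passed.
PATTERN = re.compile(
    r"\[(?P<first>[A-Z][a-z]+),(?P<flsep>\x20|\r?\n)(?P<last>[A-Z][a-z]+)"
    r"(?:(?P<lmisep>\x20|\r?\n)(?P<mi>[A-Z])\.)?\]"
)

def swap(match):
    """Build 'Last,<sep>First' and append '<sep>M' only if an initial matched."""
    out = match.group("last") + "," + match.group("flsep") + match.group("first")
    if match.group("mi") is not None:  # plays the role of (?{mi}...)
        out += match.group("lmisep") + match.group("mi")
    return out

def swap_names(text):
    return PATTERN.sub(swap, text)

print(swap_names("In [Kirby, Heidt M.] tellus nunc"))
```

A group that did not take part in the match comes back as None in Python, which is what the function checks before adding the middle initial.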

Sample input data:

Lorem ipsum dolor sit amet, [Vivan, Shurtliff] consectetur adipiscing elit.  Ut
blandit viverra diam luctus luctus.  In [Kirby, Heidt M.] tellus nunc, dapibus id
gravida vel, lacinia venenatis augue.  Nunc [Jessie, Mulford] sagittis rhoncus
hendrerit.  Sed vel augue nisi, vel sagittis sem.  [Taren, Fish] Aenean ante
diam, rutrum ut eleifend in, convallis sed est.  Class due anti [Rhett, Himes
P.] Pellentesque eu tempor et interdum quis, molestie commodo tempor et interdum
ante quis metus dictum feugiat.  Ut blandit volutpat [Harland, Hutzler] ante in
commodo.  Duis quam lorem, lacinia nec tempus non, [Lino, Bureau] tristique sed
turpis.  In id est mi.  Class aptent taciti [Ivana, Mechem Z.] sociosqu ad litora
torquent per conubia nostra, per inceptos himenaeos.  [James, Mcbride F.] Nunc
ipsum libero, tempor et interdum quis, molestie commodo mauris.  [Felecia,
Menendez] Fusce tempor, felis vel pellentesque luctus, enim lacus sagittis
arcu, [Bradly, Blackledge] at mollis tellus mauris in dui.  Nunc vel leo velit.
[Obdulia, Ocana] Aliquam sit amet erat sit amet elit consequat tempor.

Sample output data:

Lorem ipsum dolor sit amet, Shurtliff, Vivan consectetur adipiscing elit.  Ut
blandit viverra diam luctus luctus.  In Heidt, Kirby M tellus nunc, dapibus id
gravida vel, lacinia venenatis augue.  Nunc Mulford, Jessie sagittis rhoncus
hendrerit.  Sed vel augue nisi, vel sagittis sem.  Fish, Taren Aenean ante
diam, rutrum ut eleifend in, convallis sed est.  Class due anti Himes, Rhett
P Pellentesque eu tempor et interdum quis, molestie commodo tempor et interdum
ante quis metus dictum feugiat.  Ut blandit volutpat Hutzler, Harland ante in
commodo.  Duis quam lorem, lacinia nec tempus non, Bureau, Lino tristique sed
turpis.  In id est mi.  Class aptent taciti Mechem, Ivana Z sociosqu ad litora
torquent per conubia nostra, per inceptos himenaeos.  Mcbride, James F Nunc
ipsum libero, tempor et interdum quis, molestie commodo mauris.  Menendez,
Felecia Fusce tempor, felis vel pellentesque luctus, enim lacus sagittis
arcu, Blackledge, Bradly at mollis tellus mauris in dui.  Nunc vel leo velit.
Ocana, Obdulia Aliquam sit amet erat sit amet elit consequat tempor.

If anyone is still reading you get internet-points for endurance.

guy038

Hi, @scott-sumner and All,

Ah, yes, Scott, using named capturing groups is one solution for documenting regexes. But there is another nice way to get correct regexes, with a lot of comments!

I tried to rewrite your S/R, with named groups, using the following template :

SEARCH :

(?x)
(?-i)      # The search is NOT case-insensitive ( => case-sensitive ! )
\[         # A single opening square bracket ( ESCAPED as special char. )
(          # Beginning of group 1 ( First Name )
[A-Z]      # A single capital letter
[a-z]+     # A NON-null range of lower-case letters
)          # End of group 1
,          # A single comma character
(          # Beginning of group 2 ( FL separator )
\x20|\r\n  # A single space character OR the TWO Windows End of Line characters
)          # End of group 2
(          # Beginning of group 3 ( Last Name )
[A-Z]      # A single capital letter
[a-z]+     # A NON-null range of lower-case letters
)          # End of group 3
(?:        # Beginning of an OPTIONAL, non-capturing, group
(          # Beginning of group 4 ( MI separator )
\x20|\r\n  # A single space character OR the TWO Windows End of Line characters
)          # End of group 4
(          # Beginning of group 5 ( Middle Initial )
[A-Z]      # A single capital letter
)          # End of group 5
\.         # A single dot character ( ESCAPED as special char. )
)?         # End of the OPTIONAL, non-capturing, group
\]         # A single ending square bracket ( ESCAPED as special char. )

Unfortunately, this way of writing does NOT work in the replacement part :

# The replacement part CANNOT be split in SEVERAL lines !!
#
# \3,      # Last name is written first, followed by a comma
# \2       # Then, we add the FL separator
# \1       # Then, the First name is written
# ?5       # And if group 5 ( Middle Initial ) exists :
# \4\5     #     We rewrite group 4 ( MI separator ), followed by group 5 ( Middle Initial )

=> REPLACEMENT :

\3,\2\1?5\4\5
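
For comparison, Python offers the same free-spacing style through the re.VERBOSE flag. The sketch below is a rough Python counterpart and not the Notepad++ recipe itself: (?-i) is dropped because Python is case-sensitive by default, \r?\n is an assumption that also accepts a bare \n, and a function replaces the ?5 conditional, which Python replacement templates do not have. FREE_SPACING, reorder and reorder_names are hypothetical names.

```python
import re

FREE_SPACING = re.compile(r"""
    \[              # opening square bracket
    ([A-Z][a-z]+)   # group 1: first name
    ,               # comma
    (\x20|\r?\n)    # group 2: FL separator
    ([A-Z][a-z]+)   # group 3: last name
    (?:             # optional, non-capturing group
      (\x20|\r?\n)  #   group 4: MI separator
      ([A-Z])       #   group 5: middle initial
      \.            #   literal dot
    )?
    \]              # closing square bracket
""", re.VERBOSE)

def reorder(m):
    # Counterpart of \3,\2\1 followed by the conditional part for group 5.
    result = m.group(3) + "," + m.group(2) + m.group(1)
    if m.group(5) is not None:
        result += m.group(4) + m.group(5)
    return result

def reorder_names(text):
    return FREE_SPACING.sub(reorder, text)

print(reorder_names("taciti [Ivana, Mechem Z.] sociosqu"))
```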

Now :

Select all the lines of the SEARCH part, above, between (?x) and \]
Copy them, in the clipboard, with a Ctrl + C shortcut
Paste, first, this selection, in your current file, with a Ctrl + V shortcut
Re-select this text, representing the search part
Open the Replace dialog ( Ctrl + H )
Paste the correct replacement regex, above, in the Replace with: zone
Select the Regular expression search mode
Click on the Replace All button

Et voilà !!

Notes :

Once the search part is selected, DON’T copy the selection to the clipboard in order to paste it into the Find what: zone of the Replace dialog ! Simply open the Replace dialog :-) The Find what: zone will be filled with the selection automatically :-)
The (?x) syntax MUST come first, before the subsequent lines of the regex. This modifier starts free-spacing mode, in which a # character begins a comment
As the space character is simply ignored in this mode, if you search for a space character, you’ll have to use one of the three following syntaxes : \ , [ ] or \x20
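
The point about spaces can be checked quickly in Python, whose re.VERBOSE flag follows the same rule (this is a sketch with Python's engine, not Boost's): a bare space in the pattern is dropped, while an escaped space, a space in a character class, or \x20 still matches.

```python
import re

text = "a b"

# A bare space is ignored under re.VERBOSE, so this pattern really means "ab".
ignored = re.search(r"a b", text, re.VERBOSE)

# The three spellings from the note above that keep the space significant.
space_forms = [r"a\ b", r"a[ ]b", r"a\x20b"]
results = [bool(re.search(p, text, re.VERBOSE)) for p in space_forms]

print(ignored, results)
```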

Best Regards,

guy038