
    Regex Help to replace text except when it matches a string

    5 Posts 3 Posters 419 Views
    • Alfred Streich

      I’m a beginner in using regex. I have a set of xml paras and I want to replace the paragraph and line breaks between paras with a single space unless the new para starts with an all caps name followed by a colon. The name could be two or more names all in caps. There could be dozens of instances of this in a given document.

      So for example, if I start with:

      <para>CARLYNN MAGLIANO SWEENEY: Thank you, Amy. And thank you everyone for joining us. As Amy mentioned, today we’re going to be talking about navigating your parental leave. My name also, Amy mentioned, is Carlynn Magliano Sweeney. And I’m the managing director of Preferred Transition Resources.</para>

      <para>PTR is a career coaching, counseling, and outplacement company based in New York City. And we work mainly in the legal sector. Karen, why don’t you tell everybody a little bit about yourself?</para>

      <para>KAREN RUBIN: Thank you. So I’m an executive coach. And I’ve been doing this for the past decade after doing a career pivot. I had a different career. And I actually off ramped in my career.</para>

      <para>And one of the reasons I love this type of work is this is exactly the kind of help that I wish I had had when I was having my kids. So I see that this is obviously a joyful time in your life but it can also be a nerve wracking time.</para>

      I want to end with:
      <para>CARLYNN MAGLIANO SWEENEY: Thank you, Amy. And thank you everyone for joining us. As Amy mentioned, today we’re going to be talking about navigating your parental leave. My name also, Amy mentioned, is Carlynn Magliano Sweeney. And I’m the managing director of Preferred Transition Resources. PTR is a career coaching, counseling, and outplacement company based in New York City. And we work mainly in the legal sector. Karen, why don’t you tell everybody a little bit about yourself?</para>

      <para>KAREN RUBIN: Thank you. So I’m an executive coach. And I’ve been doing this for the past decade after doing a career pivot. I had a different career. And I actually off ramped in my career. And one of the reasons I love this type of work is this is exactly the kind of help that I wish I had had when I was having my kids. So I see that this is obviously a joyful time in your life but it can also be a nerve wracking time.</para>

      I found a way to match the para breaks, “</para>[\r\n]+<para>”, but I don’t know how to restrict the search and replace to only those breaks that are not followed by an all-caps name and a colon.

      I would appreciate any help in solving this.

      Thanks,
      Alfred

      • Alan Kilborn @Alfred Streich

        @Alfred-Streich

        I don’t know if it will get you ALL the way there, but this seems to work for your sample data, and could be used by you as a starting point for more complex cases:

        Find what box: (?-is)</para>\R+<para>([A-Z ]+?[a-z].*?)</para>
        Replace with box: one space followed by \1</para> (the leading space matters)
        Wrap around checkbox: ticked
        Search mode radiobutton: Regular expression
        Press the Replace All button
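For readers who want to try the idea outside Notepad++, here is a minimal Python sketch of the same find/replace. It is an approximation under stated assumptions, not Notepad++'s own engine: Python's `re` has no `\R`, so `(?:\r\n|\n|\r)` stands in for it, and `(?-is)` is omitted because case-sensitive matching and a dot that stops at `\n` are already Python's defaults. The names `ALAN_PATTERN` and `join_continuations` are made up for this illustration.

```python
import re

# Assumption: (?:\r\n|\n|\r) approximates \R, which Python's re module lacks.
# The capture group requires capitals/spaces followed by a lowercase letter,
# so a line starting with "KAREN RUBIN:" cannot match.
ALAN_PATTERN = re.compile(
    r"</para>(?:\r\n|\n|\r)+<para>([A-Z ]+?[a-z].*?)</para>"
)

def join_continuations(text: str) -> str:
    """Replace each matched break with a space plus the captured paragraph."""
    return ALAN_PATTERN.sub(r" \1</para>", text)

sample = (
    "<para>KAREN RUBIN: Thank you.</para>\n\n"
    "<para>And one of the reasons I love this work.</para>\n\n"
    "<para>CARLYNN MAGLIANO SWEENEY: Thank you, Amy.</para>\n"
)
result = join_continuations(sample)
print(result)
```

In this sample the second paragraph is folded into the first, while the paragraph that opens with an all-caps name and a colon keeps its own `<para>`.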

        • Alfred Streich

          @Alan-Kilborn said in Regex Help to replace text except when it matches a string:

          \1</para>

          Thank you so much. That gets me a lot closer. It fixes most cases, except when several paras in a row need their breaks removed. If I run the expression again, it still ignores them.

          Here’s a later section of text where the para breaks are still in place. I don’t understand regular expression syntax well enough to know where I would need to modify the expression, or apply a second expression, to account for these. The text below should be just two paras, with the second being the last line of the sample:

          <para>KAREN RUBIN: Yeah. Absolutely. And just honestly, even just having this playbook calms a lot of anxiety and stress that people feel like, oh, OK, I sort of know what I should be doing. So for what it’s worth, just having this put together, this structure can be really helpful for a lot of people.But the first thing is when you have a better sense of the cases and projects that you’re going to be working on, put together a list of everything that’s going to need to be covered. And also think about your recommendations for staffing. So you’re closest to it. And you’re best positioned to know who those people might be.</para>

          <para>So you want to be thinking about who’s got the bandwidth. Who has an affinity for this type of work, or maybe who would welcome the opportunity to stretch or get new ability? So like parental leave can be a great development opportunity for a more junior associate.And you also want to prioritize what needs to happen while you’re gone. Is there anything that could be put on hold? So once you’ve got this, and depending on if you have one partner or multiple partners, but once you get that agreement and buy in, they are the ones who should be presenting this coverage plan to your colleagues.</para>

          <para>And the reason that’s important is if you are telling someone like, here, can you cover this in my absence, then it almost feels like you’re asking for a personal favor and you’re not. So you want to get this plan blessed from above so the more senior people can provide the air cover. And your role is to manage the training.</para>

          <para>CARLYNN MAGLIANO SWEENEY: [INAUDIBLE].</para>

          • Alfred Streich @Alan Kilborn

            @Alan-Kilborn

            Actually, I just figured out the missing item. After I run your expression, I run the same expression again with a space added before the “\R”, and it takes care of all the remaining lines. I cannot thank you enough for the assistance.
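The two-step workaround above can be sketched in Python as a single loop. This is only an illustration under assumptions: `[ \t]*` before the line break generalizes "add a space before \R", `(?:\r\n|\n|\r)` stands in for `\R`, and `join_until_stable` is a hypothetical helper name. Repeating the pass is needed in any case, because each match consumes the closing `</para>` of the paragraph it joins, so the break that follows cannot match in the same pass.

```python
import re

# Assumptions: [ \t]* tolerates trailing blanks after </para>, and
# (?:\r\n|\n|\r) approximates \R, which Python's re module lacks.
PATTERN = re.compile(
    r"</para>[ \t]*(?:\r\n|\n|\r)+<para>([A-Z ]+?[a-z].*?)</para>"
)

def join_until_stable(text: str) -> str:
    """Re-apply the replacement until a pass makes no substitutions."""
    while True:
        text, count = PATTERN.subn(r" \1</para>", text)
        if count == 0:
            return text

sample = (
    "<para>KAREN RUBIN: Yeah. Absolutely.</para> \n\n"
    "<para>So you want to be thinking about bandwidth.</para>\n\n"
    "<para>And the reason that is important.</para> \n\n"
    "<para>CARLYNN MAGLIANO SWEENEY: [INAUDIBLE].</para>\n"
)
joined = join_until_stable(sample)
print(joined)
```

With this sample the first pass joins only the second paragraph; the second pass joins the third; the third pass finds nothing and the loop stops.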

            • guy038

              Hello, @alfred-streich, @alan-kilborn and All,

              Since, seemingly, none of the lines that must be joined to their previous line contains a colon symbol ( : ), an alternative to Alan’s regex could be:

              SEARCH (?-i)</para>\h*\R+\h*<para>(?![^\r\n]+:)

              REPLACE \x20

              Notes :

              • First, the in-line modifier (?-i) forces a case-sensitive search

              • The part \h*\R+\h* matches any range of horizontal blank chars ( Space and Tab ), possibly empty, followed by a non-empty range of line-breaks ( \r\n, \n or \r ), itself followed by another, possibly empty, range of blank chars

              • The part (?![^\r\n]+:) is a negative look-ahead structure. It defines a necessary condition for the overall regex to match, without being part of the final match: the text after the literal string <para>, up to its line-break, must not contain any colon character

              • Note that [^\r\n]+ matches a non-empty range of characters other than EOL chars, so [^\r\n]+: covers any chars after <para> up to a colon symbol on the same line

              • In the replacement, \x20 is the hexadecimal escape for a space character; you may just as well type a single space char in the Replace with: zone
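The notes above can be tried out with a small Python sketch. It is an approximation, not the Notepad++ engine: `[ \t]` stands in for `\h` and `(?:\r\n|\n|\r)` for `\R`, since Python's `re` supports neither, and `GUY_PATTERN` and `join_without_colon` are names invented for this example. Because the look-ahead consumes nothing, the match ends right after `<para>`, so one pass is enough even for several continuation paragraphs in a row.

```python
import re

# Assumptions: [ \t] approximates \h and (?:\r\n|\n|\r) approximates \R.
# The negative look-ahead rejects a break when the rest of the line after
# <para> contains a colon anywhere.
GUY_PATTERN = re.compile(
    r"</para>[ \t]*(?:\r\n|\n|\r)+[ \t]*<para>(?![^\r\n]+:)"
)

def join_without_colon(text: str) -> str:
    """Replace each qualifying paragraph break with a single space."""
    return GUY_PATTERN.sub(" ", text)

sample = (
    "<para>KAREN RUBIN: Yeah.</para> \n\n"
    "<para>So you want to plan ahead.</para>\n\n"
    "<para>And your role is to manage the training.</para>\n\n"
    "<para>CARLYNN MAGLIANO SWEENEY: [INAUDIBLE].</para>\n"
)
joined = join_without_colon(sample)
print(joined)

# Caveat: a continuation paragraph that contains a colon is left alone.
with_colon = "<para>KAREN RUBIN: Yes.</para>\n<para>Note this: it stays.</para>\n"
unchanged = join_without_colon(with_colon)
```

The second check illustrates the trade-off of keying on the colon alone: any continuation paragraph with a colon somewhere in it is treated like a speaker line and is not joined.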

              Best regards,

              guy038

              The Community of users of the Notepad++ text editor.