How do I replace only the first <br> tag across multiple files?



  • I have some blocks of text in multiple files. As an example, here is one such block (these blocks have different content in each file but all have the <br> tags):-

    <p class=MsoNormal style='margin-bottom:12.0pt;line-height:normal'><span
        style='font-size:13.5pt;font-family:"Verdana","sans-serif";mso-fareast-font-family:
        "Times New Roman";mso-bidi-font-family:"Times New Roman";color:black'>-Mucous
        membranes are inflamed and ulcerated, causing mucopurulent discharges and
        splinter-like pains. Violent pains like deeply sticking SPLINTER or sharp,
        shooting, like lightning, grinding or radiating, causing starting,
        extending down the back or legs. <br>
        <br>
        -Chilly when uncovered yet feels smothered if wrapped up. Craves fresh air.
        <br>
        <br>
        -Symptoms of INCOORDINATION, loss of control and want of balance everywhere
        mentally and physically. Trembling in affected parts. <br>
        <br>
        -Desire for SUGAR, sweets, salt; but sugar &lt;, it results in diarrhoea. <br>
        <br>
        -Warm blooded, one of the hottest remedies. <br>
        <br>
        -Headache - with coldness and trembling. <br>
        <br>
        -Congestive, with fullness and heaviness, with sense of expansion. <br>
        <br>
        -Habitual, gastric, ending in bilious vomiting. <br>
        <br>
        - &lt; from mental labour; &gt;pressure or tight bandaging (Apis, Puls). <br>
        <br>
        -Eyes</span></p>
    

    How do I replace only the first <br> tag across multiple files (that is, the first <br> tag in each file)?



  • @Scott-Nielson This RegEx replaces every <br> tag: (?:<br>|\G(?!^))(?:[^<br>]*)*?\K([^<br>]*)(?=(?:[^<br>]*)*?\1)



  • @Scott-Nielson How do I replace only the first <br> tag across multiple files (that is, the first <br> tag in each file) with </span></p> ?



  • This did not replace the needful either: (?i-s)^.*?\K(<br>)(?:.*\1)?



  • @Scott-Nielson said in How do I replace only the first <br> tag across multiple files?:

    (?i-s)^.*?\K(<br>)(?:.*\1)?

    The above RegEx, (?i-s)^.*?\K(<br>)(?:.*\1)? also replaces all the <br> tags instead of just the first <br> tag of each file (of multiple files)



  • @Scott-Nielson The RegEx, ^[^/]+\K<br> replaced the last <br> tag in each block of text in each file, instead of the first (that’s what I want - to replace the first such tag)



  • @Scott-Nielson

    Read the documents (https://npp-user-manual.org/docs/searching/). There is an anchor \A which anchors to the beginning of the document; and you already know \K. And you already know how to do lookaheads and lookbehinds (including one I gave you that “looks for 1 or more characters, as long as none of them are the start of SOMEMAGICSTRING”) – and if you use quantifiers, you don’t even need something that complicated. Think about the logic you want, combine it with what you know. Voila.

    Or search the forum for recent posts about “first line”, and find this post, which shows you how to anchor to the beginning of the document, though its use case is slightly different from yours.

    In this case, the logic you want is “start at the beginning of the file; look for zero or more characters that come before the first <br> (so be non-greedy), restart the match, and match <br>.”

    • Original thought with the fancy negative lookahead previously shown: FIND = (?s)\A((?!<br>).)*\K<br>
    • Simpler non-greedy: FIND = (?s)\A.*?\K<br>

    Both found the first (and only the first) occurrence of <br> in my file, and could be used in Find in Files

    Note: \A was fixed not too long ago, in v7.9.1; make sure you’re using v7.9.1 or newer
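
    For anyone who wants to try the same logic outside Notepad++, here is a minimal Python sketch. It is an illustration under assumptions, not part of the original answer: Python's re module has no \K, so the text before the first <br> is captured and written back instead. The function names, the "*.html" glob, and the UTF-8 encoding are hypothetical placeholders; the replacement </span></p> is the one asked for earlier in this thread.

```python
import re
from pathlib import Path

# (?s) lets "." cross line ends; \A anchors at the very start of the text.
# Python's re has no \K, so group 1 keeps everything before the first <br>.
FIRST_BR = re.compile(r"(?s)\A(.*?)<br>")


def replace_first_br(text: str, replacement: str) -> str:
    """Replace only the first <br> in text; later <br> tags are untouched."""
    # A function is used as the replacement so that any backslashes in
    # `replacement` are inserted literally.
    return FIRST_BR.sub(lambda m: m.group(1) + replacement, text, count=1)


def replace_in_files(folder: str, replacement: str, pattern: str = "*.html") -> None:
    """Apply replace_first_br to each matching file (hypothetical helper).

    The UTF-8 encoding is an assumption; adjust it to the real files.
    """
    for path in Path(folder).glob(pattern):
        original = path.read_text(encoding="utf-8")
        updated = replace_first_br(original, replacement)
        if updated != original:
            path.write_text(updated, encoding="utf-8")


sample = "down the back or legs. <br>\n<br>\n-Chilly when uncovered <br>\n"
print(replace_first_br(sample, "</span></p>"))
```

    Because \A can only match at the very start of the text, the pattern cannot match a second time in the same file; count=1 is just an extra guard. Try it on copies of the files first.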

    ----

    Please note: This Community Forum is not a data transformation service; you should not expect to be able to always say “I have data like X and want it to look like Y” and have us do all the work for you. If you are new to the Forum, and new to regular expressions, we will often give help on the first one or two data-transformation questions, especially if they are well-asked and you show a willingness to learn; and we will point you to the documentation where you can learn how to do the data transformations for yourself in the future. But if you repeatedly ask us to do your work for you, you will find that the patience of usually-helpful Community members wears thin. The best way to learn regular expressions is by experimenting with them yourself, and getting a feel for how they work; having us spoon-feed you the answers without you putting in the effort doesn’t help you in the long term and is uninteresting and annoying for us.

    You are up to 6 regex questions in barely more than a month.



  • @PeterJones Thanks a lot man, you saved me a lot of time!
    Now, for the same block of text as posted right at the top, where I have this at the bottom/end:-

    <br>
        -Eyes</span></p>
    

    How do I find all of those, replace the <br> tag (the last <br> tag in that block) with something else, keep the text - in this case, “-Eyes (with the dash)” and add something after the </span></p>? I know that using a $1 in the “replace” field will keep the text - in this case, “-Eyes (with the dash)” and using *(?=</span></p>) at the end will keep the </span></p> string as it is, after the replacement.



  • @PeterJones said in How do I replace only the first <br> tag across multiple files?:

    You are up to 6 regex questions in barely more than a month.

    7 now!



  • @Alan-Kilborn Is there a limit on the number of questions one can ask here? I’m trying my best to find a solution myself, and I ask here only when I can no longer find one!



  • @Scott-Nielson

    Did you even read Peter’s “Please note” section above?
    This forum is about Notepad++, not about how to transform data with regular expressions.
    Plus, you aren’t really asking questions about regular expressions.
    You’re basically saying “do my regular expression writing work for me”.
    As Peter attempted to explain, this is frowned upon here.
    The extent of regular expression questions here should be general questions, such as “Which regex engine does N++ use?” or of course any suspected bugs with something going on with a regex.
    Don’t become a regex “taker” (seen many times here).



  • @Alan-Kilborn or anyone else. I am doing Edit->Line Operations->Reverse line order, then in the RegEx mode,
    Find: ^[^/]+\K<br> to find the last (which is actually the first since the lines are reversed) <br> tag in each section and Replace: <br>zxcvb,
    Then Find: (?s)(?<=color:black'>)-(.*?) *(?=zxcvb) and Replace: qwerty$1xxx where qwerty and xxx are some HTML codes I wanna add but is there a RegEx that can help me do the needful without the Edit->Line Operations->Reverse line order (to find the first <br> tag in each section)?



  • @Scott-Nielson ,

    Search the forum for “find the last”. We’ve recently talked about how to do that.



  • Hello, @scott-nielson, @peterjones, @alan-kilborn and All,

    @scott-nielson, you should not need to reverse the line order to achieve these replacements!

    So, could you, like your first sample posted, provide us, both :

    • An initial example of your code containing several consecutive blocks <p ••••••</p>

    • The same expected blocks, after the modifications needed

    We can agree to flag any HTML code, that is to be added, by the notation <CODE 1>, <CODE 2>, etc.

    Best Regards,

    guy038



  • @guy038 I see that you are called the RegEx Guru here. Thanks for agreeing to help.
    I want this block of text:-

    <p class=MsoNormal style='margin-bottom:12.0pt;line-height:normal'><span
      style='font-size:13.5pt;font-family:"Verdana","sans-serif";mso-fareast-font-family:
      "Times New Roman";mso-bidi-font-family:"Times New Roman";color:black;
      mso-ansi-language:EN-US'>-Self-conscious. SELFISH. Egotism. <br>
      <br>
      -Usually self-confident. Domineering. Anger over his mistakes. <br>
      <br>
      -No confidence</span></p>
      <p class=MsoNormal style='text-align:justify'><span style='font-size:13.5pt;
      line-height:115%;font-family:"Verdana","sans-serif";mso-fareast-font-family:
      "Times New Roman";mso-bidi-font-family:"Times New Roman";color:black;
      mso-ansi-language:EN-US'>-When carefully selected remedies fail to act
      especially in acute diseases it frequently rejuvenates powers of organism (in
      chronic disease - Psorinum). <br>
      <br>
      -Complaints that relapse</span></p>
    <p><span><br>
      <br>
      -blah</span></p>
      <p><span><br>
      <br>
      -bleeh</span></p>
      <p><span><br>
      <br>
      -bloh</span></p>
      <p><span><br>
      <br>
      -bluh</span></p>
    

    to become like:-

    <p class=MsoNormal style='margin-bottom:12.0pt;line-height:normal'><span
      style='font-size:13.5pt;font-family:"Verdana","sans-serif";mso-fareast-font-family:
      "Times New Roman";mso-bidi-font-family:"Times New Roman";color:black;
      mso-ansi-language:EN-US'>-Self-conscious. SELFISH. Egotism. <br>zxcvb
      <br>
      -Usually self-confident. Domineering. Anger over his mistakes. <br>
      <br>
      -No confidence</span></p>
      <p class=MsoNormal style='text-align:justify'><span style='font-size:13.5pt;
      line-height:115%;font-family:"Verdana","sans-serif";mso-fareast-font-family:
      "Times New Roman";mso-bidi-font-family:"Times New Roman";color:black;
      mso-ansi-language:EN-US'>-When carefully selected remedies fail to act
      especially in acute diseases it frequently rejuvenates powers of organism (in
      chronic disease - Psorinum). <br>zxcvb
      <br>
      -Complaints that relapse</span></p>
    <p><span><br>zxcvb
      <br>
      -blah</span></p>
      <p><span><br>zxcvb
      <br>
      -bleeh</span></p>
      <p><span><br>zxcvb
      <br>
      -bloh</span></p>
      <p><span><br>zxcvb
      <br>
      -bluh</span></p>
    

    If I use the RegEx ^[^/]+\K<br> to find the last <br> tag in each block between <p..........</p> and then replace it with <br>qwerty (or rather <br>zxcvb, as in this example), the text is added just after the last <br> tag in each block, but I want to add the zxcvb text just after the first <br> tag in each block, as seen in the example given above.
    Please help me!



  • @guy038 Sorry RegEx Guru, I want this block of text:-

    <p class=MsoNormal style='margin-bottom:12.0pt;line-height:normal'><span
      style='font-size:13.5pt;font-family:"Verdana","sans-serif";mso-fareast-font-family:
      "Times New Roman";mso-bidi-font-family:"Times New Roman";color:black;
      mso-ansi-language:EN-US'>-Self-conscious. SELFISH. Egotism. <br>
      <br>
      -Usually self-confident. Domineering. Anger over his mistakes. <br>
      <br>
      -No confidence</span></p>
      <p class=MsoNormal style='text-align:justify'><span style='font-size:13.5pt;
      line-height:115%;font-family:"Verdana","sans-serif";mso-fareast-font-family:
      "Times New Roman";mso-bidi-font-family:"Times New Roman";color:black;
      mso-ansi-language:EN-US'>-When carefully selected remedies fail to act
      especially in acute diseases it frequently rejuvenates powers of organism (in
      chronic disease - Psorinum). <br>
      <br>
      -Complaints that relapse</span></p>
    <p><span><br>
      <br>
      -blah</span></p>
      <p><span><br>
      <br>
      -bleeh</span></p>
      <p><span><br>
      <br>
      -bloh</span></p>
      <p><span><br>
      <br>
      -bluh</span></p>
    

    to become like:-

    <p class=MsoNormal style='margin-bottom:12.0pt;line-height:normal'><span
      style='font-size:13.5pt;font-family:"Verdana","sans-serif";mso-fareast-font-family:
      "Times New Roman";mso-bidi-font-family:"Times New Roman";color:black;
      mso-ansi-language:EN-US'>-Self-conscious. SELFISH. Egotism. zxcvb<br>
      <br>
      -Usually self-confident. Domineering. Anger over his mistakes. <br>
      <br>
      -No confidence</span></p>
      <p class=MsoNormal style='text-align:justify'><span style='font-size:13.5pt;
      line-height:115%;font-family:"Verdana","sans-serif";mso-fareast-font-family:
      "Times New Roman";mso-bidi-font-family:"Times New Roman";color:black;
      mso-ansi-language:EN-US'>-When carefully selected remedies fail to act
      especially in acute diseases it frequently rejuvenates powers of organism (in
      chronic disease - Psorinum). zxcvb<br>
      <br>
      -Complaints that relapse</span></p>
    <p><span>zxcvb<br>
      <br>
      -blah</span></p>
      <p><span>zxcvb<br>
      <br>
      -bleeh</span></p>
      <p><span>zxcvb<br>
      <br>
      -bloh</span></p>
      <p><span>zxcvb<br>
      <br>
      -bluh</span></p>
    

    that is, after the replacement, the text (in this case I’ve used zxcvb as an example) should be added before the <br> tag that is found (the first <br> tag in each block between <p.....</p>).



  • @Scott-Nielson said in How do I replace only the first <br> tag across multiple files?:

    I want this block of text:-

    Repeating your request isn’t going to get @guy038 to respond any faster. We here on the forum give our time freely, to help others. We aren’t tied to the forum 24hrs a day, we mostly have real jobs that pay.

    So just wait and since @guy038 has offered to help you he will, just allow him time to do so. Patience is a virtue!

    Terry



  • @Terry-R said in How do I replace only the first <br> tag across multiple files?:

    Repeating your request

    Technically, it wasn’t just a repeat. The text in the “replace with” changed from <br>zxcvb to zxcvb<br> in multiple locations. I think the OP said “sorry” because he got the data wrong.

    Unfortunately, due to the historic manner in which this user “asks” for things, it was not unreasonable for you to assume that the OP was just being impatient. That’s too bad, when a slight change of phrasing, and more of a willingness to do the work themselves, would make a drastic improvement in that user’s questions.

    @Scott-Nielson ,

    For the problem at hand, if Guy doesn’t have a chance to solve it for you right away, I highly recommend you should look at the generic regular expression for finding text between two keywords that Guy details here, which would solve the “for each section” part of the question; and maybe doing a search for “find last” in the forum, which would solve the "find the last <br>" part of the question. Combining the “find the last” with “for each section” would give you all that you need in order to figure it out on your own. And doing so would greatly benefit you, both in personal learning and showing us that you are willing to learn.

    I’ll give my generic advice again. Maybe you will read and understand it this time, and will start reading the pages linked

    ----

    Please note: This Community Forum is not a data transformation service; you should not expect to be able to always say “I have data like X and want it to look like Y” and have us do all the work for you. If you are new to the Forum, and new to regular expressions, we will often give help on the first one or two data-transformation questions, especially if they are well-asked and you show a willingness to learn; and we will point you to the documentation where you can learn how to do the data transformations for yourself in the future. But if you repeatedly ask us to do your work for you, you will find that the patience of usually-helpful Community members wears thin. The best way to learn regular expressions is by experimenting with them yourself, and getting a feel for how they work; having us spoon-feed you the answers without you putting in the effort doesn’t help you in the long term and is uninteresting and annoying for us.

    ----

    Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as literal text using the </> toolbar button or manual Markdown syntax. To make regexes show in red (and so they keep their special characters like *), use backticks, like `^.*?blah.*?\z`. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure your examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get. Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.



  • @PeterJones said in How do I replace only the first <br> tag across multiple files?:

    Technically, it wasn’t just a repeat. The text in the “replace with” changed from <br>zxcvb to zxcvb<br> in multiple locations. I think the OP said “sorry” because he got the data wrong.

    So apologies to @Scott-Nielson , I did check the “before” text which was exactly the same, so assumed the “after” would also be.

    @PeterJones we seem to have an influx of OPs whose posts are very sparsely populated. Meaning bugger all text! I’ve given up trying to interpret some of the posts, and I wonder why you still persist. Maybe you need another boilerplate answer (or expand the current one, which is getting bigger by the day?) to say “please expand on your question or else you will get limited/no responses”.

    Terry



  • Hi, @scott-nielson, @peterjones, @alan-kilborn, @terry-r and All,

    Why not this S/R :

    SEARCH (?s-i)<p.*?><span.+?\K<br>

    REPLACE zxcvb$0

    As usual, Regular expression mode, Wrap around option and just 1 click on the Replace All button ( Do not use the Replace button, due to the \K syntax ! )


    Notes :

    • In SEARCH :

      • First, (?s-i) is a pair of in-line modifiers, which means that :

        • The regex dot char . matches any single character, even EOL ones, allowing multi-line searches

        • The search is case-sensitive ( not case-insensitive ! )

      • The .*? and .+? syntaxes represent, respectively, the shortest, possibly null, range of chars between the <p and ><span strings and the shortest non-null range of chars between the ><span and first <br> strings

      • Then, the \K feature resets the present match. So, the final match is just the <br> tag

    • In REPLACEMENT :

      • The $0 syntax represents the overall match, so the string <br>, with this exact case
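
    As a rough cross-check outside Notepad++, the same search/replace can be sketched in Python. This is an illustration under assumptions, not guy038's method: Python's re module has no \K, so the part of each block before its first <br> is captured as group 1 and written back, and the function name mark_first_br is hypothetical. Python matching is case-sensitive by default, which corresponds to the -i modifier.

```python
import re

# (?s) makes "." match line ends. Group 1 stands in for the text that \K
# would discard; group 2 is the first <br> of the block.
FIRST_BR_PER_BLOCK = re.compile(r"(?s)(<p.*?><span.+?)(<br>)")


def mark_first_br(text: str, marker: str = "zxcvb") -> str:
    """Insert marker before the first <br> of every <p ...><span ...> block."""
    return FIRST_BR_PER_BLOCK.sub(
        lambda m: m.group(1) + marker + m.group(2), text
    )


sample = (
    "<p class=MsoNormal><span\n"
    "  style='color:black'>-Self-conscious. <br>\n"
    "  <br>\n"
    "  -No confidence</span></p>\n"
    "<p><span><br>\n"
    "  <br>\n"
    "  -blah</span></p>\n"
)
print(mark_first_br(sample))
```

    One caveat applies to both the sketch and the regex it mirrors: a block that contains no <br> at all would let the non-greedy .+? run on into the next block, so it is worth checking the data for such blocks first.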

    Best Regards,

    guy038

