Can I find/replace instances where paragraph tags are followed by a lowercase letter?

  • Is there a way to do this specific find and replace?

    I’ve used a website to remove hard line breaks from old docs that were wrapped at 70 characters per line; it converts them into HTML as it removes the line breaks. It works fairly well, but sometimes it starts a new paragraph mid-sentence where it shouldn’t, and removing those by hand is taking a fair amount of time…

    So is there a way to replace instances of </p> <p>firstletterislowercase with space firstletterislowercase?

    And if I can’t do a search and replace like that, can I at least design a find that will show me all instances when <p> is followed directly by a lowercase letter so I don’t need to read the whole document to find each one?

    Thank you!

  • @theevilwriter

    Try this:

    Find what zone: (?-i)</p> <p>([a-z])
    Replace with zone: \1
    Search mode: Regular expression

    Note that there is a space between the tags because YOU had one there in your data (it may or may not be there in reality). Also note that there is a space before the \1 in the replace string (that one is hard to see). You could also use \x20\1 as the replace string, which makes the space much more obvious.
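
    For anyone who wants to try the same idea outside Notepad++, here is a minimal sketch in Python. The sample text is made up for illustration. Python's re module is case-sensitive by default, so the (?-i) prefix is left out, and the replacement is written with a literal leading space because Python's replacement templates do not accept \x20.

```python
import re

# Hypothetical sample: the first paragraph break falls mid-sentence,
# the second one is a genuine paragraph boundary.
text = (
    "<p>She gave him one of her</p> <p>brilliant grins.</p> "
    "<p>Then she left.</p>"
)

# Closing tag, one space, opening tag, then a captured lowercase letter.
pattern = re.compile(r"</p> <p>([a-z])")

# Replace the tags with a single space and put the captured letter back.
joined = pattern.sub(r" \1", text)
print(joined)
```

    Only the break before "brilliant" is merged; the one before "Then" starts with a capital letter, so it is left alone.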

  • It says that it doesn’t find any instances, even with the cursor placed on the line above an example like:


    <p>gave him one of her brilliant grins and wandered into the kitchen to correct his faux-pas.</p>

    I also tried doing a find and replace first so that, instead of being on different lines, the paragraph tags came one right after another with no space (</p><p>), and it still couldn’t find any.

  • I think it’s working now. For some reason, it can find </p> <p> but not </p><p> regardless of whether the replace term has a space.

    Anyway, thanks!!

  • @theevilwriter ,

    You do realize that @Scott-Sumner explicitly said, “Note that there is a space between the tags because YOU had one there in your data (it may or may not be there in reality)”? He told you that it would only work with a space between the </p> and the <p>. I am not sure why you were surprised.

    And you said, “it can find … regardless of whether the replace term has a space”: why would you think that the replace term would affect what text the find matches?

    If you want to learn how to understand the regular expression syntax that @Scott-Sumner used, please see this FAQ. By studying the documentation, you could probably learn how to customize it to meet your changing requirements. If you still cannot figure out how to do exactly what you want, show what you tried, why you tried that (i.e., why you chose the modifications you did), what you got, and what you expected, and we’ll probably be able to help you more. If you don’t show any additional effort, no amount of giving you new regexes will truly help you.
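
    As one illustration of the kind of customization meant here, the sketch below swaps the literal space for \s*, which matches zero or more whitespace characters, so it covers </p><p>, </p> <p>, and tags split across lines. It is written in Python with made-up sample text, as an assumption-laden sketch and not a tested Notepad++ recipe; the same pattern should behave similarly in Notepad++'s Regular expression mode, but check it against your own data. The finditer call covers the "just show me where they are" case.

```python
import re

# Hypothetical sample with three paragraph breaks:
# no space, a line break, and a line break before a capital letter.
text = (
    "<p>one of her</p><p>brilliant grins</p>\n"
    "<p>and wandered off.</p>\n"
    "<p>New para.</p>"
)

# \s* allows any amount of whitespace (including none) between the tags.
pattern = re.compile(r"</p>\s*<p>([a-z])")

# Find only: list the offset of every suspicious break without changing anything.
hits = [m.start() for m in pattern.finditer(text)]
print(hits)

# Replace: merge those breaks into a single space.
joined = pattern.sub(r" \1", text)
print(joined)
```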

  • @PeterJones said:

    You do realize…

    LOL … amazing… (sigh)
