
    Fix HTML code with regex

      Giannis Katebakis
      last edited by PeterJones

      Hi,
      I have an old HTML website in which some paragraphs contain line breaks and extra whitespace. I want to clean up these paragraphs and join their lines. Unfortunately, there are more than 200 pages, so doing it by hand would take ages.

      Here is an example of text with split lines:

            <p align="justify" >The captain 
                    and the crew were taking care of the last details when someone remembered 
                    that there was no ice in the boat, nor the necessary amount of beer. 
                    A volunteer was found and he rushed to buy all that was needed. 
                    Soon he was back with a big bag full of ice and a dozen beers and 
                    other refreshments.</p>
                  <p align="justify" >The captain decided that everything was ready
                    and we should board the boat and sail. Five minutes later we watched
                    the little harbour getting smaller and smaller. The trip to Ayiofarango
                  had just started.</p>
      

      Is there a way to find these lines and join them using regex? Your help would be much appreciated.

      Yannis

      —

      Moderator added code markdown around the text; please don’t forget to use the </> button to mark example text as “code” so that the forum doesn’t alter any characters.

        PeterJones @Giannis Katebakis

        @Giannis-Katebakis,

        See FAQ => Generic Regular Expression (regex) Formulas => Replacing in a specific zone of text.

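        The “zone” you want starts with <p (the BSR, or begin search-region regex) and ends with </p> (the ESR, or end search-region regex). Within that zone, you want to find newlines using FR = \R (the find regex) and replace each one with a single space character as the RR (the replacement regex).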
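For readers who want to try the same idea outside Notepad++, here is a minimal Python 3 sketch using only the standard `re` module. It is an illustrative substitute, not the FAQ's regex: Python's `re` has no `\R`, so the sketch works in two steps (find each `<p ... </p>` zone, then replace the newline sequences inside it). The function name `join_lines_in_paragraphs` is hypothetical, and `<p\b` is used as a slightly stricter BSR than `<p` so that tags such as `<pre>` are not treated as zones.

```python
import re

# BSR/ESR: a zone runs from "<p" to the nearest "</p>".
# DOTALL lets "." cross line breaks; the lazy ".*?" stops at the first "</p>".
# "<p\b" (rather than plain "<p") keeps tags like "<pre>" from starting a zone.
PARAGRAPH_ZONE = re.compile(r"<p\b.*?</p>", re.DOTALL)

# FR: Python's re has no \R, so approximate it with the usual newline sequences.
# "\r\n" is listed first so a Windows line ending counts as one break, not two.
NEWLINE = re.compile(r"\r\n|\r|\n")


def join_lines_in_paragraphs(html: str) -> str:
    """Replace every line break inside <p ...>...</p> with one space (RR = " ").

    Text outside the paragraph zones is returned unchanged.
    """
    return PARAGRAPH_ZONE.sub(lambda zone: NEWLINE.sub(" ", zone.group(0)), html)


if __name__ == "__main__":
    print(join_lines_in_paragraphs("<p>a\nb</p>\n<p>c\r\nd</p>"))
```

Because only line breaks are replaced, the indentation that followed each break survives: `"<p>a\n    b</p>"` comes out as `<p>a` followed by five spaces and `b</p>`.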

        Actually, since your lines are also indented, FR should probably be \s+, which collapses each run of one or more whitespace characters inside the zone (spaces, tabs, newlines, or Unicode whitespace) into a single space.
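The `\s+` variant can be sketched the same way, again as a hypothetical Python 3 helper (`collapse_whitespace_in_paragraphs`) rather than anything Notepad++ itself provides. It assumes the zone is `<p\b` up to the nearest `</p>`; whitespace between paragraphs, such as the indentation before the second `<p`, lies outside every zone and is left alone, which matches the output shown in the reply.

```python
import re

# BSR/ESR: "<p" up to the nearest "</p>", spanning lines (DOTALL + lazy match).
# "<p\b" avoids treating tags such as "<pre>" as paragraph zones.
PARAGRAPH_ZONE = re.compile(r"<p\b.*?</p>", re.DOTALL)

# FR = \s+ : one or more whitespace characters of any kind.
# For Python 3 str patterns, \s matches Unicode whitespace as well as ASCII.
WHITESPACE_RUN = re.compile(r"\s+")


def collapse_whitespace_in_paragraphs(html: str) -> str:
    """Collapse each whitespace run inside <p ...>...</p> into one space (RR = " ").

    Whitespace between paragraphs (outside any zone) is left untouched.
    """
    return PARAGRAPH_ZONE.sub(
        lambda zone: WHITESPACE_RUN.sub(" ", zone.group(0)), html
    )


if __name__ == "__main__":
    sample = (
        '<p align="justify" >The captain \n'
        "        and the crew</p>\n"
        '      <p align="justify" >had just\n'
        "    started.</p>"
    )
    print(collapse_whitespace_in_paragraphs(sample))
```

As with the Notepad++ replacement, this also collapses whitespace inside the opening `<p ...>` tag itself, which is harmless for markup like `<p align="justify" >`.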

        When I substituted the values listed above (using \s+ as the FR) for each of the bolded BSR/ESR/FR/RR placeholders in the FAQ's regex and did a Replace All, it converted

        <p align="justify" >The captain 
                      and the crew were taking care of the last details when someone remembered 
                      that there was no ice in the boat, nor the necessary amount of beer. 
                      A volunteer was found and he rushed to buy all that was needed. 
                      Soon he was back with a big bag full of ice and a dozen beers and 
                      other refreshments.</p>
                    <p align="justify" >The captain decided that everything was ready
                      and we should board the boat and sail. Five minutes later we watched
                      the little harbour getting smaller and smaller. The trip to Ayiofarango
                    had just started.</p>
        

        to

        <p align="justify" >The captain and the crew were taking care of the last details when someone remembered that there was no ice in the boat, nor the necessary amount of beer. A volunteer was found and he rushed to buy all that was needed. Soon he was back with a big bag full of ice and a dozen beers and other refreshments.</p>
                    <p align="justify" >The captain decided that everything was ready and we should board the boat and sail. Five minutes later we watched the little harbour getting smaller and smaller. The trip to Ayiofarango had just started.</p>
        
        The Community of users of the Notepad++ text editor.