HTML - Only save the heading and paragraph text, remove images and links

Help wanted

    Daniel Norin
    last edited by Dec 28, 2021, 12:17 PM

    Hi :) I'm pretty new to Notepad++, but it feels amazing with all its functions.

    I wonder if it's possible to use some kind of regexp to clean HTML code so that it only contains the text within headings (<h> to </h>) and paragraphs (<p> to </p>).

    Is it possible to remove everything else, such as images, and also the attributes (links, classes, ids, etc.) within a tag?

    Example:

    -----------------------------raw version------------------------------------

    <p><img alt="" class="aligncenter size-large wp-image-129073" height="658" loading="lazy" sizes="(max-width: 1170px) 100vw, 1170px" src="https://331mrnu3ylm2k3db3s1xd1hg-wpengine.netdna-ssl.com/wp-content/uploads/2017/12/coffee-film-1170x658.jpg" srcset="https://331mrnu3ylm2k3db3s1xd1hg-wpengine.netdna-ssl.com/wp-content/uploads/2017/12/coffee-film-1170x658.jpg 1170w, https://331mrnu3ylm2k3db3s1xd1hg-wpengine.netdna-ssl.com/wp-content/uploads/2017/12/coffee-film-770x433.jpg 770w, https://331mrnu3ylm2k3db3s1xd1hg-wpengine.netdna-ssl.com/wp-content/uploads/2017/12/coffee-film-768x432.jpg 768w, https://331mrnu3ylm2k3db3s1xd1hg-wpengine.netdna-ssl.com/wp-content/uploads/2017/12/coffee-film-830x467.jpg 830w, https://331mrnu3ylm2k3db3s1xd1hg-wpengine.netdna-ssl.com/wp-content/uploads/2017/12/coffee-film.jpg 1920w" width="1170"></p>
    <p>Earlier this month we unveiled the nominees for the Ninth Annual Sprudgie Awards. <a class="addbackground" href="https://sprudge.com/vote">Voting is now open</a> across a dozen categories, honoring the very best in coffee. Voting ends December 31st, 2017 at 11:59 PM.</p>
    <p>In this feature, we’re spotlighting the 2017 nominees for Best Coffee Film/Video, one of the tightest races in all Sprudgie Award categories. Past winners for Best Coffee Film/Video include the <em><a class="addbackground" href="https://www.netflix.com/title/80109415">Gilmore Girls: A Year In The Life,</a> <a href="https://www.facebook.com/BaristaFilm/">Barista</a>, <a class="addbackground" href="https://sprudge.com/dunkin-love-everything-54551.html">Dunkin Love</a>, <a class="addbackground" href="https://sprudge.com/now-watching-hey-girl-guide-coffeeing-maria-hill.html">Hey Girl Guide To Coffeeing</a>, <a class="addbackground" href="http://comediansincarsgettingcoffee.com/">Comedians in Cars Getting Coffee</a>, </em>and <em><a class="addbackground" href="http://info.stumptowncoffee.com/kenya-video/">“Kenya”</a> (Stumptown Coffee Roasters).</em></p>
    <p>Let’s meet this year’s nominees!</p><!-- Either there are no banners, they are disabled or none qualified for this location! -->
    <h3 id="rb-The-Young-and-the-Spoonless-by-Cafe-Imports">The Young and the Spoonless by Cafe Imports</h3>
    <p><iframe allowfullscreen="allowfullscreen" frameborder="0" height="675" loading="lazy" mozallowfullscreen="mozallowfullscreen" src="https://player.vimeo.com/video/215119548" title="Introducing: The Young and the Spoonless" webkitallowfullscreen="webkitallowfullscreen" width="1200"></iframe></p>
    <h3 id="rb-I-Yelp-By-The-Way-by-Dapper-amp-Wise">I Yelp By The Way by Dapper & Wise</h3>
    <blockquote class="instagram-media" data-instgrm-captioned="data-instgrm-captioned" data-instgrm-permalink="https://www.instagram.com/p/BbNDSpaFT_L/" data-instgrm-version="8" style="background:#FFF; border:0; border-radius:3px; box-shadow:0 0 1px 0 rgba(0,0,0,0.5),0 1px 10px 0 rgba(0,0,0,0.15); margin: 1px; max-width:658px; padding:0; width:99.375%; width:-webkit-calc(100% - 2px); width:calc(100% - 2px);">
    <div style="padding:8px;">
    <div style="background:#F8F8F8; line-height:0; margin-top:40px; padding:28.10185185185185% 0; text-align:center; width:100%;">
    <div style="background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACwAAAAsCAMAAAApWqozAAAABGdBTUEAALGPC/xhBQAAAAFzUkdCAK7OHOkAAAAMUExURczMzPf399fX1+bm5mzY9AMAAADiSURBVDjLvZXbEsMgCES5/P8/t9FuRVCRmU73JWlzosgSIIZURCjo/ad+EQJJB4Hv8BFt+IDpQoCx1wjOSBFhh2XssxEIYn3ulI/6MNReE07UIWJEv8UEOWDS88LY97kqyTliJKKtuYBbruAyVh5wOHiXmpi5we58Ek028czwyuQdLKPG1Bkb4NnM+VeAnfHqn1k4+GPT6uGQcvu2h2OVuIf/gWUFyy8OWEpdyZSa3aVCqpVoVvzZZ2VTnn2wU8qzVjDDetO90GSy9mVLqtgYSy231MxrY6I2gGqjrTY0L8fxCxfCBbhWrsYYAAAAAElFTkSuQmCC); display:block; height:44px; margin:0 auto -44px; position:relative; top:-22px; width:44px;"></div>
    </div>

    -----------------------------cleaned version------------------------------------

    <p>Earlier this month we unveiled the nominees for the Ninth Annual Sprudgie Awards. Voting is now open across a dozen categories, honoring the very best in coffee. Voting ends December 31st, 2017 at 11:59 PM.</p>
    <p>In this feature, we’re spotlighting the 2017 nominees for Best Coffee Film/Video, one of the tightest races in all Sprudgie Award categories. Past winners for Best Coffee Film/Video include the Gilmore Girls: A Year In The Life, Barista, Dunkin Love, Hey Girl Guide To Coffeeing, Comedians in Cars Getting Coffee, and “Kenya” (Stumptown Coffee Roasters).</p>
    <p>Let’s meet this year’s nominees!</p>
    <h3>The Young and the Spoonless by Cafe Imports</h3>
    <h3>I Yelp By The Way by Dapper & Wise</h3>

      guy038
      last edited by Dec 28, 2021, 1:03 PM

      Hello, @daniel-norin,

      Sorry, but I have to go away for at least 2 hours!

      As soon as I can, I will give you a first version!

      Best Regards

      guy038

        guy038
        last edited by guy038 Dec 29, 2021, 11:42 AM (posted Dec 28, 2021, 3:36 PM)

        Hello, @daniel-norin and All,

        This is a first version, which will surely require modifications later on!


        First, we delete some unwanted parts (links and <em> tags) from the lines that we want to keep:

        • Open the Replace dialog ( Ctrl + H )

        • SEARCH (?s-i)<a .+?>|</a>|</?em>

        • REPLACE Leave EMPTY

        • Tick the Wrap around option, only

        • Select the Regular expression search mode

        • Click on the Replace All button

        • Close the dialog ( Esc )
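For anyone who would rather script this first step, here is a minimal Python sketch of the same replacement, assuming the HTML is already loaded into a string. The function name `strip_links_and_em` and the sample line are hypothetical, for illustration only. Python's `re` module takes dot-all as the `re.S` flag rather than the inline `(?s-i)` prefix, and it is case-sensitive unless `re.IGNORECASE` is given.

```python
import re

# Same alternatives as the SEARCH field: opening <a ...> tags, </a>, and <em>/</em>.
# re.S plays the role of "(?s)"; matching is case-sensitive by default, like "(?-i)".
STEP1 = re.compile(r"<a .+?>|</a>|</?em>", re.S)


def strip_links_and_em(html: str) -> str:
    """Delete link and <em> tags while keeping the text between them (hypothetical helper)."""
    return STEP1.sub("", html)


sample = ('<p>Past winners include the <em><a class="addbackground" '
          'href="https://www.facebook.com/BaristaFilm/">Barista</a>, </em>and more.</p>')
print(strip_links_and_em(sample))
# <p>Past winners include the Barista, and more.</p>
```

The lazy `.+?` stops at the first `>`, so each `<a ...>` tag is removed on its own and the link text between the tags survives.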


        Second, we mark the text that will be copied to the clipboard:

        • Open the Mark dialog ( Ctrl + M )

        • SEARCH (?s-i)<h([1-6])[^>]*>\K.+?(?=</h\1>)|<p>\K\w.+?(?=</p>)

        • Tick the Wrap around option, only

        • Select the Regular expression search mode

        • Click on the Mark All button

        • Click on the Copy Marked Text button

        • Close the dialog ( Esc )


        Now, select all the text with Ctrl + A and paste with Ctrl + V, so that the copied text replaces the whole document.

        The expected cleaned text should appear:

        Earlier this month we unveiled the nominees for the Ninth Annual Sprudgie Awards. Voting is now open across a dozen categories, honoring the very best in coffee. Voting ends December 31st, 2017 at 11:59 PM.
        In this feature, we’re spotlighting the 2017 nominees for Best Coffee Film/Video, one of the tightest races in all Sprudgie Award categories. Past winners for Best Coffee Film/Video include the Gilmore Girls: A Year In The Life, Barista, Dunkin Love, Hey Girl Guide To Coffeeing, Comedians in Cars Getting Coffee, and "Kenya" (Stumptown Coffee Roasters).
        Let’s meet this year’s nominees!
        The Young and the Spoonless by Cafe Imports
        I Yelp By The Way by Dapper & Wise
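The Mark and Copy Marked Text steps can be approximated in Python as well. Python's `re` has no `\K`, so this sketch captures the wanted text in groups instead and keeps the lookaheads. The heading's opening tag is matched with `[^>]*>` so that, with dot-all on, it cannot run past the first heading's `>` to a later one. The helper name `copy_marked_text` and the shortened sample are hypothetical; joining the hits with line breaks mimics the one-item-per-line result shown above.

```python
import re

# Python has no \K, so each alternative captures the text to keep in a group;
# the lookaheads (?=</h\1>) and (?=</p>) are kept as in the Mark pattern.
# [^>]* keeps the heading's opening tag from spanning several tags under re.S.
MARK = re.compile(r"<h([1-6])[^>]*>(.+?)(?=</h\1>)|<p>(\w.+?)(?=</p>)", re.S)


def copy_marked_text(html: str) -> str:
    """Return heading and paragraph texts, one per line (hypothetical helper)."""
    hits = []
    for m in MARK.finditer(html):
        # Group 2 holds a heading's text, group 3 a paragraph's text.
        hits.append(m.group(2) if m.group(2) is not None else m.group(3))
    return "\n".join(hits)


sample = (
    '<p><img alt="" src="coffee-film.jpg"></p>\n'
    "<p>Let's meet this year's nominees!</p><!-- banner -->\n"
    '<h3 id="rb-one">The Young and the Spoonless by Cafe Imports</h3>\n'
    '<p><iframe src="https://player.vimeo.com/video/215119548"></iframe></p>\n'
    '<h3 id="rb-two">I Yelp By The Way by Dapper & Wise</h3>\n'
)
print(copy_marked_text(sample))
# Let's meet this year's nominees!
# The Young and the Spoonless by Cafe Imports
# I Yelp By The Way by Dapper & Wise
```

As in the Mark pattern, the `\w` right after `<p>` is what skips paragraphs that start with a tag, such as the image and iframe ones.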
        

        Best Regards,

        guy038

          astrosofista @Daniel Norin
          last edited by Dec 28, 2021, 11:42 PM

          @daniel-norin, @guy038, all

          Just for the sake of variety, here is my take. It seems to work fine on the input data, and I hope it can deal with other inputs too.

          Note that it does not remove empty paragraphs, such as the first and sixth. Anyway, it is easy to delete them in a second step.

          Instructions are similar to the ones provided by @guy038. Open the Replace dialog (Ctrl + H) and type:

          Search: (?s)<(?!/?p>|/?h3>?).*?>| id=.*?(?=>)
          Replace: [leave empty]
          

          Put the caret at the very beginning of the document, select the Regular Expression mode and click on Replace or Replace All.

          Output:

          <p></p>
          <p>Earlier this month we unveiled the nominees for the Ninth Annual Sprudgie Awards. Voting is now open across a dozen categories, honoring the very best in coffee. Voting ends December 31st, 2017 at 11:59 PM.</p>
          <p>In this feature, we’re spotlighting the 2017 nominees for Best Coffee Film/Video, one of the tightest races in all Sprudgie Award categories. Past winners for Best Coffee Film/Video include the Gilmore Girls: A Year In The Life, Barista, Dunkin Love, Hey Girl Guide To Coffeeing, Comedians in Cars Getting Coffee, and “Kenya” (Stumptown Coffee Roasters).</p>
          <p>Let’s meet this year’s nominees!</p>
          <h3>The Young and the Spoonless by Cafe Imports</h3>
          <p></p>
          <h3>I Yelp By The Way by Dapper & Wise</h3>
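For comparison, here is a minimal Python sketch of astrosofista's replacement, assuming the HTML is already in a string. The search pattern is used unchanged, since Python accepts `(?s)` at the very start of a pattern. The second pattern, which deletes lines containing only `<p></p>`, is my own hypothetical version of the "second step" for empty paragraphs, not part of the original answer; `clean` and the sample are likewise illustrative.

```python
import re

# astrosofista's pattern: drop every tag except <p>, </p>, <h3...> and </h3>,
# then drop the " id=..." attribute left inside <h3 ...>.
CLEAN = re.compile(r"(?s)<(?!/?p>|/?h3>?).*?>| id=.*?(?=>)")
# Hypothetical second step: remove lines that hold nothing but "<p></p>".
EMPTY_P = re.compile(r"(?m)^[ \t]*<p></p>[ \t]*(?:\r?\n|$)")


def clean(html: str) -> str:
    return EMPTY_P.sub("", CLEAN.sub("", html))


sample = (
    '<p><img alt="" src="coffee-film.jpg"></p>\n'
    '<p>Voting is <a href="https://sprudge.com/vote">now open</a>.</p><!-- banner -->\n'
    '<h3 id="rb-one">The Young and the Spoonless</h3>\n'
    '<p><iframe src="https://player.vimeo.com/video/215119548"></iframe></p>\n'
    '<h3 id="rb-two">I Yelp By The Way</h3>\n'
)
print(clean(sample), end="")
# <p>Voting is now open.</p>
# <h3>The Young and the Spoonless</h3>
# <h3>I Yelp By The Way</h3>
```

The HTML comment `<!-- banner -->` is removed too, because to this pattern it is just another `<...>` tag that is neither `p` nor `h3`.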
          

          Stay healthy

          The Community of users of the Notepad++ text editor.