    HTML - Only save the heading and paragraphs text, remove images and links

    • Daniel Norin

      Hi :) I'm pretty new to Notepad++, but it feels amazing with all its functions.

      I wonder if it's possible to use some kind of regexp to clean HTML code so that it only contains the text within the headings (<h> to </h>) and paragraphs (<p> to </p>),

      removing everything else in between (images etc.) as well as attributes such as links, classes, ids etc. within a tag?

      Example:-----------------------------raw version------------------------------------

      <p><img alt=“” class=“aligncenter size-large wp-image-129073” height=“658” loading=“lazy” sizes=“(max-width: 1170px) 100vw, 1170px” src=“https://331mrnu3ylm2k3db3s1xd1hg-wpengine.netdna-ssl.com/wp-content/uploads/2017/12/coffee-film-1170x658.jpg” srcset=“https://331mrnu3ylm2k3db3s1xd1hg-wpengine.netdna-ssl.com/wp-content/uploads/2017/12/coffee-film-1170x658.jpg 1170w, https://331mrnu3ylm2k3db3s1xd1hg-wpengine.netdna-ssl.com/wp-content/uploads/2017/12/coffee-film-770x433.jpg 770w, https://331mrnu3ylm2k3db3s1xd1hg-wpengine.netdna-ssl.com/wp-content/uploads/2017/12/coffee-film-768x432.jpg 768w, https://331mrnu3ylm2k3db3s1xd1hg-wpengine.netdna-ssl.com/wp-content/uploads/2017/12/coffee-film-830x467.jpg 830w, https://331mrnu3ylm2k3db3s1xd1hg-wpengine.netdna-ssl.com/wp-content/uploads/2017/12/coffee-film.jpg 1920w” width=“1170”></p>
      <p>Earlier this month we unveiled the nominees for the Ninth Annual Sprudgie Awards. <a class=“addbackground” href=“https://sprudge.com/vote”>Voting is now open</a> across a dozen categories, honoring the very best in coffee. Voting ends December 31st, 2017 at 11:59 PM.</p>
      <p>In this feature, we’re spotlighting the 2017 nominees for Best Coffee Film/Video, one of the tightest races in all Sprudgie Award categories. Past winners for Best Coffee Film/Video include the <em><a class=“addbackground” href=“https://www.netflix.com/title/80109415”>Gilmore Girls: A Year In The Life,</a> <a href=“https://www.facebook.com/BaristaFilm/”>Barista</a>, <a class=“addbackground” href=“https://sprudge.com/dunkin-love-everything-54551.html”>Dunkin Love</a>, <a class=“addbackground” href=“https://sprudge.com/now-watching-hey-girl-guide-coffeeing-maria-hill.html”>Hey Girl Guide To Coffeeing</a>, <a class=“addbackground” href=“http://comediansincarsgettingcoffee.com/”>Comedians in Cars Getting Coffee</a>, </em>and <em><a class=“addbackground” href=“http://info.stumptowncoffee.com/kenya-video/”>“Kenya”</a> (Stumptown Coffee Roasters).</em></p>
      <p>Let’s meet this year’s nominees!</p><!-- Either there are no banners, they are disabled or none qualified for this location! -->
      <h3 id=“rb-The-Young-and-the-Spoonless-by-Cafe-Imports”>The Young and the Spoonless by Cafe Imports</h3>
      <p><iframe allowfullscreen=“allowfullscreen” frameborder=“0” height=“675” loading=“lazy” mozallowfullscreen=“mozallowfullscreen” src=“https://player.vimeo.com/video/215119548” title=“Introducing: The Young and the Spoonless” webkitallowfullscreen=“webkitallowfullscreen” width=“1200”></iframe></p>
      <h3 id=“rb-I-Yelp-By-The-Way-by-Dapper-amp-Wise”>I Yelp By The Way by Dapper & Wise</h3>
      <blockquote class=“instagram-media” data-instgrm-captioned=“data-instgrm-captioned” data-instgrm-permalink=“https://www.instagram.com/p/BbNDSpaFT_L/” data-instgrm-version=“8” style=“background:#FFF; border:0; border-radius:3px; box-shadow:0 0 1px 0 rgba(0,0,0,0.5),0 1px 10px 0 rgba(0,0,0,0.15); margin: 1px; max-width:658px; padding:0; width:99.375%; width:-webkit-calc(100% - 2px); width:calc(100% - 2px);”>
      <div style=“padding:8px;”>
      <div style=“background:#F8F8F8; line-height:0; margin-top:40px; padding:28.10185185185185% 0; text-align:center; width:100%;”>
      <div style=“background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACwAAAAsCAMAAAApWqozAAAABGdBTUEAALGPC/xhBQAAAAFzUkdCAK7OHOkAAAAMUExURczMzPf399fX1+bm5mzY9AMAAADiSURBVDjLvZXbEsMgCES5/P8/t9FuRVCRmU73JWlzosgSIIZURCjo/ad+EQJJB4Hv8BFt+IDpQoCx1wjOSBFhh2XssxEIYn3ulI/6MNReE07UIWJEv8UEOWDS88LY97kqyTliJKKtuYBbruAyVh5wOHiXmpi5we58Ek028czwyuQdLKPG1Bkb4NnM+VeAnfHqn1k4+GPT6uGQcvu2h2OVuIf/gWUFyy8OWEpdyZSa3aVCqpVoVvzZZ2VTnn2wU8qzVjDDetO90GSy9mVLqtgYSy231MxrY6I2gGqjrTY0L8fxCxfCBbhWrsYYAAAAAElFTkSuQmCC); display:block; height:44px; margin:0 auto -44px; position:relative; top:-22px; width:44px;”></div>
      </div>

      -----------------------------cleaned version------------------------------------

      <p>Earlier this month we unveiled the nominees for the Ninth Annual Sprudgie Awards. Voting is now open across a dozen categories, honoring the very best in coffee. Voting ends December 31st, 2017 at 11:59 PM.</p>
      <p>In this feature, we’re spotlighting the 2017 nominees for Best Coffee Film/Video, one of the tightest races in all Sprudgie Award categories. Past winners for Best Coffee Film/Video include the Gilmore Girls: A Year In The Life, Barista, Dunkin Love, Hey Girl Guide To Coffeeing, Comedians in Cars Getting Coffee, and “Kenya” (Stumptown Coffee Roasters).</p>
      <p>Let’s meet this year’s nominees!</p>
      <h3>The Young and the Spoonless by Cafe Imports</h3>
      <h3>I Yelp By The Way by Dapper & Wise</h3>

      • guy038

        Hello, @daniel-norin,

        Sorry, but I have to go away for at least 2 hours!

        As soon as I can, I will post a first version!

        Best Regards

        guy038

        • guy038

          Hello, @daniel-norin and All,

          This is a first version, which will surely require modifications later on!


          Firstly, we delete the unwanted link and emphasis tags inside the lines that we want to keep:

          • Open the Replace dialog ( Ctrl + H )

          • SEARCH (?s-i)<a .+?>|</a>|</?em>

          • REPLACE Leave EMPTY

          • Tick the Wrap around option, only

          • Select the Regular expression search mode

          • Click on the Replace All button

          • Close the dialog ( Esc )


          Secondly, we mark the text which will be copied to the clipboard:

          • Open the Mark dialog ( Ctrl + M )

          • SEARCH (?s-i)<h([1-6])[^>]*>\K.+?(?=</h\1>)|<p>\K\w.+?(?=</p>)

          • Tick the Wrap around option, only

          • Select the Regular expression search mode

          • Click on the Mark All button

          • Click on the Copy Marked Text button

          • Close the dialog ( Esc )


          Now, select all the text with Ctrl + A and paste with Ctrl + V, so that the copied marked text replaces the whole document

          The expected cleaned text should appear:

          Earlier this month we unveiled the nominees for the Ninth Annual Sprudgie Awards. Voting is now open across a dozen categories, honoring the very best in coffee. Voting ends December 31st, 2017 at 11:59 PM.
          In this feature, we’re spotlighting the 2017 nominees for Best Coffee Film/Video, one of the tightest races in all Sprudgie Award categories. Past winners for Best Coffee Film/Video include the Gilmore Girls: A Year In The Life, Barista, Dunkin Love, Hey Girl Guide To Coffeeing, Comedians in Cars Getting Coffee, and "Kenya" (Stumptown Coffee Roasters).
          Let’s meet this year’s nominees!
          The Young and the Spoonless by Cafe Imports
          I Yelp By The Way by Dapper & Wise
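
For anyone who would rather script this than click through the dialogs, the two steps above can be roughly sketched in Python. This is only an illustration under a few assumptions: the function name `extract_text` is made up, Python's `re` module has no `\K`, so capture groups stand in for it, and the heading pattern skips the attributes of the opening tag with `[^>]*`. Python is already case-sensitive by default, so the `-i` part is not needed.

```python
import re


def extract_text(html: str) -> str:
    """Hypothetical helper mirroring the two Notepad++ steps described above."""
    # Step 1: delete <a ...>, </a>, <em> and </em>, keeping the text they wrap.
    html = re.sub(r"(?s)<a .+?>|</a>|</?em>", "", html)

    # Step 2: capture the text of <h1>..<h6> headings and of paragraphs
    # whose content starts with a word character (so <p><img ...></p> is skipped).
    pattern = re.compile(r"(?s)<h([1-6])[^>]*>(.+?)</h\1>|<p>(\w.+?)</p>")

    lines = []
    for match in pattern.finditer(html):
        heading_text, paragraph_text = match.group(2), match.group(3)
        lines.append(heading_text if heading_text is not None else paragraph_text)
    return "\n".join(lines)
```

Calling `extract_text()` on the contents of the HTML file returns one line per kept heading or paragraph, much like the text obtained with Copy Marked Text.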
          

          Best Regards,

          guy038

          • astrosofista

            @daniel-norin, @guy038, all

            Just for the sake of variety, here is my take. It seems to work fine on the sample data, and I hope it can cope with other inputs too.

            Note that it does not remove the paragraphs left empty, such as the first and sixth lines of the output. Anyway, it is easy to delete them in a second step.

            Instructions are similar to the ones provided by @guy038. Open the Replace dialog (Ctrl + H) and type:

            Search: (?s)<(?!/?p>|/?h3>?).*?>| id=.*?(?=>)
            Replace: [leave empty]
            

            Put the caret at the very beginning of the document, select the Regular Expression mode and click on Replace or Replace All.

            Output:

            <p></p>
            <p>Earlier this month we unveiled the nominees for the Ninth Annual Sprudgie Awards. Voting is now open across a dozen categories, honoring the very best in coffee. Voting ends December 31st, 2017 at 11:59 PM.</p>
            <p>In this feature, we’re spotlighting the 2017 nominees for Best Coffee Film/Video, one of the tightest races in all Sprudgie Award categories. Past winners for Best Coffee Film/Video include the Gilmore Girls: A Year In The Life, Barista, Dunkin Love, Hey Girl Guide To Coffeeing, Comedians in Cars Getting Coffee, and “Kenya” (Stumptown Coffee Roasters).</p>
            <p>Let’s meet this year’s nominees!</p>
            <h3>The Young and the Spoonless by Cafe Imports</h3>
            <p></p>
            <h3>I Yelp By The Way by Dapper & Wise</h3>
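
As a rough cross-check outside Notepad++, the same search expression can be tried in Python, together with one possible second step for the empty paragraphs. This is a sketch only: the names `TAG_FILTER`, `EMPTY_PARAGRAPH` and `keep_p_and_h3` are made up for illustration, and the empty-paragraph pattern is my own assumption about what that second step could look like.

```python
import re

# Same expression as in the Search field: remove every tag that is not
# <p>, </p>, <h3...> or </h3>, then remove any id=... attribute that is left.
TAG_FILTER = re.compile(r"(?s)<(?!/?p>|/?h3>?).*?>| id=.*?(?=>)")

# Assumed second step: drop paragraphs that ended up empty, plus their line break.
EMPTY_PARAGRAPH = re.compile(r"<p>\s*</p>\n?")


def keep_p_and_h3(html: str) -> str:
    """Hypothetical helper: keep only bare <p> and <h3> elements with their text."""
    stripped = TAG_FILTER.sub("", html)
    return EMPTY_PARAGRAPH.sub("", stripped)
```

Note that a paragraph tag carrying attributes, such as `<p class="x">`, is not recognised by the lookahead and would be removed as well, so this only suits input shaped like the sample.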
            

            Stay healthy
