    How to copy the Headings, add some text before it and reproduce that just before some unique text

    11 Posts 4 Posters 992 Views
      dr ramaanand
      last edited by dr ramaanand

      <H1........>Heading1</H1>
      <H2........>Some text</H2>
      <H2........>Different text</H2>
      <H2........>Altogether different text</H2>
      Some paragraphs here
      </P> (or </ul>)
      <P..........><span..........><b>Please E-mail us</b></span></P>
      <H2........>Heading that should not be reproduced</H2>
      Some paragraphs here
      </ul> (or </P>)
      <P..........><b><span..........>Please E-mail us</span></b></P>
      

      Should become

      <H1........>Heading1</H1>
      <H2........>Some text</H2>
      <H2........>Different text</H2>
      <H2........>Altogether different text</H2>
      Some paragraphs here
      </P> (or </ul>)
      <P..........><span..........><b>Please E-mail us</b></span></P>
      <H2........>Heading that should not be reproduced</H2>
      Some paragraphs here
      </ul> (or </P>)
      <P..........>We have a Some text, We have a Different text, We have a Altogether different text.
      <P..........><b><span..........>Please E-mail us</span></b></P>
      
        dr ramaanand @dr ramaanand
        last edited by dr ramaanand

        @dr-ramaanand In some of the files in the folder, the <b> and <span.............> tags have exchanged places. Also, the number of Heading 2 lines, that is, lines with text between <H2..........> and </H2>, is not limited to 3 in some files, but I want all the text found between <H2..........> and </H2> on all such lines to be reproduced as explained above.

          Alan Kilborn @dr ramaanand
          last edited by Alan Kilborn

          @dr-ramaanand

          Please note:

          This Community Forum is not a data transformation service; you should not expect to be able to always say “I have data like X and want it to look like Y” and have us do all the work for you. If you are new to the Forum, and new to regular expressions, we will often give help on the first one or two data-transformation questions, especially if they are well-asked and you show a willingness to learn; and we will point you to the documentation where you can learn how to do the data transformations for yourself in the future. But if you repeatedly ask us to do your work for you, you will find that the patience of usually-helpful Community members wears thin. The best way to learn regular expressions is by experimenting with them yourself, and getting a feel for how they work; having us spoon-feed you the answers without you putting in the effort doesn’t help you in the long term and is uninteresting and annoying for us.

          USEFUL REFERENCES

          • Where to find regular expressions (regex) documentation
            dr ramaanand @Alan Kilborn
            last edited by dr ramaanand

            @Alan-Kilborn I tried to do something on my own but gave up because I couldn’t. If someone can help, please do! I have 400 files, so it will be difficult to edit each file separately

              PeterJones @dr ramaanand
              last edited by PeterJones

              @dr-ramaanand said in How to copy the Headings, add some text before it and reproduce that just before some unique text:

              “but gave up”

              There is the biggest problem you’ve faced so far. And only you can help you with that problem.

              “tried to do something on my own … I have 400 files”

              And yet, you didn’t show what you tried, despite the advice given in the search-replace template.

              The correct advice is: if you have more than a handful of text files in the same format, they should have been converted into a database (and 400 pages is way more than a handful). The HTML then becomes a report that you generate from the database, rather than the main storage. It’s much simpler to edit a single template that gets applied to all the data in the database than it is to make the same change to hundreds of pages. Given that you’ve had multiple requests like this already (all apparently on the same set of HTML files), it would be a truly good use of your time to split it out into a database and do that. (In the HTML world, those databases are often called “Content Management Systems” or CMS.)
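To make the template idea concrete, here is a minimal Python sketch, assuming each page can be reduced to a title, a list of top headings, and a body. The `PAGE` template, the `render` function, and the record keys are hypothetical names chosen for illustration, and the bare `<H1>`, `<H2>` and `<P>` tags stand in for the attributes that are elided in this thread.

```python
from string import Template

# Hypothetical page template; the real pages carry attributes not shown in this thread.
PAGE = Template(
    "<H1>$title</H1>\n"
    "$headings\n"
    "$body\n"
    "<P>$summary</P>\n"
    "<P><b>Please E-mail us</b></P>\n"
)

def render(page: dict) -> str:
    """Build one HTML page from a record holding 'title', 'headings' and 'body'."""
    headings = "\n".join(f"<H2>{h}</H2>" for h in page["headings"])
    # The summary line is generated from the same data as the headings,
    # so changing its wording means editing one template, not every page.
    summary = ", ".join(f"We have a {h}" for h in page["headings"]) + "."
    return PAGE.substitute(
        title=page["title"], headings=headings, body=page["body"], summary=summary
    )

record = {
    "title": "Heading1",
    "headings": ["Some text", "Different text", "Altogether different text"],
    "body": "Some paragraphs here",
}
print(render(record))
```

With the pages stored as records like `record`, a request such as the one in this topic becomes a one-line change to `PAGE` followed by regenerating the files.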

              That said, I am doubtful you will put in the effort to do it right, and instead expect that we give you a regex that does it.

              Well, I won’t do that – at least not everything – because I doubt you’d actually learn anything if I just handed you the solution, and it would thus be a lot of effort on my part to get things “just right”, only to have you come back in a few days with another similar problem that you expect us to solve for you. But what I will do is say that if it were me, I would break the problem into multiple steps.

              1. Identify the place in all the files where you want to insert the text. Figure out a regex that will place a simple identifiable sequence (like ☺, it just has to be something that will only occur once in each file) where you want to insert the text – probably even with the prefix, like <p...>We have a ☺</p>
              2. Assuming all the <H2> data is on contiguous lines like you’ve shown, then I’ll give you a regex that will take everything from the first <H2...> to the last </H2> near the top, and put it where the ☺ used to be:
                FIND = (?-s)((<H2.*?</H2>\s*)+)(?s).*?\K☺
                REPLACE = ☺$1☹
              3. Now you have the data between a ☺ and ☹, but with the extra <H2...> and </H2> tags in your way. This means you have a “begin” and “end”, where you want to make a change just between those. Our generic regex FAQ lists just such a topic, “Replacing in a specific zone of text” . Follow the formula in that post, where ☺ is your start-of-range marker and ☹ is your end-of-range marker, and you want to search for something like <H2.*?>\s* or </H2>\s* and replace with empty string. (You might have to do it once for each if you cannot figure out the combined regex that deletes both the open and close H2 tags in one, though there are multiple variants that could do it.)
              4. Once that’s done, you can delete the ☺ and ☹ in all the files
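For readers who want to see the four steps end to end, here is a rough Python sketch of the same marker technique. It is not the Notepad++ procedure itself: Python's `re` module stands in for the Replace dialog, the function name `copy_h2_block` and the `<p>` prefix are hypothetical, and, as an assumption of this sketch, the closing tags are replaced with ", " rather than an empty string so the heading texts do not run together.

```python
import re

START, END = "\u263a", "\u2639"  # the ☺ and ☹ markers, assumed absent from the files

def copy_h2_block(text: str, prefix: str = "<p>We have a ", suffix: str = "</p>") -> str:
    # Step 1: put a placeholder paragraph before the last <P...> line that
    # contains "Please E-mail us". prefix/suffix stand in for the real markup.
    hits = list(re.finditer(r"(?im)^[ \t]*<p\b.*Please\s*E-mail\s*us", text))
    if not hits:
        return text
    at = hits[-1].start()
    text = text[:at] + prefix + START + suffix + "\n" + text[at:]

    # Step 2: copy the first run of consecutive <H2>...</H2> lines to the marker.
    block = re.search(r"(?i)(?:<h2\b.*?</h2>\s*)+", text)
    if block is None:
        # No headings to copy: undo the placeholder.
        return text.replace(prefix + START + suffix + "\n", "")
    text = text.replace(START, START + block.group(0) + END)

    # Steps 3 and 4: strip the H2 tags, but only inside the marked zone,
    # and drop both markers in the same substitution.
    def strip_tags(zone):
        inner = re.sub(r"(?i)<h2\b.*?>\s*", "", zone.group(1))
        inner = re.sub(r"(?i)</h2>\s*", ", ", inner)  # assumption: comma separator
        return inner.rstrip(", ")

    return re.sub(START + "(.*?)" + END, strip_tags, text, flags=re.S)
```

The key point is the same as in the list above: the markers turn "change something only in this zone" into an ordinary search restricted to the text between ☺ and ☹.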

              The only way to learn regex is by doing. You have been shown the documentation multiple times. At this point, you need to learn how to do the regex on your own, because you have gone beyond the 1-2 freebies for a newbie, and it’s doubtful that most of the regulars will bother answering you until you start showing more effort in your questions. Good luck.

              ----

              Useful References

              • Please Read Before Posting
              • Template for Search/Replace Questions
              • Formatting Forum Posts
              • FAQ: Where to find regular expressions (regex) documentation
              • Notepad++ Online User Manual: Searching/Regex
                dr ramaanand @PeterJones
                last edited by dr ramaanand

                @PeterJones the RegEx (?s)\A.+?\K((<h2.+?</h2>\R)+).+\K(Please\s*E-mail\s*us)(?<=<p[^]>) will probably find what I want with the <H2.........> and </H2> and then I can probably use We have a $1, $3 in the Replace field. Then I should remove the <H2.........> and </H2> with another RegEx, right? Find (We have a )<H2.*?>(.*?)</H2> and replace with $1 $2 to remove the <H2.........> and </H2> in the final results.

                  dr ramaanand @dr ramaanand
                  last edited by dr ramaanand

                  @PeterJones The RegEx (?s)\A.+?\K((<h2.+?</h2>\R)+).*\K(Please\s*E-mail\s*us)(?<=<p[^]>) is invalid. However, I desperately want a solution. Please help with the correct RegEx for that! I will manage the removal of the <H2..............> and </H2> on my own (using Find (We have a\x20)<H2.*?>(.*?)</H2> and Replace with $1, $2).
                  Anyone can help. Please help!

                    dr ramaanand @PeterJones
                    last edited by dr ramaanand

                    @PeterJones (?s)\A.+?\K((<h2.+?</h2>\R)+).*\K(?=Please\s*E-mail\s*us) finds the first <H2......> to </H2> block and puts it just before the last occurrence of the “Please E-mail us” text, but I want it to come even earlier, that is, before the <p........> string. I tried (?s)\A.+?\K((<h2.+?</h2>\R)+).*\K(?=Please\s*E-mail\s*us)(?<=<p[^]>) without any success. Now I need some help. Please help!

                      dr ramaanand @PeterJones
                      last edited by

                      @PeterJones <p...>We have a $1,</p>\x20$2 is what I used in the replace field (for your information)

                        dr ramaanand @PeterJones
                        last edited by

                        @PeterJones (?s)\A.+?\K((<h2.+?</h2>\R)+).*\K(?=<p.*?Please\s*E-mail\s*us) helped find the <p...........> string just before the last occurrence of “Please E-mail us”. Thanks for the help!
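As a way to check what this last regex locates, here is a small Python sketch. Python's `re` module supports neither `\K` nor `\R`, so this is an approximation under stated assumptions: `\R` is written as `\r?\n`, and the end of the match plays the role of the position left after the final `\K`. The names `PATTERN` and `locate` are hypothetical.

```python
import re

# Approximation of (?s)\A.+?\K((<h2.+?</h2>\R)+).*\K(?=<p.*?Please\s*E-mail\s*us)
PATTERN = re.compile(
    r"\A.+?"                         # lazily skip to the first <h2 ...> line
    r"((?:<h2.+?</h2>\r?\n)+)"       # group 1: the run of consecutive H2 lines
    r".*"                            # greedy, so backtracking stops at the LAST candidate
    r"(?=<p.*?Please\s*E-mail\s*us)",
    re.IGNORECASE | re.DOTALL,
)

def locate(text: str):
    """Return (h2_block, insert_position), or None when the pattern does not match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # m.end() is where Notepad++ would insert the replacement after the second \K.
    return m.group(1), m.end()
```

Because `.*` is greedy and the lookahead starts at `<p`, the insert position lands on the opening `<p` of the last "Please E-mail us" paragraph rather than on the phrase itself, which is the difference from the earlier attempt.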

                          guy038
                          last edited by guy038

                          Hello, @dr-ramaanand, @alan-kilborn, @peterjones and All,

                          @dr-ramaanand, here is my only contribution to your problem :

                          If you find this solution interesting, just be nice and make a donation to @Don-ho. It’s, I think, the least you can do !


                          If I assume that :

                          • There is only 1 concerned zone, of consecutive <H2>.......</H2> blocks, located right after the <H1>.......</H1> block

                          • The text of these <H2>.......</H2> blocks must be copied before the last line of your file which contains the string Please E-mail us

                          • And that your INPUT text is :

                          <H1........>Heading1</H1>
                          <H2........>Some text</H2>
                          <H2........>Different text</H2>
                          <H2........>Altogether different text</H2>
                          Some paragraphs here
                          </P> (or </ul>)
                          <P..........><span..........><b>Please E-mail us</b></span></P>
                          <H2........>Heading that should not be reproduced</H2>
                          Some paragraphs here
                          </ul> (or </P>)
                          <P..........><b><span..........>Please E-mail us</span></b></P>
                          

                          Note : Each <H2>..........</H2> line may be preceded and/or followed with tabulation and/or space characters

                          Then :

                          • Open a new tab

                          • Paste the above INPUT text in this new tab

                          • Open the Replace dialog ( Ctrl + H )

                          • Select the Regular expression mode and tick the Wrap around option

                          • Follow the road map, below

                          Note : I’ll use the free-spacing mode, (?x), in order to easily identify the main parts of the search regexes


                          With the first regex S/R, below, we’ll place the line @@@ right before the last line of the file containing the string Please E-mail us

                          SEARCH (?xsi) \A .+ \K (?= ^ <P .+ Please \x20 E-mail \x20 us )

                          REPLACE @@@\r\n

                          So, we get this temporary OUTPUT :

                          <H1........>Heading1</H1>
                          <H2........>Some text</H2>
                          <H2........>Different text</H2>
                          <H2........>Altogether different text</H2>
                          Some paragraphs here
                          </P> (or </ul>)
                          <P..........><span..........><b>Please E-mail us</b></span></P>
                          <H2........>Heading that should not be reproduced</H2>
                          Some paragraphs here
                          </ul> (or </P>)
                          @@@
                          <P..........><b><span..........>Please E-mail us</span></b></P>
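For anyone who wants to try this first S/R outside Notepad++, here is a rough Python translation. It is a sketch under assumptions: Python's `re` module has no `\K`, so the skipped text is captured and written back, and the names `STEP_1` and `add_marker` are hypothetical.

```python
import re

# Translation of: (?xsi) \A .+ \K (?= ^ <P .+ Please \x20 E-mail \x20 us )
STEP_1 = re.compile(
    r"\A(.+)(?=^<P.+Please\x20E-mail\x20us)",
    re.DOTALL | re.IGNORECASE | re.MULTILINE,
)

def add_marker(text: str, eol: str = "\r\n") -> str:
    """Insert an @@@ line before the last <P...> line that leads to 'Please E-mail us'."""
    # The greedy (.+) runs to the end of the text and backtracks, so the
    # lookahead succeeds at the LAST line start that satisfies it.
    return STEP_1.sub(lambda m: m.group(1) + "@@@" + eol, text, count=1)
```

Pass `eol="\n"` for files with Unix line endings; the default mirrors the `\r\n` used in the REPLACE field above.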
                          

                          With the second regex S/R, below, we’ll recopy all the <H2>.......</H2> lines, located right after the <H1>.....</H1> block, just before the delimiter line @@@

                          SEARCH (?xsi) (?<= </H1> \r\n ) ( \s* (?: <H2.+?> .+? </H2> \s* )+ ) ^ .+ \K (?= @@@ \R)

                          REPLACE \1

                          And we obtain this temporary OUTPUT :

                          <H1........>Heading1</H1>
                          <H2........>Some text</H2>
                          <H2........>Different text</H2>
                          <H2........>Altogether different text</H2>
                          Some paragraphs here
                          </P> (or </ul>)
                          <P..........><span..........><b>Please E-mail us</b></span></P>
                          <H2........>Heading that should not be reproduced</H2>
                          Some paragraphs here
                          </ul> (or </P>)
                          <H2........>Some text</H2>
                          <H2........>Different text</H2>
                          <H2........>Altogether different text</H2>
                          @@@
                          <P..........><b><span..........>Please E-mail us</span></b></P>
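The second S/R can be approximated in Python as follows. This is only a sketch: `\K` is emulated by capturing and re-emitting the skipped text, the look-behind is written twice because Python requires fixed-width look-behinds, and `STEP_2` and `copy_headings` are hypothetical names. Like the original regex, it assumes every `<H2...>` tag has at least one character between `<H2` and `>`.

```python
import re

# Translation of:
# (?xsi) (?<= </H1> \r\n ) ( \s* (?: <H2.+?> .+? </H2> \s* )+ ) ^ .+ \K (?= @@@ \R)
STEP_2 = re.compile(
    r"(?:(?<=</H1>\r\n)|(?<=</H1>\n))"   # right after the </H1> line (CRLF or LF)
    r"(\s*(?:<H2.+?>.+?</H2>\s*)+)"       # group 1: the consecutive H2 lines
    r"(^.+)"                              # group 2: everything up to ...
    r"(?=@@@\r?\n)",                      # ... the delimiter line
    re.DOTALL | re.IGNORECASE | re.MULTILINE,
)

def copy_headings(text: str) -> str:
    """Re-copy the H2 lines that follow </H1> just before the @@@ line."""
    return STEP_2.sub(lambda m: m.group(1) + m.group(2) + m.group(1), text, count=1)
```

Group 1 stops at the first line that does not start with `<H2`, which is why a later, isolated H2 line is left alone.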
                          

                          Note that the line <H2........>Heading that should not be reproduced</H2>, which is not consecutive to the other H2 lines, is not re-copied, as expected !


                          Now, with the third regex S/R, below, we’ll just rewrite the text of all these <H2>.....</H2> blocks, in a single line :

                          SEARCH (?xi-s) .+ > (.+) </H2> \h* \R (?= ( (?: \h* <H2 .+ \R )+ )? @@@ \R )

                          REPLACE We have a \1?2, :.

                          We get the temporary OUTPUT :

                          <H1........>Heading1</H1>
                          <H2........>Some text</H2>
                          <H2........>Different text</H2>
                          <H2........>Altogether different text</H2>
                          Some paragraphs here
                          </P> (or </ul>)
                          <P..........><span..........><b>Please E-mail us</b></span></P>
                          <H2........>Heading that should not be reproduced</H2>
                          Some paragraphs here
                          </ul> (or </P>)
                          We have a Some text, We have a Different text, We have a Altogether different text.@@@
                          <P..........><b><span..........>Please E-mail us</span></b></P>
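The third S/R relies on a conditional replacement (`?2`), which Python's `re` module does not offer. A rough equivalent, offered as a sketch, uses a replacement function instead; `\h` is written as `[ \t]` and `\R` as `\r?\n`, and the names `STEP_3` and `flatten_headings` are hypothetical.

```python
import re

# Translation of:
# (?xi-s) .+ > (.+) </H2> \h* \R (?= ( (?: \h* <H2 .+ \R )+ )? @@@ \R )
STEP_3 = re.compile(
    r".+>(.+)</H2>[ \t]*\r?\n"                  # group 1: the heading text
    r"(?=((?:[ \t]*<H2.+\r?\n)+)?@@@\r?\n)",    # group 2: further H2 lines, if any
    re.IGNORECASE,
)

def flatten_headings(text: str) -> str:
    """Rewrite the H2 lines that sit directly above @@@ as one 'We have a ...' line."""
    def repl(m):
        # Group 2 is set while more H2 lines follow, so a comma is used;
        # on the last H2 line it is unset and the sentence is closed.
        return "We have a " + m.group(1) + (", " if m.group(2) else ".")
    return STEP_3.sub(repl, text)
```

Only H2 lines that are followed by nothing but other H2 lines and then the `@@@` line can match, so the original headings near the top of the file stay untouched.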
                          

                          Finally, with the fourth regex S/R, below, we simply add the leading <P........> part and delete the @@@ string, altogether

                          SEARCH (?x-s) ^ ( .+) @@@ $

                          REPLACE <P whatever you need >\1

                          And here is your expected OUTPUT text :

                          <H1........>Heading1</H1>
                          <H2........>Some text</H2>
                          <H2........>Different text</H2>
                          <H2........>Altogether different text</H2>
                          Some paragraphs here
                          </P> (or </ul>)
                          <P..........><span..........><b>Please E-mail us</b></span></P>
                          <H2........>Heading that should not be reproduced</H2>
                          Some paragraphs here
                          </ul> (or </P>)
                          <P whatever you need >We have a Some text, We have a Different text, We have a Altogether different text.
                          <P..........><b><span..........>Please E-mail us</span></b></P>
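Since the original question involves about 400 files, here is a hedged Python sketch of how a list of search/replace steps could be applied to a whole folder. It is an alternative to running the steps from Notepad++, not part of the procedure above. The names `STEP_4`, `apply_steps` and `process_folder`, the `*.html` glob, and the UTF-8 encoding are all assumptions of this sketch; only the fourth S/R is translated here.

```python
import re
from pathlib import Path

# Translation of: (?x-s) ^ ( .+) @@@ $   with REPLACE  <P whatever you need >\1
# The look-ahead allows an optional \r so that CRLF files also match.
STEP_4 = (
    re.compile(r"^(.+)@@@(?=\r?$)", re.MULTILINE),
    r"<P whatever you need >\1",
)

def apply_steps(text: str, steps) -> str:
    """Apply (compiled_pattern, replacement) pairs in order."""
    for pattern, replacement in steps:
        text = pattern.sub(replacement, text)
    return text

def process_folder(folder, steps, glob: str = "*.html", encoding: str = "utf-8") -> int:
    """Rewrite every matching file in place and return how many files changed."""
    changed = 0
    for path in sorted(Path(folder).glob(glob)):
        # Bytes are read and written directly so line endings are preserved.
        original = path.read_bytes().decode(encoding)
        updated = apply_steps(original, steps)
        if updated != original:
            path.write_bytes(updated.encode(encoding))
            changed += 1
    return changed
```

A call such as `process_folder("pages", [STEP_4])` would rewrite the files under a hypothetical `pages` folder in place, so it is safer to run it on a copy of the folder first.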
                          

                          Best Regards,

                          guy038

                          The Community of users of the Notepad++ text editor.