    GURU NEEDED - Stripping, reformatting, saving HTML...

      Gabriele Cripezzi

      Looking forward to being in the office tomorrow to try it. :)
      I really appreciate the effort, man. I am speechless right now.

        guy038

        Hi, Gabriele,

        I slightly changed my previous post so that it matches the last line of your source text, which doesn’t have any EOL character. To do so, I added the \z assertion, which matches the very end of your source file. Otherwise, the last unwanted line, below:

        AACU1821AAC41821C         SC3            24                                                                             ir/jl

        would have been wrongly written at the end of your HTML destination file.
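The effect of an end-of-input assertion can be sketched outside Notepad++. The snippet below is only an illustration in Python, not the regex from this thread: Python's `re` module spells the assertion `\Z`, whereas the Boost engine used by Notepad++ spells it `\z`. The sample text is hypothetical.

```python
import re

# Hypothetical sample: three lines, the last one without a trailing EOL.
text = "first line\r\nsecond line\r\nir/jl"

# Requiring \r\n after every line silently skips the final line.
eol_only = re.findall(r"[^\r\n]+\r\n", text)

# Allowing "end of input" as an alternative terminator also catches it.
eol_or_end = re.findall(r"[^\r\n]+(?:\r\n|\Z)", text)

print(len(eol_only), len(eol_or_end))
```

With the second pattern, the unterminated last line is matched like any other line, so a replacement can treat it explicitly instead of leaving it behind untouched.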

        Cheers

        guy038

          Gabriele Cripezzi

          @guy038 said:

          (?6:(?2<p>)\4(?3</p>\r\n\r\n<p style="text-align:right"> </p>\r\n:(?5</p>\r\n\r\n<\p>:<br />\r\n)))

          guy, you are the man!

          I ran it and it worked as you expected, though not as I expected. That’s my fault: I didn’t explain the job accurately enough.

          My destination file has only one product, but the source file contains 3 products. So I need all those :)
          Anyway, the file I need to process is huge, with 80k+ products, so I need your regex to process them all. Maybe Notepad++ won’t be able to run it, in which case I’ll have to split the document into several parts, which won’t be a problem.

          Another mistake I made was not mentioning that I need the product code (e.g. AACR1035) at the beginning of every description block, so that I have a reference when I import into the DB.

          That said, the best final result would be a CSV-formatted document like THIS ONE

          Sorry again for not being very clear and, again, my compliments on such a result!
          G

            Gabriele Cripezzi

            @guy038 said:

            (?-s)

            NOTE: I don’t need the last 2 lines:

            sdw 6/12/01
            ir/jl

              guy038

              Hello Gabriele,

              Ah! As usual, the problem is not the regex itself but, rather, fully understanding what exactly you need!

              • Firstly, when I opened your file destination-2.csv in Excel, all your text was curiously split into only two cells: column A contains practically all your text, except for the final tag </p>, which is located in column B because of the semicolon of the &nbsp; entity. So that semicolon acts as a field separator in the CSV document!

              • Secondly, in column A, after the product code, you begin the text with a double-quote delimiter and end it, in column B, after the </p> tag, with the same delimiter. However, are you aware that you already have such delimiters in your last formatting line ( style="text-align:right" )?

              • Thirdly, you said :

              My destination file has only one product, but the source file contains 3 products. So I need all those :)

              I don’t quite understand what you mean. Of course, I would never change your source files. Just copy your source file as a destination file, then run the regex(es) on this destination file only!

              • Fourthly, when you spoke about the product code, you mentioned the value AACR1035, but the complete string is AACR1035AAC11035M. Of course, I saw that the number seems to be repeated. So, which string would you write in your destination file?

              • Finally, and most importantly, to get a clear idea of the process, you could send me your file of 80k+ products, if you don’t mind and if it is NOT confidential, of course! My e-mail address is :

              This file will surely raise some further questions, but we’ll get closer to the solution!

              BTW, don’t worry about having to split your file. I don’t think it’ll be necessary. However, the processing time of the final regex S/R may be significant. Anyway, if I manage to run it on my old XP laptop, it should be OK on a more recent configuration :-))
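The two-cell split described in the first two points can be reproduced with a small sketch. It assumes that the spreadsheet treated the semicolon as the field separator, and it uses Python's `csv` module as a stand-in for Excel's parser, so it is an illustration rather than an exact model. The sample line is hypothetical.

```python
import csv

# Hypothetical line shaped like the one described above: a product code,
# a comma, then a quoted HTML block that itself contains double quotes
# and the semicolon of the &nbsp; entity.
line = 'AACR1035,"<p>Some text</p><p style="text-align:right">&nbsp;</p>"'

# Parse it with the semicolon as the delimiter (assumed Excel behaviour).
fields = next(csv.reader([line], delimiter=";"))

for index, field in enumerate(fields):
    print(index, field)
```

Because the first field does not start with a quote, the quotes inside it are taken literally, and the only semicolon on the line, the one ending `&nbsp;`, becomes the field boundary.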

              Cheers,

              guy038

                Gabriele Cripezzi

                destination-2.csv
                I don’t see where you got this file from. There is only destination.csv on my server. :)
                The description you gave of that file does not match what I see in Notepad++.
                I’ll send you the file RARed via email. It contains only one line of 2 cells separated by a comma, with both values quoted (important), as the descriptions contain commas.

                “are you aware that you, already, have such delimiters, in your last formatting line?”
                No I was not aware. I didn’t pay attention to it. Thanks for noticing it.
                So we need to have a different separator, I guess. “|” would be fine. I can use any separator I want so… let’s go for the |

                Column A contains the product code “AACR1035” (the first part of the code, so that AACR1035AAC11035M becomes “AACR1035”). It’s the important part, the SKU.
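The target layout described here (SKU in the first column, quoted HTML in the second, `|` as separator) can be sketched as follows. This is a minimal Python illustration, not the Notepad++ regex from this thread. It assumes the SKU is always the first 8 characters of the full code; `SKU_LENGTH`, `sku_of` and the sample row are hypothetical names and data.

```python
import csv
import io

SKU_LENGTH = 8  # assumption: the SKU is always the first 8 characters


def sku_of(full_code: str) -> str:
    """Return the leading SKU part, e.g. AACR1035 from AACR1035AAC11035M."""
    return full_code[:SKU_LENGTH]


# Hypothetical sample row: full product code plus its HTML description.
rows = [
    ("AACR1035AAC11035M",
     '<p>Sample, text</p><p style="text-align:right">&nbsp;</p>'),
]

buffer = io.StringIO()
# QUOTE_ALL quotes both values; embedded double quotes are doubled.
writer = csv.writer(buffer, delimiter="|", quoting=csv.QUOTE_ALL)
for full_code, html in rows:
    writer.writerow([sku_of(full_code), html])

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back with the same delimiter restores the original values.
parsed = list(csv.reader(io.StringIO(csv_text), delimiter="|"))
```

Doubling the embedded quotes is what keeps the `style="text-align:right"` attribute from ending the quoted value early, whichever separator is chosen.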

                “My destination file has only one product, but the source file contains 3 products. So I need all those :)”
                Sorry, I’ll try again… :) (even though at this point we don’t need that, since we can go for the CSV)
                The destination file I provided was just to show you the result for ONE product, but the final destination file (CSV) needs to contain all the products.

                Here is the link to a part of the complete file. It’s 1/5 of the original.
                Here is the complete one.

                Thanks a lot again!
                G

                  guy038

                  Gabriele,

                  Really sorry, but your two links don’t seem to work.

                  • The first one, for the 1/5 of the original file, doesn’t work at all :-((

                  • The second one, for the complete file, opens the main page of the R.C… Santa site but, even after clicking on some links on this main page, I could not get your RAR archive ?!

                  BTW, what is the size of your complete file? Maybe you could send me part of it ( 1/5 or even less ) by e-mail.

                  I did receive your destination.rar file, attached to your e-mail. Thanks. I will probably have to ask you some other questions about it but, first, I would prefer to get your file ( or a subset of it ) in order to have a general idea of the tasks to do!

                  Cheers,

                  guy038

                    Gabriele Cripezzi

                    Here is the link to a part of the complete file. It’s 1/5 of the original.
                    Here is the complete one.

                    The first link doesn’t seem to be saved correctly by the script on this website.
                    Links also sent via email.

                      guy038

                      Hi Gabriele,

                      Yeah! This time, your links, sent by e-mail, are OK. So, I now have:

                      • The technoteCOMPLETE.rar archive, from which I extracted the huge technote.txt file, of 183 MB!

                      • The Technote1.rar archive, from which I extracted the technote1.csv file, of 21.8 MB

                      Just note that, in your last post, the link to the partial file is still wrong!

                      Finally, I think that the right syntax of these two links is, simply:

                      http://www.rc-santa.com/temp/technote1.rar

                      http://www.rc-santa.com/temp/technoteCOMPLETE.rar

                      Well, I’m going to take a look at your two huge files!

                      See you soon,

                      Cheers

                      guy038

                        Gabriele Cripezzi

                        “Just note that, in your last post, the link of the partial file is still wrong !”

                        Yeah… there must be something wrong with this forum’s script when it parses URLs. I tried to fix the post, but after 180 seconds you can’t edit anymore, so I couldn’t delete the links.
