GURU NEEDED - Stripping, reformatting, saving HTML...



  • Looking forward to being in the office tomorrow to try it. :)
    I really appreciate the effort, man. I am speechless right now.



  • Hi, Gabriele,

    I slightly changed my previous post, in order to match the last line of your source text, which doesn’t have any EOL character. So, I added the \z assertion, to match the very end of your source file. Otherwise, the last unwanted line, below :

    AACU1821AAC41821C         SC3            24                                                                             ir/jl

    would have been wrongly rewritten, at the end of your HTML-destination file
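
    To illustrate the point about the missing final EOL, here is a minimal sketch in Python rather than in Notepad++ itself. The sample text and the variable names are made up for the illustration. Python spells the "very end of the subject" assertion \Z, which plays the same role as \z in the Boost engine used by Notepad++.

```python
import re

# Hypothetical sample: two lines, the last one WITHOUT any EOL character.
text = "first line\r\nlast line without EOL"

# A line ends either with \r\n or at the very end of the input.
# Python's \Z corresponds to the \z assertion of Notepad++'s Boost regex engine.
line_re = re.compile(r"([^\r\n]+)(?:\r\n|\Z)")
lines = line_re.findall(text)

# Without the end-of-input alternative, the last line is never matched.
strict_re = re.compile(r"([^\r\n]+)\r\n")
strict_lines = strict_re.findall(text)

print(lines)         # both lines are found
print(strict_lines)  # only the first line is found
```

    The second pattern shows what goes wrong without the assertion: the final line has no \r\n after it, so it escapes the match and is left untouched.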

    Cheers

    guy038



  • @guy038 said:

    (?6:(?2<p>)\4(?3</p>\r\n\r\n<p style="text-align:right"> </p>\r\n:(?5</p>\r\n\r\n<\p>:<br />\r\n)))

    guy, you are the man!

    I ran it and it worked out as you expected, but not as I expected, but it’s my fault. I didn’t explain the job accurately enough.

    My destination file has only one product, but the source file contains 3 products. So I need all those :)
    Anyway, the file I need to process is huge with 80k+ products so I need your function to process them all. Maybe Notepad++ won’t be able to run it so I’ll have to split the document in several parts, which won’t be a problem.

    Another mistake I made was not mentioning that I need the product code (e.g. AACR1035) at the beginning of every description block, so that I have a reference when I import into the DB.

    That said, the best final result would be a CSV-formatted document like THIS ONE

    Sorry again for not being very clear and, again, my compliments for such a result!
    G



  • @guy038 said:

    (?-s)

    NOTE: I don’t need the last 2 lines:

    sdw 6/12/01
    ir/jl



  • Hello Gabriele,

    Ah ! As usual, the problem is not the regex itself but, rather, fully understanding what you exactly need !

    • Firstly, when I opened your file destination-2.csv in Excel, all your text was curiously split into two cells only : column A contains practically all your text, except for the final tag </p>, which is located in column B, due to the semicolon of the &nbsp; entity. That semicolon normally acts as a field separator in a CSV document !

    • Secondly, in column A, after the product code, you begin the text with a double-quote delimiter and end it, in column B, after the </p> tag, with this same delimiter. However, are you aware that you, already, have such delimiters, in your last formatting line ( style="text-align:right" ) ?

    • Thirdly, you said :

    My destination file has only one product, but the source file contains 3 products. So I need all those :)

    I don’t quite understand what you mean. Of course, I would never change your source files. Just copy your source file as a destination file. Then run the regex(es) on this destination file only !

    • Fourthly, when you spoke about the product code, you mentioned the value AACR1035, but the complete string is AACR1035AAC11035M. Of course, I saw that the number seems repeated. So, which string would you write in your destination file ?

    • Finally, and most importantly, to get a good idea of the processing to do, you could send me your file of 80k+ products, if you don’t mind and if it is NOT confidential, of course ! My e-mail address is :

    tguy.038@gmail.com

    Surely, this file will raise some other questions, but we’ll get closer to the solution !

    BTW, don’t worry about having to split your file. I don’t think it’ll be necessary. However, the processing time of the final regex S/R may be significant. Anyway, if I succeed in running it on my old XP laptop, it should be OK on a more recent configuration :-))

    Cheers,

    guy038



  • destination-2.csv
    I don’t see where you got this file from. There is only destination.csv on my server. :)
    The description you gave me of that file is not as I see it in Notepad++
    I’ll send you the file RARed via email. It contains only one line of 2 cells separated by a comma, with both values quoted (important), as the descriptions contain commas.

    “are you aware that you, already, have such delimiters, in your last formatting line?”
    No I was not aware. I didn’t pay attention to it. Thanks for noticing it.
    So we need to have a different separator, I guess. “|” would be fine. I can use any separator I want so… let’s go for the |

    Column A contains the product code “AACR1035” (the first part of the full code, so that AACR1035AAC11035M becomes “AACR1035”). It’s the important part, the SKU.
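
    As a rough sketch of the target line format described here (pipe separator, both values quoted, SKU in the first column), written in Python purely for illustration. The helper name sku_of, the sample description, and the rule "the SKU is the first 8 characters of the full code" are assumptions made for this example, not something confirmed by the real data.

```python
import csv
import io

def sku_of(full_code: str) -> str:
    # ASSUMPTION: the SKU is the first 8 characters of the full product code.
    return full_code[:8]

# Hypothetical record; the description contains a comma and double quotes.
full_code = "AACR1035AAC11035M"
description = '<p>Some text, with a comma</p><p style="text-align:right">&nbsp;</p>'

# Write one "|"-separated line with every value quoted.
# The csv module doubles any double quote found inside a quoted value.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="|", quoting=csv.QUOTE_ALL, lineterminator="\r\n")
writer.writerow([sku_of(full_code), description])
csv_line = buffer.getvalue()

# Read the line back to check that the two cells survive the round trip.
row = next(csv.reader(io.StringIO(csv_line), delimiter="|"))
print(csv_line)
print(row)
```

    With this convention the inner quotes of the style attribute are written doubled, so they no longer clash with the quotes that delimit the cell, and the semicolon of &nbsp; is harmless because the separator is now |.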

    “My destination file has only one product, but the source file contains 3 products. So I need all those :)”
    Sorry, I’ll try again… :) (even though at this point we don’t need that, since we can go for the CSV)
    The destination file I provided was just to show you the result for ONE product, but the final destination file (CSV) needs to contain all the products.

    Here is the link to a part of the complete file. It’s 1/5 of the original.
    Here is the complete one.

    Thanks a lot again!
    G



  • Gabriele,

    Really sorry, but your two links don’t seem to work.

    • The first one, relative to the 1/5 of the original file, doesn’t work at all :-((

    • The second one, relative to the complete file, opens the main page of the R.C… Santa site, but, even after clicking on some links, of this main page, I could not get your RAR archive ?!

    BTW, what is the size of your complete file ? Maybe you could send me part of it ( 1/5 or even less ) by e-mail

    I did receive your destination.rar file, attached to your e-mail. Thanks. I will probably have to ask you some other questions about it, but first, I would prefer to get your file ( or a subset of it ) in order to have a general idea, about the tasks to do !

    Cheers,

    guy038



  • Here is the link to a part of the complete file. It’s 1/5 of the original.
    Here is the complete one.

    The first link doesn’t seem to be saved correctly by the script here on this website.
    Links also sent via email.



  • Hi Gabriele,

    Yeah ! This time, your links, sent by e-mail, are OK. So, I now get :

    • The technoteCOMPLETE.rar archive, from which I extracted the huge technote.txt file, of 183 MB !

    • The Technote1.rar archive, from which I extracted the technote1.csv file, of 21.8 MB

    Just note that, in your last post, the link of the partial file is still wrong !

    Finally, I think that the right syntax of these two links is, simply :

    http://www.rc-santa.com/temp/technote1.rar

    http://www.rc-santa.com/temp/technoteCOMPLETE.rar

    Well. I’m going to glance at your two huge files !

    See you soon,

    Cheers

    guy038



  • “Just note that, in your last post, the link of the partial file is still wrong !”

    Yeah… there must be something wrong with this forum script when parsing URLs. I tried to work on it, but after 180 secs you can’t edit anymore, so I couldn’t delete the links.

