Please help me with a regex replacement formula



  • Hi, I’m trying to fix up my forum’s database, specifically the HTML tags in the table that holds all the posts. After moving to new software, these HTML tags need adjusting.

    My DB now has a lot of HTML entries that look like this:

    <div class=\"bbc_spoiler\">\n	<span class=\"spoiler_title\">Next move</span> <input type=\"button\" class=\"bbc_spoiler_show\" value=\"Rodyti\"><div class=\"bbc_spoiler_wrapper\"><div class=\"bbc_spoiler_content\" style=\"display:none;\">THE HIDDEN TEXT</div></div>\n</div>
    

    and I must replace it with this for it to work normally:

    <div class=\"ipsSpoiler\" data-ipsSpoiler=\"\"><div class=\"ipsSpoiler_header\"><span></span></div><div class=\"ipsSpoiler_contents\"><p>THE HIDDEN TEXT</p></div></div>
    

    Notice that the code after THE HIDDEN TEXT is shortened from </div></div>\n</div> to </p></div></div>

    But for the love of God I can’t figure out how to write a replacement query that removes the one extra </div> at the end of every entry that has variable text before it and

    <div class=\"bbc_spoiler\">\n	<span class=\"spoiler_title\">Next move</span> <input type=\"button\" class=\"bbc_spoiler_show\" value=\"Rodyti\"><div class=\"bbc_spoiler_wrapper\"><div class=\"bbc_spoiler_content\" style=\"display:none;\">
    

    at the beginning. Any ideas?



  • Hello, @mantelis-telis,

    At first glance, I don’t really see an obvious relationship between your present HTML syntax :

    <div class=\"bbc_spoiler\">\n
    	<span class=\"spoiler_title\">Next move</span>
    	<input type=\"button\" class=\"bbc_spoiler_show\" value=\"Rodyti\">
    	<div class=\"bbc_spoiler_wrapper\">
    		<div class=\"bbc_spoiler_content\" style=\"display:none;\">
    			THE HIDDEN TEXT
    		</div>
    	</div>\n
    </div>
    

    and your expected syntax, below :

    <div class=\"ipsSpoiler\" data-ipsSpoiler=\"\">
    	<div class=\"ipsSpoiler_header\">
    		<span></span>
    	</div>
    	<div class=\"ipsSpoiler_contents\">
    		<p>THE HIDDEN TEXT</p>
    	</div>
    </div>
    

    Could you provide additional information ? Thanks

    Best Regards,

    guy038



  • Thanks for trying to help me. :)

    I will do my best to describe what I’m trying to achieve here.

    I’m upgrading my forum’s posts database table. It contains thousands of instances of a legacy bbcode used by the previous version of the forum, and if I upgrade the forum, that content becomes sort of locked, and thus invisible, in the new version.

    In order to avoid that I must run a mass replace query on that table. So essentially it breaks down to this:

    1. Easy part. Mass replace the first part of the code, before the actual text, from this
    <div class=\"bbc_spoiler\">\n	<span class=\"spoiler_title\">Next move</span> <input type=\"button\" class=\"bbc_spoiler_show\" value=\"Rodyti\"><div class=\"bbc_spoiler_wrapper\"><div class=\"bbc_spoiler_content\" style=\"display:none;\">
    

    into

    <div class=\"ipsSpoiler\" data-ipsSpoiler=\"\"><div class=\"ipsSpoiler_header\"><span></span></div><div class=\"ipsSpoiler_contents\"><p>
    
    2. In the middle the hidden text remains as is.

    3. The end of the bbcode needs to be turned from

    </div></div>\n</div>
    

    into

    </p></div></div>
    

    But I can’t run this as a simple mass replace query on its own, because this fragment of code also appears in many other posts’ bbcodes and should stay as it is there. In other words, I might end up changing it in too many instances.
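
    To illustrate the risk, here is a small Python sketch (not Notepad++ itself). The `post` string is a made-up example in which the closing fragment ends a spoiler and also shows up in unrelated markup; a plain text replacement rewrites both.

```python
# Hypothetical post: the closing fragment ends a spoiler and also
# appears in unrelated markup later in the same post.
CLOSE = r'</div></div>\n</div>'
post = (r'<div class=\"bbc_spoiler_content\">secret' + CLOSE +
        r'<div><div>\n<div class=\"quote\">other' + CLOSE)

# A plain text replacement cannot tell the two occurrences apart.
naive = post.replace(CLOSE, r'</p></div></div>')

print(post.count(CLOSE))                 # occurrences before the replacement
print(naive.count(r'</p></div></div>'))  # occurrences that were rewritten
```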

    So I’m wondering whether it is possible to write a regex that does all of this replacing in one query. I can’t work out how to make Notepad++ treat whatever is written between the tags as a variable and change only the start and the end of the code.

    Can you understand now?



  • @Mantelis-Telis

    If I’m understanding correctly…

    So really it’s the classic problem:

    Change ABCdontchangeDEF into GHIdontchangeJKL, where the dontchange part is THE HIDDEN TEXT in your specific case, but it is the variable part in what you’re needing to do.

    One way this could be done is to search for ABC(.+?)DEF and to replace it with GHI${1}JKL

    You just have to get your ABC, DEF, GHI, and JKL right, which from the type of data you have, I would have zero confidence that I could get it right for you from this side of the “forum wall”.

    But, if you can get Notepad++ to find-match those 4 things in your data (in Regular expression mode even though those 4 pieces are actually constant data), then it’s doable, and in reality not “too hard”.
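
    The same idea can be sketched outside Notepad++, for instance in Python. This is only an illustration: ABC, DEF, GHI and JKL are placeholders for the real fragments, and Python writes the group reference differently from Notepad++'s ${1}, so the sketch uses a replacement function instead.

```python
import re

# Placeholder markers standing in for the real opening/closing HTML fragments.
ABC, DEF = "ABC", "DEF"
GHI, JKL = "GHI", "JKL"

# re.escape makes the constant parts match literally; (.+?) lazily captures
# the variable middle so each match stops at the nearest DEF.
pattern = re.compile(re.escape(ABC) + r"(.+?)" + re.escape(DEF))

def convert(text: str) -> str:
    # A function replacement inserts GHI/JKL verbatim, with no
    # backslash or group-reference processing of the new text.
    return pattern.sub(lambda m: GHI + m.group(1) + JKL, text)

sample = "xx ABCfirstDEF yy ABCsecondDEF zz DEF"
result = convert(sample)
print(result)
```

    Note that the trailing DEF in the sample has no ABC in front of it, so it is left alone; that is what keeps the replacement from touching unrelated fragments.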



  • @Alan-Kilborn said in Please help me a with a regex replacement formula:

    One way this could be done is to search for ABC(.+?)DEF and to replace it with GHI${1}JKL

    I tried using that with search mode: regex but it does not work :(



  • @Mantelis-Telis

    I contend that it DOES work.

    First step: Can you match your “ABC” data, alone?



  • No. If I use normal search mode, Notepad++ finds ABC,
    but in regex mode it does not:

    <div class=\"bbc_spoiler\">\n	<span class=\"spoiler_title\">Sekantis veiksmas</span> <input type=\"button\" class=\"bbc_spoiler_show\" value=\"Rodyti\"><div class=\"bbc_spoiler_wrapper\"><div class=\"bbc_spoiler_content\" style=\"display:none;\">
    


  • @Mantelis-Telis said in Please help me a with a regex replacement formula:

    <div class=\"bbc_spoiler\">\n	<span

    In regex the sequence \n has special meaning.
    From this end we can’t tell (with certainty) what your data really is.
    See if you can get the short part just above matching in regex mode.
    That’s the only special sequence I see in your data.
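
    As a rough illustration in Python (whose regex engine differs from Notepad++'s in details, but treats \n the same way), and assuming the data holds the two characters backslash and n rather than a real line break, with the sample shortened and the tab left out:

```python
import re

# Assumption: the database dump contains the two characters backslash + n,
# not a real line break. Shortened sample without the tab.
data = r'<div class=\"bbc_spoiler\">\n<span'

# Unescaped, \n in a pattern means "newline" and \" means a plain quote,
# so this pattern does not match the dump text.
naive = re.search(r'<div class=\"bbc_spoiler\">\n<span', data)

# Doubling each backslash makes the regex look for a literal backslash.
escaped = re.search(r'<div class=\\"bbc_spoiler\\">\\n<span', data)

# re.escape builds an equivalent literal-matching pattern automatically.
auto = re.search(re.escape(data), data)

print(naive is None, escaped is not None, auto is not None)
```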



  • I found a post on Stack Overflow saying to put an extra \ in front of such sequences to make the regex work in Notepad++. This is called escaping - https://stackoverflow.com/questions/51058902/save-n-symbol-in-the-string-js

    then I made such replacement query:
    find

    <div class=\\"bbc_spoiler\\">\\n	<span class=\\"spoiler_title\\">Sekantis veiksmas</span> <input type=\\"button\\" class=\\"bbc_spoiler_show\\" value=\\"Rodyti\\"><div class=\\"bbc_spoiler_wrapper\\"><div class=\\"bbc_spoiler_content\\" style=\\"display:none;\\">(.+?)</div></div>\\n</div>
    

    replace with

    <div class=\\"ipsSpoiler\\" data-ipsSpoiler=\\"\\"><div class=\\"ipsSpoiler_header\\"><span></span></div><div class=\\"ipsSpoiler_contents\\"><p>${1}</p></div></div>
    

    I guess it worked. Do any of you agree this should be the right way to do it?



  • @Mantelis-Telis

    Do any of you agree this should be the right way to do it?

    It seems reasonable.
    The key part for you is when you said “I guess it worked”



  • Hi, @mantelis-telis, @alan-kilborn, and All,

    If we assume the initial and final texts, given in your first post, the following regex S/R does work ;-)) :

    SEARCH (?-is)<div class=\\"bbc_spoiler\\">\\n\t<span class=\\"spoiler_title\\">Next move</span> <input type=\\"button\\" class=\\"bbc_spoiler_show\\" value=\\"Rodyti\\"><div class=\\"bbc_spoiler_wrapper\\"><div class=\\"bbc_spoiler_content\\" style=\\"display:none;\\">(.+?)</div></div>\\n</div>

    REPLACE <div class=\\"ipsSpoiler\\" data-ipsSpoiler=\\"\\"><div class=\\"ipsSpoiler_header\\"><span></span></div><div class=\\"ipsSpoiler_contents\\"><p>${1}</p></div></div>


    Notes :

    • At the beginning of the search regex, I included the (?-is) syntax. These are in-line modifiers which :

      • Carry out the search in a case-sensitive way ( so not in a case-insensitive way : -i )

      • Force the regex dot symbol ( . ) to be seen as matching a single standard character, but not the EOL chars ( so no single line search : -s )

    • The tabulation character, right after \n has been changed into its regex representation \t
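
    For anyone who wants to check the logic outside Notepad++, here is a minimal Python sketch of the same search and replace, run on the sample from the first post. It assumes the backslashes in the data are literal characters. Python's defaults are already case-sensitive and its dot does not match line breaks, so no (?-is) modifier is used, and the replacement is built with a function rather than ${1}.

```python
import re

# Sample row from the first post; the backslashes are literal characters
# and there is a real tab character after the literal \n.
old = (r'<div class=\"bbc_spoiler\">\n' + "\t" +
       r'<span class=\"spoiler_title\">Next move</span> '
       r'<input type=\"button\" class=\"bbc_spoiler_show\" value=\"Rodyti\">'
       r'<div class=\"bbc_spoiler_wrapper\">'
       r'<div class=\"bbc_spoiler_content\" style=\"display:none;\">'
       r'THE HIDDEN TEXT</div></div>\n</div>')

# Search pattern: \\ matches one literal backslash, \t matches the tab.
OPEN = (r'<div class=\\"bbc_spoiler\\">\\n\t'
        r'<span class=\\"spoiler_title\\">Next move</span> '
        r'<input type=\\"button\\" class=\\"bbc_spoiler_show\\" value=\\"Rodyti\\">'
        r'<div class=\\"bbc_spoiler_wrapper\\">'
        r'<div class=\\"bbc_spoiler_content\\" style=\\"display:none;\\">')
CLOSE = r'</div></div>\\n</div>'
pattern = re.compile(OPEN + r'(.+?)' + CLOSE)

# New wrapper text, inserted verbatim around the captured hidden text.
NEW_OPEN = (r'<div class=\"ipsSpoiler\" data-ipsSpoiler=\"\">'
            r'<div class=\"ipsSpoiler_header\"><span></span></div>'
            r'<div class=\"ipsSpoiler_contents\"><p>')
NEW_CLOSE = r'</p></div></div>'

new = pattern.sub(lambda m: NEW_OPEN + m.group(1) + NEW_CLOSE, old)
print(new)
```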


    Remark :

    In your last post, the two words Next Move, in your search regex, were replaced with the words Sekantis veiksmas !

    Cheers,

    guy038



  • @guy038 hello :)

    Okay, first: I tried replacing the tabulation character with \t, but then Notepad++ does not register the find, so I had to leave it as it was.

    Regarding the remark: it’s okay. The stuff I’m fixing is not in English, so I simply translated this small part into English in case it would be easier for everyone to understand, though it bears no meaning without more context. :)

    I think this thing worked for me in the end. I see no errors on the database so far.

    Next I will get into replacing the YouTube video embedding code, which is a bit different from this query. Will ask for help later. :) Thanks to everybody involved.

