Move a block to another place
-
What you said you tried for the search/replace didn’t quite match my examples, but that’s okay. I will customize it based on what you’ve given me:

- Find =
    (<noscript[^>]*?class=*["“][^"“”]*?["”][^>]*?lang=["“][^"“”]*?["”]>.*?</noscript>\R*)(.*?)(<img[^>]*?style=["“]display:none;["”]></div>)
- Replace =
    $2$3\r\n$1
- Assumptions/Changes
  - I used [^>]*? to include various unknown attributes, assuming they don’t contain a > character: if they do, this won’t work.
  - I used expressions that accept either smart quotes (“”) or normal quotes (""), because you switched back and forth. Personally, I doubt there are actual smart quotes for the attributes in your HTML, but this allows my expression to work on exactly the text that you put in your examples, or on text that I think is well-formed HTML.
  - Similarly, I used class=* to allow either class="text" or class"text". I believe class"text" is invalid HTML, but I wanted the expression to work on your example text, just in case you are processing invalid HTML.
  - I allowed anything (or, rather, any non-quote character) inside the class="text" and lang="example" values, so it’s not restricted to exactly text or example: I saw you had example and otherexample, so I wanted to make sure that text and othertext would both work, too.
  - Since there is a language barrier, I am assuming that by “to move = déplacer” you mean “move it, completely deleting the block from the beginning, and instead putting it after the <img display:none;">...</div>”.

So that should work manually on the two example files you gave me: when I copy/pasted from your post into two files, then ran the exact FIND and REPLACE listed above once per file, it successfully found the noscript and moved it after the img.

HOWEVER (and this is a big “however”), that won’t be fun for hundreds of files. I tried having NPP open with just the two example files, then I used the Replace All in All Opened Documents option, and it replaced it in both files. Then I copied the originals so I had 100 files, opened them ALL in Notepad++, and “Replace All in All Opened Documents” still did what I wanted. So it might work if you just open all 400 files, though there’s no guarantee: I might do it in smaller groups of 50-100 files, depending on how large your files are.

If you are going to be frequently making changes like this, but the rules are changing, I highly recommend learning a scripting language: I would have done this from the command line with Perl; many others here would have used Python or Lua. The benefit of the latter two is that there are plugins for automating Notepad++ using the Python Script Plugin or the LuaScript Plugin. @Claudia-Frank could probably write the Python Script to open each of the 400 files, apply my regex, and save and close them without much difficulty. Unfortunately, I am not a Python expert, so it would take me hours more than I have available to muddle through it. (It may even be that @go2to’s ecobyte-link provides such a scripting language, or a simple interface for applying a particular regex to a multitude of files.)
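For readers who prefer the scripted route, here is a minimal Python sketch that applies the same Find/Replace to every file in one folder, outside Notepad++. Several things are assumptions and not part of the recipe above: Python’s re module has no \R, so an explicit line-break alternation stands in for it; re.DOTALL lets the dot cross line breaks; the replacement is written \2\3\r\n\1 instead of $2$3\r\n$1; and the function names, the *.html filter and the UTF-8 encoding are hypothetical choices. Run it on a copy of the files first.

```python
import re
from pathlib import Path

# Python's re has no \R, so spell out the line-break alternatives (assumption).
EOL = r"(?:\r\n|\r|\n)"

FIND = re.compile(
    r'(<noscript[^>]*?class=*["“][^"“”]*?["”][^>]*?lang=["“][^"“”]*?["”]>'
    r'.*?</noscript>' + EOL + r'*)'
    r'(.*?)'
    r'(<img[^>]*?style=["“]display:none;["”]></div>)',
    re.DOTALL,  # let . match line breaks, since the blocks span several lines
)
REPLACE = r"\2\3\r\n\1"  # Python spelling of $2$3\r\n$1


def move_noscript_block(text):
    """Move each matched noscript block to just after the hidden img/div."""
    return FIND.sub(REPLACE, text)


def process_folder(folder, file_glob="*.html", encoding="utf-8"):
    """Rewrite every matching file in place; return the names that changed."""
    changed = []
    for path in sorted(Path(folder).glob(file_glob)):
        # newline="" keeps the original \r\n or \n line endings untouched
        with open(path, encoding=encoding, newline="") as handle:
            original = handle.read()
        updated = move_noscript_block(original)
        if updated != original:
            with open(path, "w", encoding=encoding, newline="") as handle:
                handle.write(updated)
            changed.append(path.name)
    return changed
```

A call such as process_folder(r"d:\temp\pages") (a hypothetical path) would return the list of file names that were rewritten.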
-
Many thanks, peterjones.
Perhaps the solution for me will be to join each file with itself, then remove the lines I don’t want.
In DOS, do you know how to join, in one command, each file of a directory with itself?
For one file it’s easy:
    copy name.htm +name.htm d:\temp\name.htm
Time to sleep for me; I shall search for this tomorrow.
By the way, do you know this free French text editor? http://www.gbesoft.fr/gbepad.php
It’s not bad; I use it together with Notepad++.
-
A script could look like this, so if you are interested in
a Python script solution let me know and we can work it out.

    # -*- coding: utf-8 -*-
    # Run from the Python Script plugin, which provides the notepad and editor objects
    import os

    FIND_WHAT = u'FIND_WHAT_REGULAR_EXPRESSION'
    REPLACE_WITH = u'REPLACE_WITH_EXPRESSION'
    DIRECTORY_OF_FILES = u'THE_DIRECTORY'  # like c:\\temp\\directory - double backslash needed

    os.chdir(DIRECTORY_OF_FILES)
    list_of_files = [x for x in os.listdir('.') if x.endswith('.html')]

    # Open each file, apply the regex replacement, then save and close it
    for file in list_of_files:
        notepad.open(file)
        editor.rereplace(FIND_WHAT, REPLACE_WITH)
        notepad.save()
        notepad.close()

This script looks for html files in a defined directory.
If your files are under different subdirectories we need to modify the code.

Cheers
Claudia
-
Hi, @kat75,
Here is my contribution to your problem ;-)
Let’s suppose you have two files, 1.html and 2.html, as below:

    <!-- 1.html -->
    Some text before the "noscript" tag
    <noscript class"text" lang="example">
    line1
    line2
    line3
    line4
    </noscript>
    bla bla bla blah
    <img… and …style="display:none;"></div>
    And some text, located after
    ............ <noscript> ............

and

    <!-- 2.html -->
    bla bla <noscript> bla
    <noscript class"text" lang="otherexample">
    Other line1
    Other line2
    Other line3
    Other line4
    Other line5
    </noscript>
    Other text… etc… etc… etc…
    <img… other and …style="display:none;"></div>
    Other Text .......... till the END of the file

- Move back to the very beginning of the current file (Ctrl + Home)
- Open the Replace dialog (Ctrl + H)
- Select the Regular expression search mode
- In the Search what: box, type in the regex
    (?s)(<noscript class.+?</noscript>\R)(.+?style="display:none;"></div>\R)
- In the Replace with: box, type in the regex
    \2\1
- Click on the Replace All button

=> Your two html files should be modified, as below:

    <!-- 1.html -->
    Some text before the "noscript" tag
    bla bla bla blah
    <img… and …style="display:none;"></div>
    <noscript class"text" lang="example">
    line1
    line2
    line3
    line4
    </noscript>
    And some text, located after
    ............ <noscript> ............

and

    <!-- 2.html -->
    bla bla <noscript> bla
    Other text… etc… etc… etc…
    <img… other and …style="display:none;"></div>
    <noscript class"text" lang="otherexample">
    Other line1
    Other line2
    Other line3
    Other line4
    Other line5
    </noscript>
    Other Text .......... till the END of the file

I do hope that this is the kind of search/replacement you expect!
Notes :

- I suppose that the ending tag </noscript> ends the current line, in your html files
- I suppose that the text <img………style="display:none;"></div> ends the current line, too
- As usual, the (?s) modifier means that the dot regex character represents any single character (standard or EOL chars)
- Then, the part <noscript class.+?</noscript>\R looks for any complete individual noscript block (note that the form .+? stands for the smallest non-empty range of characters between the two strings <noscript class and </noscript>)
- This noscript block, with the final EOL characters (\R), is stored as group 1, due to the enclosing parentheses
- Finally, the part .+?style="display:none;"></div>\R searches for the smallest range of characters up to the string style="display:none;"></div>
- Again, this block of lines, with the final EOL characters (\R), is stored as group 2, due to the enclosing parentheses
- In replacement, we just rewrite these two groups, in reverse order (\2\1)
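To see the two groups at work outside Notepad++, here is a small Python sketch of the same S/R. It is only an approximation under stated assumptions: Python’s re module does not know \R, so a hypothetical EOL alternation replaces it, and the sample text is a shortened stand-in for 1.html with an invented img tag.

```python
import re

# Stand-in for \R, which Python's re lacks (assumption)
EOL = r"(?:\r\n|\r|\n)"

PATTERN = re.compile(
    r'(?s)(<noscript class.+?</noscript>' + EOL + r')'
    r'(.+?style="display:none;"></div>' + EOL + r')'
)

# Shortened, hypothetical version of the 1.html example
SAMPLE = (
    'Some text before the "noscript" tag\n'
    '<noscript class"text" lang="example">\n'
    'line1\n'
    'line2\n'
    '</noscript>\n'
    'bla bla bla blah\n'
    '<img src="pixel.gif" style="display:none;"></div>\n'
    'And some text, located after\n'
)

match = PATTERN.search(SAMPLE)
group1 = match.group(1)  # the noscript block, with its final line break
group2 = match.group(2)  # everything up to and including the img/div line

# Rewrite the two groups in reverse order, as \2\1 does in Notepad++
swapped = PATTERN.sub(r"\2\1", SAMPLE)
print(swapped)
```

Printing group1 and group2 separately is a quick way to check that each pair of parentheses captures what you expect before running the replacement on real files.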
Remarks :

- As previously said, for about 400 files, you should use the Replace in Files feature (Ctrl + Shift + F)
- Group all the html files which have to be modified in a particular folder
- Do a BACKUP of this folder (one never knows!)
- Fill in the Filters and Directory boxes, too
- Click on the Replace in Files button, to perform a global replacement on all your files
- IMPORTANT : Do NOT use the v7.4 or v7.4.1 version of N++. Indeed, with these latest versions, there is a bug when multiple S/R are performed: some of the scanned files are not changed :-((( Refer to :
Cheers,
guy038
-
Hello,
just awake.
Thanks Peter, Claudia and Guy.
Guy, it’s magic, it seems to work fine; I didn’t think that it was so easily possible.
I must learn how it works, and download a previous version of Notepad++, as I have v7.4.1.
Really fantastic.
Many thanks!
-
@kat75 Guy often performs magic with Regular Expressions. :) And I believe he speaks French also.
A good starting point for Regular Expressions is http://www.regular-expressions.info/
-
Thanks Glen for the link.
I shall try to learn regular expressions; there are really a lot of possibilities.
-
Hello,
I have a question.
I am trying to understand how what Guy gave me works:
    (?s)(<noscript class.+?</noscript>\R)(.+?style="display:none;"></div>\R)
If I try to get another sequence, between <a> tags for example, I get nothing! What is wrong?
    (?s)(<a href.+?</a>\R)
With an img tag, the match doesn’t stop at the first "> ... why? :-) Thanks
    (?s)(<img class.+?">\R)
-
@kat75
Without having the data, we cannot say for sure what went wrong.
So, in theory, the hyperlink regex would match everything within the tag if
the closing tag is directly followed by an EOL char (no space),
and the image regex would match starting with <img class, followed by
anything until "> appears, followed directly by an EOL char.

Cheers
Claudia
-
At the end there must be an EOL character; OK, I understand now.
-
I would like to say thanks to all: I have succeeded in moving my block, and I begin to understand Notepad++ a bit.
Is there a way to send private messages on the Notepad++ community? I can’t find how.
-
Hello,
do you think that it’s possible to move a block not to the bottom but to the top?
I don’t succeed.
I mean, in this example, how do I move <img etc… and …style="display:none;"></div> to just before <noscript class"text" lang="example">?

1.html
<noscript class"text" lang="example">
line1
line2
line3
line4
</noscript>
some text…
etc…
etc
<img… and …style="display:none;"></div>

2.html
<noscript class"text" lang="otherexample">
otherline1
otherline2
otherline3
otherline4
</noscript>
othersome text…
etc…
etc etc
<img… other and …style="display:none;"></div>
-
Hello, @pouemes44, and All,
Here is, below, the general method to switch two consecutive blocks of text, whether or not they are separated by some other text.
So let’s start with the example text, below:

    bla bla bla
    Block 1 of some text
    End block
    bla bla bla
    bla bla
    bla bla bla
    Block 2 with some other text
    End block
    bla bla bla
    bla bla bla
    bla bla bla

To correctly determine the limits of your block of text, whatever it is, you necessarily need to know:

- The location of the beginning of your block. In our example, I suppose that it’s the regex ^Block \d+
- The location of the end of your block. In our case, I suppose it’s the string End block

IMPORTANT : In our example, the second block has the same limits as Block 1, but it could perfectly well have some other limit definitions!

So, from the above definitions, we can build the regex S/R, below:

SEARCH (?s)(^Block \d+.+?End block)(.*?)(^Block \d+.+?End block)
REPLACE \3\2\1
Notes :

- The leading (?s) modifier means that the special dot character represents absolutely any single character
- The first part, (^Block \d+.+?End block), represents a complete individual block of text, stored as group 1
- The third part, (^Block \d+.+?End block), stands, again, for another complete block of text, stored as group 3
- The middle part, (.*?), is the shortest range, even empty, of any characters between these two consecutive blocks of text
- In replacement, we just switch groups 1 and 3, with group 2 standing as a pivot

And, after a click on the Replace All button, you should get the expected text, below:

    bla bla bla
    Block 2 with some other text
    End block
    bla bla bla
    bla bla
    bla bla bla
    Block 1 of some text
    End block
    bla bla bla
    bla bla bla
    bla bla bla

Just note that, if the two blocks are strictly consecutive, without the bla bla lines, as below:

    bla bla bla
    Block 1 of some text
    End block
    Block 2 with some other text
    End block
    bla bla bla
    bla bla bla
    bla bla bla

We still get the correct modified text:

    bla bla bla
    Block 2 with some other text
    End block
    Block 1 of some text
    End block
    bla bla bla
    bla bla bla
    bla bla bla
Now, Pouemes44, let’s apply this general method to your particular problem:

The general template of your first block is:

    <noscript......
    ......
    ......
    ......
    ......
    </noscript>

The general template of your second block is:

    <img.........
    .......
    .......
    .......
    style="display:none;"></div>

Therefore, the corresponding regex S/R, in your case, is:

SEARCH (?s)(^<noscript.+?</noscript>)(.*?)(^<img.+?style="display:none;"></div>)
REPLACE \3\2\1

So, considering your example, below:

    1.html
    <noscript class"text" lang="example">
    line1
    line2
    line3
    line4
    </noscript>
    some text…
    etc…
    etc
    <img… and …style="display:none;"></div>

    2.html
    <noscript class"text" lang="otherexample">
    otherline1
    otherline2
    otherline3
    otherline4
    </noscript>
    othersome text…
    etc…
    etc etc
    <img… other and …style="display:none;"></div>

After performing the above S/R, you’ll get the modified text, as expected:

    1.html
    <img… and …style="display:none;"></div>
    some text…
    etc…
    etc
    <noscript class"text" lang="example">
    line1
    line2
    line3
    line4
    </noscript>

    2.html
    <img… other and …style="display:none;"></div>
    othersome text…
    etc…
    etc etc
    <noscript class"text" lang="otherexample">
    otherline1
    otherline2
    otherline3
    otherline4
    </noscript>
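The block-swapping method can also be tried in a few lines of Python. This is a sketch under assumptions, not the Notepad++ engine itself: the helper name swap_blocks is hypothetical, re.MULTILINE is needed so that ^ matches at each line start (Notepad++ does this by default), and the block patterns passed in must not contain capturing groups of their own, otherwise the \3\2\1 numbering shifts.

```python
import re


def swap_blocks(text, first_block, second_block):
    """Swap each first_block with the next second_block, keeping the text between.

    first_block and second_block are regex fragments without capturing groups.
    """
    pattern = re.compile(
        "(" + first_block + ")(.*?)(" + second_block + ")",
        re.DOTALL | re.MULTILINE,  # (?s), plus ^ matching at every line start
    )
    return pattern.sub(r"\3\2\1", text)


# Generic example: both blocks share the same limits
generic = (
    "bla bla bla\n"
    "Block 1 of some text\n"
    "End block\n"
    "bla bla\n"
    "Block 2 with some other text\n"
    "End block\n"
    "bla\n"
)
block = r"^Block \d+.+?End block"
swapped_generic = swap_blocks(generic, block, block)

# HTML case: the two blocks have different limits (the img tag is invented)
html = (
    '<noscript class"text" lang="example">\n'
    "line1\n"
    "line2\n"
    "</noscript>\n"
    "some text\n"
    "etc\n"
    '<img src="pixel.gif" style="display:none;"></div>\n'
)
swapped_html = swap_blocks(
    html,
    r"^<noscript.+?</noscript>",
    r'^<img.+?style="display:none;"></div>',
)
print(swapped_generic)
print(swapped_html)
```

Passing the two limits as parameters mirrors the remark that the second block may have limit definitions different from the first.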
Et voilà !!
Best Regards,
guy038
P.S. : Beware that, on our NodeBB site, the usual straight single and double quotes, at the start and end of quoted text, are changed into their Unicode equivalents, below:

- Starting single quote ‘, of Unicode code-point \x{2018}, instead of the single quote sign ' (\x{0027})
- Ending single quote ’, of Unicode code-point \x{2019}, instead of the single quote sign ' (\x{0027})
- Starting double quote “, of Unicode code-point \x{201C}, instead of the double quote sign " (\x{0022})
- Ending double quote ”, of Unicode code-point \x{201D}, instead of the double quote sign " (\x{0022})

Notes :

- For ANSI encoded texts, to avoid the nasty message Find: Invalid regular expression, the correct regex syntaxes are, respectively, \x91, \x92, \x93 and \x94
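When a regex copied from the forum refuses to match because of these converted quotes, a tiny Python helper can turn them back into straight quotes before pasting. This is a minimal sketch: the name straighten_quotes is hypothetical, and the last lines assume that "ANSI" means the Windows-1252 code page, which is where the \x91 to \x94 values come from.

```python
# Map the four "smart" quotes back to their ASCII equivalents
SMART_TO_STRAIGHT = {
    0x2018: "'",  # starting single quote
    0x2019: "'",  # ending single quote
    0x201C: '"',  # starting double quote
    0x201D: '"',  # ending double quote
}


def straighten_quotes(text):
    """Return text with curly quotes replaced by straight ones."""
    return text.translate(SMART_TO_STRAIGHT)


pasted = "(?s)(.+?style=“display:none;”></div>\\R)"
fixed = straighten_quotes(pasted)
print(fixed)

# In Windows-1252 (assumed meaning of "ANSI"), the same characters are single bytes
ansi_bytes = "‘’“”".encode("cp1252")
print(ansi_bytes)
```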
BTW :

- Remember that the \x{####} regex syntax can be used:
  - For search of a true ASCII character, between \x{0000} and \x{007F}, in ANSI encoded files
  - For search of any Unicode character, between \x{0000} and \x{FFFF}, in UNICODE encoded files
- Remember that the \x{##} regex syntax can be used:
  - For search of a true ASCII character, between \x{00} and \x{7F}, in ANSI encoded files
  - For search of any Unicode character, between \x{00} and \x{FF}, in UNICODE encoded files
- To end with, remember that the \x## regex syntax can be used:
  - For search of any Unicode character, between \x00 and \xFF, either in UNICODE or ANSI encoded files
-
Many thanks, Guy.
It was this part, (.*?), that I missed.
-
I have another question.
If I search (?s)(<h1.+?</h1>\R) I get a result in my file,
but if I search (?s)(<a.+?</a>\R) to get <a href="mypage.htm" title=page">page</a> I get nothing. Why? What is wrong?
-
Hello, @pouemes44,
In the regex (?s)(^Block \d+.+?End block)(.*?)(^Block \d+.+?End block), of my previous post, the part (.*?) represents the shortest range of characters, even empty, stored as group 2, between the two consecutive blocks of text that are to be swapped!

Now, if we consider the HTML example text, below:

    <td height="15">
    <font size="2" color="black" face="arial, verdana"><b>Lire un message / dossier : Reçus</b></font>
    </td>
    <td height="15">
    <font size="2" color="black" face="arial, verdana"><b>Lire un message / dossier : Reçus</b></font>
    </td>

Beware of the two different behaviours:

- The regex (?s)<td.+</td> looks for the largest range of characters .+ between the strings <td and </td> => It matches all the text, at once
- The regex (?s)<td.+?</td> looks for the shortest range of characters .+? between the strings <td and </td> => It matches, successively, each block <td…</td>
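The difference between the two behaviours is easy to check with a short Python sketch. The two-cell table below is a simplified, hypothetical stand-in for the example text; the (?s) modifier works the same way in Python’s re module.

```python
import re

# Two consecutive <td> cells, each spread over three lines
HTML = (
    '<td height="15">\n'
    '<font size="2"><b>first cell</b></font>\n'
    "</td>\n"
    '<td height="15">\n'
    '<font size="2"><b>second cell</b></font>\n'
    "</td>\n"
)

# Greedy: .+ runs to the LAST </td>, so both cells come back as one match
greedy = re.findall(r"(?s)<td.+</td>", HTML)

# Lazy: .+? stops at the FIRST </td>, so each cell is matched separately
lazy = re.findall(r"(?s)<td.+?</td>", HTML)

print(len(greedy), len(lazy))  # prints: 1 2
```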
Now, regarding your HTML text:

    <a href="mypage.htm" title=page">page</a>

I suppose that you do NOT get a match, using the regex (?s)(<a.+?</a>\R), because, probably, it’s the last line of your current file, which is NOT followed by any End of line character!

Indeed, in that case, the ending part \R cannot match anything. So the overall match fails :-((

Two solutions:

- Use the regex (?s)(<a.+?</a>\R?). With that syntax, the \R part is optional
- Add a line break to this last line!
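Both solutions can be reproduced in a short Python sketch. As an assumption, a hypothetical EOL alternation replaces \R, which Python’s re module does not support, and the link is a simplified version of the one in the question.

```python
import re

# Stand-in for \R, which Python's re lacks (assumption)
EOL = r"(?:\r\n|\r|\n)"

# The link sits on the last line, with NO line break after it
TEXT = 'first line\n<a href="mypage.htm">page</a>'

STRICT = r"(?s)(<a.+?</a>" + EOL + r")"     # line break required
RELAXED = r"(?s)(<a.+?</a>" + EOL + r"?)"   # line break optional

strict_match = re.search(STRICT, TEXT)            # fails: nothing after </a>
relaxed_match = re.search(RELAXED, TEXT)          # solution 1: optional EOL
after_adding_eol = re.search(STRICT, TEXT + "\n")  # solution 2: add a line break

print(strict_match)
print(relaxed_match.group(1))
print(repr(after_adding_eol.group(1)))
```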
For a better visualization of the End of line characters, just click on the ¶ icon.

Cheers,
guy038
-
Hello Guy,
I have just seen your answer.
Many thanks; yes, it works with (?s)(<a.+?</a>\R?)
Really, thanks for all your explanations; I have already managed to do wonderful things with them.
-
Hi, All,
I’m back with additional information about lazy, greedy and possessive quantifiers. It’s fundamental to correctly understand the differences between these 3 types of quantifiers!

So, let’s consider the simple text 12345ABCDE, in a new tab.

How does the regex engine interpret, for instance, the regex \w{1,10}[A-Z]{5}, with the greedy quantifier {1,10}? Well:

- It first tries to match the LONGEST range of \w => 10 Word characters. But the part [A-Z]{5} CANNOT match anything
- Then, it backtracks and tries the first 9 Word characters. Again, the part [A-Z]{5} does NOT match the E letter
- Then, it backtracks and tries the first 8 Word characters. Again, the part [A-Z]{5} does NOT match the DE letters
- Then, it backtracks and tries the first 7 Word characters. Again, the part [A-Z]{5} does NOT match the CDE letters
- Then, it backtracks and tries the first 6 Word characters. Again, the part [A-Z]{5} does NOT match the BCDE letters
- Then, it backtracks and tries the first 5 Word characters. This time, the part [A-Z]{5} DOES match the ABCDE letters

=> After the backtracking phase, all the text is matched and selected!
Now, how does the regex engine interpret the regex \w{1,10}?[A-Z]{5}, with the lazy quantifier {1,10}? ?

- It first tries to match the SHORTEST range of \w => 1 Word character. But the part [A-Z]{5} CANNOT match the 2345ABCDE string
- Then, it backtracks and tries the first 2 Word characters. Again, the part [A-Z]{5} does NOT match the 345ABCDE string
- Then, it backtracks and tries the first 3 Word characters. Again, the part [A-Z]{5} does NOT match the 45ABCDE string
- Then, it backtracks and tries the first 4 Word characters. Again, the part [A-Z]{5} does NOT match the 5ABCDE string
- Then, it backtracks and tries the first 5 Word characters. This time, the part [A-Z]{5} DOES match the ABCDE letters

=> After the backtracking phase, all the text is matched and selected!

Note : Here, instead of the English verb backtrack, a verb like fortrack would be more appropriate, as the engine extends the match forward! Sorry, English isn’t my mother tongue!
Finally, how does the regex engine interpret the regex \w{1,10}+[A-Z]{5}, with the possessive quantifier {1,10}+ ?

- It first tries to match the LONGEST range of \w => 10 Word characters. But the part [A-Z]{5} CANNOT match anything
- Now, the normal process would be to backtrack. But this action is forbidden, due to the possessive quantifier! In other words, once a match has been found for the first part \w{1,10}+, the following parts of the regex must match the remainder of the text. But, as the first regex part has consumed all the text, the part [A-Z]{5} will NEVER match anything!

So, the overall match fails and you get the normal message Find: Can’t find the text "\w{1,10}+[A-Z]{5}"
Using, again, the same example 12345ABCDE, in a new tab, it’s easy to verify that:

- The regex \w{1,10} matches the longest range of Word characters => The whole string 12345ABCDE is matched
- The regex \w{1,10}? matches the shortest range of Word characters => The single character 1 is matched, then the digit 2, and so on…
- The regex \w{1,10}+ matches the longest range of Word characters => The whole string 12345ABCDE is matched, too!
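These three behaviours can be reproduced with Python’s re module, as a rough sketch. One assumption to note: possessive quantifiers such as {1,10}+ are only accepted natively by recent Python versions (3.11 and later, as far as I know), so the sketch emulates one with the classic lookahead-plus-backreference idiom, which behaves the same way on this text.

```python
import re

TEXT = "12345ABCDE"

# Greedy: takes 10 characters first, then gives back one at a time
greedy = re.search(r"\w{1,10}[A-Z]{5}", TEXT)

# Lazy: takes 1 character first, then extends one at a time
lazy = re.search(r"\w{1,10}?[A-Z]{5}", TEXT)

# Emulated possessive: the lookahead captures the longest run once, and \1
# consumes exactly that run, so nothing can be given back to [A-Z]{5}
possessive = re.search(r"(?=(\w{1,10}))\1[A-Z]{5}", TEXT)

print(greedy.group())  # prints: 12345ABCDE
print(lazy.group())    # prints: 12345ABCDE
print(possessive)      # prints: None

# The quantifiers on their own: one long match versus ten single characters
greedy_alone = re.findall(r"\w{1,10}", TEXT)
lazy_alone = re.findall(r"\w{1,10}?", TEXT)
print(greedy_alone, lazy_alone)
```

Greedy and lazy end on the same overall match here and differ only in the order in which the engine tries the candidates, while the possessive form finds no match at all.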
So, to sum up, here is, below, a list of all the quantifiers:

    GREEDY quantifiers     :  *   ( = {0,}  )    +   ( = {1,}  )    ?   ( = {0,1}  )    {n}    {n,}    {m,n}
    LAZY quantifiers       :  *?  ( = {0,}? )    +?  ( = {1,}? )    ??  ( = {0,1}? )    {n}?   {n,}?   {m,n}?
    POSSESSIVE quantifiers :  *+  ( = {0,}+ )    ++  ( = {1,}+ )    ?+  ( = {0,1}+ )    {n}+   {n,}+   {m,n}+

Remark : The two {n}? and {n}+ syntaxes, although correct, are useless, as the syntax {n} could be qualified as an EXACT quantifier!

Best Regards,
guy038
-