replace / move numbers from one row to another with regular expression (in parentheses)
-
Good day, I have these 2 files with these lines:
----- File 1.html -----
<li><a href="page-1.html" title="Page 1">Page 1 (22)</a></li>
<li><a href="page-2.html" title="Page 2">Page 2 (18)</a></li>
<li><a href="page-3.html" title="Page 3">Page 3 (23)</a></li>
----- File 2.html -----
<li><a href="pagion-1.html" title="Pagion 1">Pagion 1 (2)</a></li>
<li><a href="pagion-2.html" title="Pagion 2">Pagion 2 (12)</a></li>
<li><a href="pagion-3.html" title="Pagion 3">Pagion 3 (10)</a></li>
I need to replace and move numbers from one row to another with regular expression (in parentheses), from a file to another, in order to have:
----- File 1.html -----
<li><a href="page-1.html" title="Page 1">Page 1 (22)</a></li>
<li><a href="page-2.html" title="Page 2">Page 2 (18)</a></li>
<li><a href="page-3.html" title="Page 3">Page 3 (23)</a></li>
----- File 2.html -----
<li><a href="pagion-1.html" title="Pagion 1">Pagion 1 (22)</a></li>
<li><a href="pagion-2.html" title="Pagion 2">Pagion 2 (18)</a></li>
<li><a href="pagion-3.html" title="Pagion 3">Pagion 3 (23)</a></li>
can this be done?
-
@Robin-Cruise said in replace / move numbers from one row to another with regular expression (in parentheses):
I need to replace and move numbers from one row to another with regular expression (in parentheses), from a file to another
So from your example it appears that file 1 isn’t changed, only file 2.
And I can see that file 2 lines have different wording. How do you determine which line in file 2 gets which number? Is it that the number in line 1 of file 1 gets copied to file 2 line 1? Or do you have some other rule used to determine which? To make a solution we need a rule to follow, something that can be translated into either a regular expression or, for a scripted solution, some other function.
Terry
-
@Terry-R said in replace / move numbers from one row to another with regular expression (in parentheses):
So from your example it appears that file 1 isn’t changed, only file 2.
And I can see that file 2 lines have different wording. How do you determine which line in file 2 gets which number? Is it that the number in line 1 of file 1 gets copied to file 2 line 1? Or do you have some other rule used to determine which? To make a solution we need a rule to follow, something that can be translated into either a regular expression or, for a scripted solution, some other function.
I thought of something. First, I will select all the numbers in parentheses (including the parentheses):
SEARCH:
.*?(\(.*?\)).*
REPLACE BY:
\1
So, I will get:
(22) (18) (23) (2) (12) (10)
Now, all I have to do is copy the first 3 rows over the next 3 rows to get:
(22) (18) (23) (22) (18) (23)
The problem is, how do I copy the parentheses and numbers back to their place between tags?
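For readers who want to try the extraction step outside Notepad++, here is a minimal Python sketch of the same idea. It assumes each line contains exactly one parenthesised number; the `lines` list simply holds the six sample lines from the first post, and Python's `re` module stands in for the Notepad++ regex engine.

```python
import re

# Sample lines from the first post (File 1.html followed by File 2.html).
lines = [
    '<li><a href="page-1.html" title="Page 1">Page 1 (22)</a></li>',
    '<li><a href="page-2.html" title="Page 2">Page 2 (18)</a></li>',
    '<li><a href="page-3.html" title="Page 3">Page 3 (23)</a></li>',
    '<li><a href="pagion-1.html" title="Pagion 1">Pagion 1 (2)</a></li>',
    '<li><a href="pagion-2.html" title="Pagion 2">Pagion 2 (12)</a></li>',
    '<li><a href="pagion-3.html" title="Pagion 3">Pagion 3 (10)</a></li>',
]

# Same idea as the SEARCH/REPLACE above: match the whole line and keep
# only the first "(...)" part, captured as group 1.
pattern = re.compile(r".*?(\(.*?\)).*")
extracted = [pattern.sub(r"\1", line) for line in lines]

print(extracted)
```

This only covers the extraction; putting the numbers back between the tags is the part that still needs a rule for pairing the lines.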
-
@Robin-Cruise said in replace / move numbers from one row to another with regular expression (in parentheses):
The problem is, how do I copy the parentheses and numbers back to their place between tags?
Not sure where you are heading but you still haven’t answered my question.
I’m going to take a punt since it does seem a bit obvious from the examples:
- both files contain the same number of lines and are in the same order, line 1 of file 1 corresponds with line 1 of file 2.
- I’d prefix each line in each file with an ascending number 1,2,3. I’d also add (behind the number) an “a” for file 1 and a “b” for file 2.
- I’d then combine both files and sort lexicographically.
- I’d use a regex to copy the number from #a line and replace the equivalent number in the #b line, at the same time removing the #a line (since we don’t need the file 1 line anymore).
- I’d then remove the prefix from the lines (#b) leaving the changed file 2 content.
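The steps above can be sketched in Python as follows. This is only an illustration under the stated assumption that both files have the same number of lines in the same order; the zero-padded prefix (`0001a`, `0001b`, ...) is my own choice so that a plain lexicographic sort keeps the pairs together, and the sample lines come from the first post.

```python
import re

file1 = [
    '<li><a href="page-1.html" title="Page 1">Page 1 (22)</a></li>',
    '<li><a href="page-2.html" title="Page 2">Page 2 (18)</a></li>',
    '<li><a href="page-3.html" title="Page 3">Page 3 (23)</a></li>',
]
file2 = [
    '<li><a href="pagion-1.html" title="Pagion 1">Pagion 1 (2)</a></li>',
    '<li><a href="pagion-2.html" title="Pagion 2">Pagion 2 (12)</a></li>',
    '<li><a href="pagion-3.html" title="Pagion 3">Pagion 3 (10)</a></li>',
]

# Prefix every line with its line number plus "a" (file 1) or "b" (file 2).
tagged = [f"{i:04d}a {line}" for i, line in enumerate(file1, 1)]
tagged += [f"{i:04d}b {line}" for i, line in enumerate(file2, 1)]

# A lexicographic sort interleaves the files: 0001a, 0001b, 0002a, 0002b, ...
combined = "\n".join(sorted(tagged)) + "\n"

# For each a/b pair: capture the number of the "a" line (group 1), keep the
# "b" line text around its own number (groups 2 and 3), and drop both the
# "a" line and the "b" prefix in the same replacement.
pair = re.compile(r"^\d+a .*\((\d+)\).*\n\d+b (.*)\(\d+\)(.*)$", re.MULTILINE)
result = pair.sub(r"\2(\1)\3", combined)

print(result)
```

The printed text is the file 2 content with the numbers taken from file 1.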
Terry
-
ok, I made a regex, it is kind of a step forward. I managed to move the numbers and parentheses from the first 3 rows to the next 3, but not in the correct order.
SEARCH:
(?s)(<li><a href=)(.*?)(\(\d+\))(<\/a><\/li>).*?\K(\w+)
REPLACE BY:
\3
if I could do that, it’s a sign that it’s somehow possible, but I am not very good at regex. Maybe @guy038 will improve my regex. :)
-
Hi, @robin-cruise, @terry-r and All,
Sorry, but a lot of information is missing for a good understanding of your goal:
- How many lines like <li><a href="page-##.html" title="Page ##">Page ## (##)</a></li> does your File 1.html file contain?
- Are all these lines consecutive?
- Even if some of these lines are consecutive, are there other similar sections containing this same type of line?
- Does the File 2.html file contain the same number of <li><a href="page-##.html" title="Page ##">Page ## (##)</a></li> lines as the File 1.html file?
- If the File 2.html file also contains some sections of these lines, is the layout identical between the two files?
In short, could you provide a larger part of your files, to give a more precise idea of the changes to make?
Best Regards,
guy038
-
- both of them, File 1.html just like File 2.html contains 40 lines. both files have the same structure, except the numbers in parentheses.
- The numbers (in parentheses) are different in the two files, but I want them to be the same. Right now, they are not consecutive numbers, but random ones.
- page-1.html" title=“Page 1”>Page 1 is a short version, just an example.
The real pages are like: <li><a href="Love-Master-A-Manga-Volume.html title=“Love Master A Manga Volume”>Love Master A Manga Volume (22)</a></li>
Basically, it’s about a menu on a website translated into 2 languages, and the number in parentheses, e.g. (22), is the number of articles in that section. The numbers should be the same in both languages.
-
-
@Robin-Cruise said in replace / move numbers from one row to another with regular expression (in parentheses):
both of them, File 1.html just like File 2.html contains 40 lines.
Did you ever read my questions? Specifically, I asked whether there is the same number of lines in each. Also, is line 1 in file 1 equal to line 1 in file 2, so that the line 1 number (##) is copied across to line 1 in file 2?
If so, then I have already provided the solution in words (it just needs translating into code), which I wrote out for you. Since you seem to have a good idea of how to create regexes, did you not try to follow my instructions?
Terry
-
@Robin-Cruise said in replace / move numbers from one row to another with regular expression (in parentheses):
- both of them, File 1.html just like File 2.html contains 40 lines. both files have the same structure, except the numbers in parentheses.
[…]
The real pages are like: <li><a href="Love-Master-A-Manga-Volume.html title=“Love Master A Manga Volume”>Love Master A Manga Volume (22)</a></li>
Given these conditions, and if you are allowed to install the LuaScript plugin, there is a script that can select and copy the numbers in parentheses from one page and paste them into the other.
Please confirm and I will provide further details.
Take care and have fun!
-
sorry for my objection, but it looks easier to just change the wording of file2 to make it fit file1 and keep the numbers untouched. so file 1 would get replaced and translated by file2, which is the right one concerning the numbers. idk
-
this solved the problem.
In short, I copy all the text before the parentheses from File 1 into column A in Excel. Then I copy the parentheses with numbers from File 2 into column B, and put the rest after the parentheses into column C. Then I copy all the Excel columns back into File 2, replacing the lines. Now the parentheses are identical.
Step 1 - Use regex to select all parentheses (with numbers), then copy them to an excel in column 2
SEARCH:
.*?(\(.*?\)).*
REPLACE BY:
\1
Step 2 Use regex to select everything on each line before the parentheses:
SEARCH:
\(.*\).*
REPLACE BY:(leave empty)
Step 3 - Copy the resulting lines to an excel file in column 1
Step 4 - Copy directly to column 4 of excel what is after parentheses:
</a></li>
or use regex to obtain this result (</a></li>) by keeping only what comes after the round brackets:
SEARCH:
^.*\)(.*)
REPLACE BY:
\1
Step 5 - Copy all the Excel content to a new Notepad++ file.
If there are too many empty spaces, search and replace 2 spaces with one space.
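The same split-into-columns idea can also be tried without Excel. Below is a rough Python sketch, assuming every line contains exactly one `(number)` group; `split_columns` and `merge` are hypothetical helper names of mine, and the sample lines come from the first post. Which file supplies the text and which supplies the numbers is just a matter of argument order; the usage here follows the goal stated at the top of the thread (File 2 wording, File 1 numbers).

```python
import re

# Three "columns": text before "(", the "(number)" part, text after ")".
COLUMNS = re.compile(r"^(.*?)(\(\d+\))(.*)$")

def split_columns(line):
    """Split one line into (before, parenthesised number, after)."""
    match = COLUMNS.match(line)
    if match is None:
        raise ValueError(f"no parenthesised number in: {line!r}")
    return match.groups()

def merge(text_lines, number_lines):
    """Keep the text of text_lines, take the numbers from number_lines."""
    merged = []
    for text_line, number_line in zip(text_lines, number_lines):
        before, _, after = split_columns(text_line)
        _, number, _ = split_columns(number_line)
        merged.append(before + number + after)
    return merged

file1 = [
    '<li><a href="page-1.html" title="Page 1">Page 1 (22)</a></li>',
    '<li><a href="page-2.html" title="Page 2">Page 2 (18)</a></li>',
    '<li><a href="page-3.html" title="Page 3">Page 3 (23)</a></li>',
]
file2 = [
    '<li><a href="pagion-1.html" title="Pagion 1">Pagion 1 (2)</a></li>',
    '<li><a href="pagion-2.html" title="Pagion 2">Pagion 2 (12)</a></li>',
    '<li><a href="pagion-3.html" title="Pagion 3">Pagion 3 (10)</a></li>',
]

for line in merge(file2, file1):
    print(line)
```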
-
Hello, @robin-cruise, @terry-R, @astrosofista, @carypt and All,
Sorry for being so late, as I have answered many posts recently!
Here is my method :
- Open your File 1.html and File 2.html files in N++
- At the end of the File 1.html contents, insert, for instance, a new line =====
- Append the File 2.html contents, right after that new line
Note that we’ll need two specific characters, which are not used yet in your HTML files:
- One char to separate the contents of the two files, in File 1.html. I chose the = sign. Hence the line of five = signs
- One char used by the regex S/R in order to mark the numbers between parentheses already processed. I chose the # character
- Of course, you may choose any character for these two specific chars. Just modify the regex accordingly
- Preferably, avoid the true regex symbols \ ^ $ . | ? * + ( ) [ ] { }
For instance, after merging the File 2.html contents into File 1.html, we would obtain this tiny text, with the ===== separation:
<li><a href="Love-Master-A-Manga-Volume.html title=“Love Master A Manga Volume”>Love Master A Manga Volume (22)</a></li>
bla bla bla bla bla bla bla bla
<li><a href="Love-Master-A-Manga-Volume.html title=“Love Master A Manga Volume”>Love Master A Manga Volume (18)</a></li>
bla bla
<li><a href="Love-Master-A-Manga-Volume.html title=“Love Master A Manga Volume”>Love Master A Manga Volume (23)</a></li>
=====
<li><a href="Love-Master-A-Manga-Volume.html title=“Love Master A Manga Volume”>Love Master A Manga Volume (2)</a></li>
bla bla
<li><a href="Love-Master-A-Manga-Volume.html title=“Love Master A Manga Volume”>Love Master A Manga Volume (12)</a></li>
bla bla bla bla bla bla bla bla
<li><a href="Love-Master-A-Manga-Volume.html title=“Love Master A Manga Volume”>Love Master A Manga Volume (10)</a></li>
- Move to the very beginning of File 1.html
- Open the Replace dialog (Ctrl + H)
  - SEARCH (?s)\((\d+)\)(.+=====.+?)\((\d+)\)|^=====.+|#(?!.*^===)
  - REPLACE ?1\(\3#\)\2\(\3#\)
  - Untick, if necessary, the Wrap around option
  - Select the Regular expression search mode
- Now, keeping the Replace dialog open, click on the Replace All button (or preferably hit the Alt + A shortcut) repeatedly, until the message Replace All: 0 occurrences were replaced from caret to end-of-file is displayed!
And you’ll get the expected File 1.html contents:
<li><a href="Love-Master-A-Manga-Volume.html title=“Love Master A Manga Volume”>Love Master A Manga Volume (2)</a></li>
bla bla bla bla bla bla bla bla
<li><a href="Love-Master-A-Manga-Volume.html title=“Love Master A Manga Volume”>Love Master A Manga Volume (12)</a></li>
bla bla
<li><a href="Love-Master-A-Manga-Volume.html title=“Love Master A Manga Volume”>Love Master A Manga Volume (10)</a></li>
- Save the new File 1.html contents, with all the updated numbers between parentheses!
Notes:
- For N Replace All operations processed, in total:
  - The first N - 2 operations:
    - Replace the numbers of File 1.html with the corresponding numbers of File 2.html, located after the ===== line
    - Add a # marker to the two numbers processed
  - The N - 1 operation deletes from the ===== line till the very end of the file, in order to suppress the temporary appended part
  - The N operation deletes all the existing # markers of File 1.html
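To watch the repeated Replace All passes at work outside Notepad++, here is a small Python simulation. It is only a sketch: Python's `re` module is not the Boost engine used by Notepad++, so the `?1...` conditional replacement is emulated with a callback, and the shortened sample text and the names `replacement` and `replace_until_done` are mine.

```python
import re

# The SEARCH regex from the post; re.MULTILINE lets ^ match at line starts,
# which is the default behaviour in Notepad++.
SEARCH = re.compile(
    r"(?s)\((\d+)\)(.+=====.+?)\((\d+)\)|^=====.+|#(?!.*^===)",
    re.MULTILINE,
)

def replacement(match):
    # Emulates "?1\(\3#\)\2\(\3#\)": if group 1 took part in the match,
    # rebuild the text with the second number in both places, each followed
    # by the # marker; otherwise replace the match with nothing.
    if match.group(1) is not None:
        number = match.group(3)
        return f"({number}#){match.group(2)}({number}#)"
    return ""

def replace_until_done(text):
    """Repeat 'Replace All' until a pass reports 0 replacements."""
    passes = 0
    while True:
        text, count = SEARCH.subn(replacement, text)
        if count == 0:
            return text, passes
        passes += 1

file1 = "Page 1 (22)\nbla bla\nPage 2 (18)\nPage 3 (23)\n"
file2 = "Pagion 1 (2)\nPagion 2 (12)\nbla bla\nPagion 3 (10)\n"

final_text, passes = replace_until_done(file1 + "=====\n" + file2)
print(passes)
print(final_text)
```

With three numbers per file, the loop performs N = 5 effective passes: three that copy a number and add the # markers, one that deletes the appended part, and one that removes the markers.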
Best Regards,
guy038
-
@guy038 said in replace / move numbers from one row to another with regular expression (in parentheses):
=====
brilliant. You really are very good @guy038
THANK YOU !
-
@guy038 said in replace / move numbers from one row to another with regular expression (in parentheses):
?1\(\3#\)\2\(\3#\)
by the way, on replace, what does ?1\(\3#\)\2\(\3#\) mean (step by step, please)?
-
@Robin-Cruise said in replace / move numbers from one row to another with regular expression (in parentheses):
by the way, on replace, what does ?1\(\3#\)\2\(\3#\) mean (step by step, please)?
It’s fairly “easy”: :-)
?1 controls the rest of it:
If capture group #1 was NOT matched, the replacement is “nothing” (aka deletion).
If capture group #1 WAS matched, then the replacement consists of:
- opening parens: (
- what was matched with capture group #3
- a literal #
- closing parens: )
- what was matched with capture group #2
- opening parens: (
- what was matched with capture group #3
- a literal #
- closing parens: )
- opening parens: