RegEx command to delete string with variable numbers
-
Hello everyone,
I am looking for the proper RegEx command to delete a recurring string with variable numbers.
I want to delete the timestamps on a Summary Comment page from a PDF. For now I can do it manually, one by one: export the FDF data, rename it as XML, open it with Notepad++ and search for these strings:
/CreationDate(D:20200605015359+02'00') /M(D:20200605015359+02'00')
The numbers are the variable timestamp.
Finally, I rename it back to FDF.
In fact, since I don't know how to code, if someone is kind enough, I'm looking for a script that does the same thing without opening Notepad++.
Thanks in advance
-
You’re probably going to want to show some “after” text sample as well.
The way I read what you want is that you'd end up with an empty result,
which I’m 99% certain isn’t what you want.
-
Thanks for your quick response.
Let me clarify what I need.
In Adobe DC it is possible to create a file with all the comments written on the PDF documents; it's called a Comment summary:
https://helpx.adobe.com/acrobat/kb/print-comments-acrobat-reader.html
What I want is to delete the comment timestamps in that file. This is possible by exporting the comment metadata as an .fdf file.
Change the extension to .xml and it looks something like this:
Now, for every comment, delete these two strings:
/CreationDate(D:20200605015359+02'00')
/M(D:20200605015359+02'00')
The numbers are the timestamp, which changes every time.
The result should be like this:
For now I do it manually, and I would like the proper regex search string to find both of those strings and replace them with nothing, i.e. delete them.
Since I have many PDFs with hundreds of comments, it would be nice if someone could help me write a script that does the same job without replacing them in Notepad++.
Thanks again
-
I’d say this regex could match your situation:
(?:/CreationDate|/M)\(D:\d{14}\+02'00'\)
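The question also asks for a script that avoids Notepad++ entirely. Below is a minimal Python 3 sketch, assuming the regex above fits your files (fixed +02'00' offset) and that the .fdf file is read directly (renaming it to .xml and back only matters for opening it in an editor, since a rename does not change the contents). To be safe, it writes a cleaned copy rather than overwriting the original. The script name and the "_notime" suffix are hypothetical choices:

```python
import re
import sys
from pathlib import Path

# The regex above, as a bytes pattern, so FDF files are processed
# byte for byte without guessing a text encoding.
TIMESTAMP = re.compile(rb"(?:/CreationDate|/M)\(D:\d{14}\+02'00'\)")


def strip_timestamps(data: bytes) -> bytes:
    """Delete every /CreationDate(...) and /M(...) entry the regex matches."""
    return TIMESTAMP.sub(b"", data)


def process_file(src: Path) -> int:
    """Write a cleaned copy next to src; return how many entries were removed.

    The "_notime" suffix is a hypothetical naming choice for this sketch.
    """
    data = src.read_bytes()
    cleaned, count = TIMESTAMP.subn(b"", data)
    dst = src.with_name(src.stem + "_notime" + src.suffix)
    dst.write_bytes(cleaned)
    return count


if __name__ == "__main__":
    # Usage (script name is hypothetical):
    #   python strip_fdf_timestamps.py comments1.fdf comments2.fdf ...
    for name in sys.argv[1:]:
        removed = process_file(Path(name))
        print(f"{name}: {removed} timestamp entries removed")
```

Passing several files on the command line covers the "many PDFs" case in one run; import the cleaned copies into Acrobat as before.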
It seems like the +02'00' part is constant, but if it is variable, we can deal with that as well.
-
Hello, @paul-smithers and All,
A regex search/replacement could be:
SEARCH
(?-i)(?:(/CreationDate)|M)\(D:\d{14}\+02'00'\)/
REPLACE
\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20(?1\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20/)
And here are the changes :
BEFORE : <</C[1.0 0.819611 0.0]/CreationDate(D:20200606114426+02'00')/F 28/M(D:20200606114426+02'00')/NM...
AFTER : <</C[1.0 0.819611 0.0] /F 28/ NM...
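Padding with spaces instead of deleting keeps every following byte at its original offset. Outside Notepad++, the run of \x20 does not have to be typed out: in Python, re.sub accepts a function as the replacement, so the padding can be sized from each match. A sketch mirroring the search regex and the BEFORE/AFTER above, assuming bytes input and the fixed +02'00' offset:

```python
import re

# The search regex above as a Python bytes pattern (Python patterns are
# case-sensitive by default, like the (?-i) of the original). Group 1
# captures "/CreationDate"; otherwise only the "M" of "/M" is matched, so
# its "/" stays in place. Both branches also consume the "/" after ")".
PATTERN = re.compile(rb"(?:(/CreationDate)|M)\(D:\d{14}\+02'00'\)/")


def pad_with_spaces(m: re.Match) -> bytes:
    """Return spaces of the same total length as the match.

    In the /CreationDate branch the trailing "/" starts the next key
    (/F), so it is kept; in the M branch it is blanked, as in the post.
    """
    length = len(m.group(0))
    if m.group(1) is not None:
        return b" " * (length - 1) + b"/"
    return b" " * length


def blank_timestamps(data: bytes) -> bytes:
    # A function replacement lets re.sub size the padding per match.
    return PATTERN.sub(pad_with_spaces, data)


if __name__ == "__main__":
    before = (b"<</C[1.0 0.819611 0.0]/CreationDate(D:20200606114426+02'00')"
              b"/F 28/M(D:20200606114426+02'00')/NM...")
    after = blank_timestamps(before)
    print(after.decode("ascii"))
    print(len(before) == len(after))
```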
As @alan-kilborn said, if the string +02'00' is not constant, change the search regex as below:
SEARCH
(?-i)(?:(/CreationDate)|M)\(D:\d{14}.{7}\)/
Best Regards,
guy038
-
Hi Guy,
Is there a way to get the number of spaces to use for the replacement from the length of the original match?
-
Hi, @paul-smithers, @Alan-kilborn and All,
Yeaaaah ! Indeed, there is a method ;-))
I thought about the very basic replacement of each single standard char (.) with a space char (\x20). But we need to replace text with spaces in some zones only, not everywhere! To achieve such a task, we'll use a feature of our regex engine that is new since Notepad++ v7.7: the backtracking control verbs! Why did this idea come to my mind? Well, just because I'm preparing documentation on these zero-width assertions!
Fundamentally, the goal is to use this generic regex below:
^What we do NOT want to match(*SKIP)(*F)|What we WANT to match, delimited with a LOOK-AHEAD|Again, what we do NOT want to match(*SKIP)(*F)|Again, what we WANT to match, delimited by another LOOK-AHEAD|....
and so on.
Alan, could you be patient until I build up and post this documentation about these backtracking control verbs? Meanwhile, you'll find some hints here:
https://www.rexegg.com/backtracking-control-verbs.html#skipfail
A little practice :
Assuming the initial and final text desired by @paul-smithers:
BEFORE : <</C[1.0 0.819611 0.0]/CreationDate(D:20200606114426+02'00')/F 28/M(D:20200606114426+02'00')/NM...
AFTER : <</C[1.0 0.819611 0.0] /F 28/ NM...
We can tell that:
- First, the text from the beginning of the line up to a ] is unwanted
- Then, the text up to the string /F is wanted and, for each single char in this zone, we want to replace it with a space char
- Now, the text /F 28/ is unwanted
- Finally, the text up to the string NM is also wanted and, again, each single char in this zone is to be replaced with a space char
So, look how easy it is to build up the search regex from the points above! In addition, I'll use the free-spacing mode for better readability:
SEARCH
(?x-s) ^.+\] (*SKIP)(*F) | (?=.+/F) . | /F\x2028/ (*SKIP)(*F) | (?=.+NM) .
REPLACE
\x20
We get :
Text of @paul-smithers :
BEFORE : <</C[1.0 0.819611 0.0]/CreationDate(D:20200606114426+02'00')/F 28/M(D:20200606114426+02'00')/NM...
AFTER : <</C[1.0 0.819611 0.0] /F 28/ NM...
Other TESTS :
BEFORE : [1.0 0.819611 0.0]/CreationDate(D:20200606114426)/F 28/M(D:20200606+02'00')/NM...
AFTER : [1.0 0.819611 0.0] /F 28/ NM...
BEFORE : [1.0 0.819611]/CreationDate(+02'00')/F 28/M(D:114426+02)/NM...
AFTER : [1.0 0.819611] /F 28/ NM...
Magic, isn’t it ;-))
Notes:
- Beware of the dot that follows each of the two positive look-aheads!
- Of course, with a huge file, performance problems may occur, as each single character is replaced with a space, one match at a time!
- Note also that using the \K feature would not give the same behavior. Indeed, in that case, the part after \K (the .) must necessarily come right after \K, because this regex contains only 2 alternatives, unlike the 4 alternatives of the former regex! Just try it:
SEARCH
(?-s)^.+\]\K(?=.+/F).|/F 28/\K(?=.+NM).
Cheers,
guy038
-
@paul-smithers, @Alan-kilborn and All,
I guess I must have been influenced by my upcoming documentation on backtracking control verbs! In fact, be reassured, there is still a classical solution which does not use this new feature. Here is this second solution, written in free-spacing mode (?x):
SEARCH
(?x-s) (^.+\]) | (?=.+/F) (.) | (/F\x2028/) | (?=.+NM) (.)
REPLACE
(?1$0)(?2\x20)(?3$0)(?4\x20)
As you can see:
- Any part that we do not want to change is simply rewritten ($0)
- In the zones that we do care about, each single standard character (.) is replaced with a space char (\x20)
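Python's built-in re module supports neither backtracking control verbs nor the (?1...) conditional replacement syntax, but re.sub accepts a function, which can play the role of (?1$0)(?2\x20)(?3$0)(?4\x20). A sketch translating this classical solution, assuming one record per line as in the tests above:

```python
import re

# The classical regex above, in Python's verbose syntax.
# Groups 1 and 3 are the zones to keep; groups 2 and 4 are single
# characters inside the zones to blank out.
ZONES = re.compile(
    r"""
    (^.+\])            # 1: start of line up to the last ] (kept)
  | (?=.+/F) (.)       # 2: one char with /F somewhere after it
  | (/F\x2028/)        # 3: the literal "/F 28/" (kept)
  | (?=.+NM) (.)       # 4: one char with NM somewhere after it
    """,
    re.VERBOSE | re.MULTILINE,
)


def blank_zones(text: str) -> str:
    # Equivalent of (?1$0)(?2\x20)(?3$0)(?4\x20): rewrite groups 1 and 3
    # unchanged, turn each group 2 or group 4 character into a space.
    def repl(m: re.Match) -> str:
        if m.group(1) is not None or m.group(3) is not None:
            return m.group(0)
        return " "

    return ZONES.sub(repl, text)


if __name__ == "__main__":
    line = ("<</C[1.0 0.819611 0.0]/CreationDate(D:20200606114426+02'00')"
            "/F 28/M(D:20200606114426+02'00')/NM...")
    print(blank_zones(line))
```

As in Notepad++, each match consumes at least one character, so re.sub simply resumes after the kept zone or the blanked character.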
BR
guy038
-
Hello, first of all thanks everyone for the help.
I have tried the first proposal (?:/CreationDate|/M)\(D:\d{14}\+02'00'\) from @Alan-kilborn and it works perfectly, since I don't need the \x20 space chars.
For some reason, if I use the other proposals, Acrobat refuses to import the modified FDF file because of an unspecified error.
Anyway, I have resolved my problem now. I use this script to remove the author name https://adobe.ly/3emVRkC and the regex search for the timestamps.
Thanks again.