RegEx command to delete string with variable numbers
-
Hello everyone,
I am looking for the proper RegEx command to delete a recurring string that contains variable numbers.
I want to delete the timestamps on a Comment Summary page from a PDF. For now I can do it manually, one by one: exporting the FDF data, renaming it as XML, opening it with Notepad++ and searching for these strings:
/CreationDate(D:20200605015359+02'00')
/M(D:20200605015359+02'00')
The numbers are the timestamp, which varies.
Finally, I rename the file back to FDF.
Since I don’t know how to code, if someone is kind enough, I’m also looking for a script that does the same thing without opening Notepad++.
Thanks in advance
-
You’re probably going to want to show some “after” text sample as well.
The way I read what you want is that you’d end up with nothing left at all, which I’m 99% certain isn’t what you want.
-
Thanks for your quick response.
Let me clarify what I need.
In Adobe DC it is possible to create a file with all the comments written on a PDF document; it’s called a Comment Summary:
https://helpx.adobe.com/acrobat/kb/print-comments-acrobat-reader.html
What I want is to delete the comment timestamps in that file. This is possible by exporting the comments metadata as an .fdf file.
Change the extension to .xml and it looks something like this:

Now, for every comment, delete the two strings:
/CreationDate(D:20200605015359+02'00')
/M(D:20200605015359+02'00')
The numbers are the timestamp, which changes every time.
The result should be like this:

For now I do it manually, so I am asking for the proper RegEx search line to find all occurrences of those two strings and replace them with nothing, i.e. delete them.
Since I have many PDFs with hundreds of comments, it would be nice if someone could help me write a script that does the same job without doing the replacement in Notepad++.
Thanks again
-
I’d say this regex could match your situation:
(?:/CreationDate|/M)\(D:\d{14}\+02'00'\)
It seems like the +02'00' is constant, but if it is variable, we can deal with that as well.
-
Hello, @paul-smithers and All,
A regex search/replacement could be:
SEARCH (?-i)(?:(/CreationDate)|M)\(D:\d{14}\+02'00'\)/
REPLACE \x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20(?1\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20/)
And here are the changes:
BEFORE : <</C[1.0 0.819611 0.0]/CreationDate(D:20200606114426+02'00')/F 28/M(D:20200606114426+02'00')/NM...
AFTER  : <</C[1.0 0.819611 0.0]                                      /F 28/                           NM...
As @alan-kilborn said, if the string +02'00' is not constant, change the search regex as below:
SEARCH (?-i)(?:(/CreationDate)|M)\(D:\d{14}.{7}\)/
Best Regards,
guy038
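For anyone who wants to compare the fixed-offset and any-offset variants outside Notepad++, here is a minimal Python sketch using the standard `re` module. It uses the simpler `(?:/CreationDate|/M)` form from the earlier reply, drops the `(?-i)` prefix because Python's `re` is case-sensitive by default, and runs on a made-up sample line whose `-05'00'` offset is an assumption for illustration, not data from the thread.

```python
import re

# Fixed-offset pattern: only matches timestamps ending in +02'00'.
FIXED = re.compile(r"(?:/CreationDate|/M)\(D:\d{14}\+02'00'\)")

# Any-offset variant: .{7} accepts any 7 characters where the offset sits.
ANY_OFFSET = re.compile(r"(?:/CreationDate|/M)\(D:\d{14}.{7}\)")

# Hypothetical sample: the first entry uses a -05'00' offset on purpose.
sample = "/CreationDate(D:20200606114426-05'00')/F 28/M(D:20200606114426+02'00')/NM"

fixed_hits = FIXED.findall(sample)      # only the +02'00' entry
any_hits = ANY_OFFSET.findall(sample)   # both entries

print(len(fixed_hits), len(any_hits))
```

The `.{7}` form is looser than strictly necessary, since it accepts any seven characters, but it mirrors the variant given in the post.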
-
Hi Guy,
Is there a way to get the number of spaces to use for the replacement from the length of the original match?
-
Hi, @paul-smithers, @Alan-kilborn and All,
Yeaaaah! Indeed, there is a method ;-))
I thought about the very basic replacement of each single standard char (.) with a space char (\x20).
But we need to replace text with spaces in some zones only, not everywhere! To achieve such a task, we’ll use a new feature of our regex engine, available since Notepad++ v7.7: the backtracking control verbs! Why did this idea come to my mind? Well, just because I’m preparing some documentation on these zero-width assertions!
Fundamentally, the goal is to use this generic regex, below:
^What we do NOT want to match(*SKIP)(*F)|What we WANT to match, delimited by a LOOK-AHEAD|Again, what we do NOT want to match(*SKIP)(*F)|Again, what we WANT to match, delimited by another LOOK-AHEAD|... and so on
Alan, could you be patient till I build up and post this documentation about these backtracking control verbs?
Meanwhile, you’ll find some hints here:
https://www.rexegg.com/backtracking-control-verbs.html#skipfail
A little practice:
Assuming the initial and final text desired by @paul-smithers:
BEFORE : <</C[1.0 0.819611 0.0]/CreationDate(D:20200606114426+02'00')/F 28/M(D:20200606114426+02'00')/NM...
AFTER  : <</C[1.0 0.819611 0.0]                                      /F 28/                           NM...
We can tell that:
- First, the text from the beginning of the line till a ] is unwanted
- Then, the text till the string /F is wanted and, for each single char in this zone, we want to replace it with a space char
- Now, the text /F 28/ is unwanted
- Finally, the text till the string NM is also wanted and again, for each single char in this zone, we want to replace it with a space char
So, look how easy it is to build up the search regex from the points above! In addition, I’ll use the free-spacing mode for better readability:
SEARCH (?x-s) ^.+\] (*SKIP)(*F) | (?=.+/F) . | /F\x2028/ (*SKIP)(*F) | (?=.+NM) .
REPLACE \x20
We get:
Text of @paul-smithers :
BEFORE : <</C[1.0 0.819611 0.0]/CreationDate(D:20200606114426+02'00')/F 28/M(D:20200606114426+02'00')/NM...
AFTER  : <</C[1.0 0.819611 0.0]                                      /F 28/                           NM...
Other TESTS :
BEFORE : [1.0 0.819611 0.0]/CreationDate(D:20200606114426)/F 28/M(D:20200606+02'00')/NM...
AFTER  : [1.0 0.819611 0.0]                               /F 28/                     NM...
BEFORE : [1.0 0.819611]/CreationDate(+02'00')/F 28/M(D:114426+02)/NM...
AFTER  : [1.0 0.819611]                      /F 28/               NM...
Magic, isn’t it ;-))
Notes:
- Beware of the final dot after each of the two positive look-aheads!
- Of course, in case of a huge file, performance problems may occur, as each single character is replaced with a space!
- Note, also, that the use of the \K feature would not give the same behavior. Indeed, in that case, the part after \K (the .) must come, necessarily, right after \K, because this regex contains 2 alternatives only, unlike the 4 alternatives of the former regex! Just try it:
SEARCH (?-s)^.+\]\K(?=.+/F).|/F 28/\K(?=.+NM).
Cheers,
guy038
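Outside Notepad++, the question of getting the number of spaces from the length of the match can be handled differently. The following is a minimal Python sketch, not a port of the regex above: Python's standard `re` module has no `(*SKIP)(*F)` verbs, so it swaps in a replacement function that returns as many spaces as the match is long. The pattern and the helper name `blank_out` are assumptions for illustration, and the pattern only covers timestamps with 14 digits and a 7-character offset.

```python
import re

# Timestamp entry with 14 digits and any 7-character offset (assumed shape).
STAMP = re.compile(r"(?:/CreationDate|/M)\(D:\d{14}.{7}\)")

def blank_out(text: str) -> str:
    """Replace every timestamp entry with the same number of spaces."""
    # The callable receives each match and sizes the padding from its length.
    return STAMP.sub(lambda m: " " * len(m.group(0)), text)

before = ("<</C[1.0 0.819611 0.0]/CreationDate(D:20200606114426+02'00')"
          "/F 28/M(D:20200606114426+02'00')/NM...")
after = blank_out(before)
print(after)
```

Because the whole `/M(...)` entry is matched here, the slashes end up in slightly different positions than in the AFTER text above, but the overall line length is preserved in the same way.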
-
@paul-smithers, @Alan-kilborn and All,
I guess I must have been influenced by my upcoming documentation on backtracking control verbs!
In fact, be reassured, there is still a classical solution which does not use this new feature. Here is this second solution, written with the free-spacing mode (?x):
SEARCH (?x-s) (^.+\]) | (?=.+/F) (.) | (/F\x2028/) | (?=.+NM) (.)
REPLACE (?1$0)(?2\x20)(?3$0)(?4\x20)
As you can see:
- Any part that we do not want to match is simply rewritten ($0)
- In the zones that we do care about, each single standard character (.) is replaced with a space char (\x20)
BR
guy038
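The same four-alternative idea can be tried in Python as a rough sketch. Python's `re` has no conditional replacement syntax like `(?1$0)`, so this version assumes a replacement function that checks which group matched instead; the name `blank_zones` is made up for the example. It expects a single line at a time, because `^` only anchors at the start of the string here.

```python
import re

# Groups 1 and 3 are zones to keep; groups 2 and 4 are single
# characters inside the zones that should become spaces.
ZONES = re.compile(r"(^.+\])|(?=.+/F)(.)|(/F 28/)|(?=.+NM)(.)")

def blank_zones(line: str) -> str:
    """Blank out, char by char, the text before /F and before NM."""
    def repl(m: re.Match) -> str:
        # lastindex is the number of the group that matched.
        return m.group(0) if m.lastindex in (1, 3) else " "
    return ZONES.sub(repl, line)

line = ("<</C[1.0 0.819611 0.0]/CreationDate(D:20200606114426+02'00')"
        "/F 28/M(D:20200606114426+02'00')/NM...")
result = blank_zones(line)
print(result)
```

As in the Notepad++ version, every character in a wanted zone is a separate match, so very large files may be slow to process this way.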
-
Hello, first of all thanks everyone for the help.
I have tried the first proposal, (?:/CreationDate|/M)\(D:\d{14}\+02'00'\) from @Alan-kilborn, and it works perfectly, since I don’t need the \x20 space char.
For some reason, if I use the other proposals, Acrobat refuses to import the modified FDF file because of an unspecified error.
Anyway, my problem is resolved now. I use this script to remove the author name, https://adobe.ly/3emVRkC, and the search RegEx for the timestamp.
Thanks again.
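For readers who want the script asked about at the top of the thread, here is a minimal Python sketch that applies the regex that worked, without opening Notepad++. It assumes the exported FDF can be treated as raw bytes and that every timestamp uses the `+02'00'` offset; the function names and the file names in the comment are hypothetical.

```python
import re
from pathlib import Path

# The pattern that worked in the thread, compiled for bytes so that
# the rest of the FDF content is written back untouched.
TIMESTAMP = re.compile(rb"(?:/CreationDate|/M)\(D:\d{14}\+02'00'\)")

def strip_timestamps(data: bytes) -> bytes:
    """Delete every /CreationDate(...) and /M(...) entry."""
    return TIMESTAMP.sub(b"", data)

def clean_fdf(src: Path, dst: Path) -> int:
    """Write a timestamp-free copy of src to dst; return how many entries were removed."""
    cleaned, count = TIMESTAMP.subn(b"", src.read_bytes())
    dst.write_bytes(cleaned)
    return count

# Hypothetical usage, with made-up file names:
#   clean_fdf(Path("comments.fdf"), Path("comments_clean.fdf"))

sample = (b"<</C[1.0 0.819611 0.0]/CreationDate(D:20200606114426+02'00')"
          b"/F 28/M(D:20200606114426+02'00')/NM...")
print(strip_timestamps(sample))
```

Writing to a separate output file keeps the original export intact in case Acrobat rejects the modified copy.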