    RegEx command to delete string with variable numbers

    • Paul smithers

      Hello everyone,

      I am looking for the right regex to delete a recurring string that contains variable numbers.

      I want to delete the timestamps on a Comment Summary page from a PDF. For now I do it manually, one by one: I export the FDF data, rename the file to .xml, open it in Notepad++ and search for these strings:

      code_text
      /CreationDate(D:20200605015359+02'00') 
      /M(D:20200605015359+02'00')
      code_text
      

      The numbers are the timestamp and vary from comment to comment.

      Finally, I rename the file back to .fdf.

      Since I don’t know how to code, I’m also looking for a script that does the same thing without opening Notepad++, if someone is kind enough to help.

      Thanks in advance

      • Alan Kilborn @Paul smithers

        @Paul-smithers

        You’re probably going to want to show some “after” text sample as well.
        The way I read what you want is that you’d end up with:

        code_text
        code_text
        

        which I’m 99% certain isn’t what you want.

        • Paul smithers

          Thanks for your quick response.

          Let me clarify what I need.

          In Adobe DC it is possible to create a file with all the comments written on a PDF document; it’s called a Comment Summary:
          https://helpx.adobe.com/acrobat/kb/print-comments-acrobat-reader.html

          What I want is to delete the comment timestamps in that file. This is possible by exporting the comments metadata as an .fdf file.

          Change the extension to .xml and it looks something like this:

          timestamptest3.jpg

          Now, for every comment, delete the two strings:

          /CreationDate(D:20200605015359+02'00')
          /M(D:20200605015359+02'00')

          The numbers are the timestamp, which changes every time.

          The result should look like this:

          timestamptest4.jpg

          For now I do it manually, so I am asking for the right regex search expression to find both strings and replace them with nothing, i.e. delete them.

          Since I have many PDFs with hundreds of comments, it would be nice if someone could help me write a script that does the same job without doing the replacement in Notepad++.

          Thanks again

          • Alan Kilborn @Paul smithers

            @Paul-smithers

            I’d say this regex could match your situation:

            (?:/CreationDate|/M)\(D:\d{14}\+02'00'\)

            It seems like the +02'00' is constant, but if it is variable, we can deal with that as well.
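
Since the original question also asks for a script, here is a minimal Python sketch that applies the regex above to an exported file without opening Notepad++. The function names `strip_timestamps` and `strip_file` are hypothetical, not part of any tool mentioned in this thread. The pattern is the one from this post, applied to raw bytes so that nothing else in the file gets re-encoded, and it assumes the offset is always +02'00' as in the sample.

```python
import re
from pathlib import Path

# Pattern from the post above, written as a bytes pattern.
# Assumption: the UTC offset is always +02'00', as in the sample data.
TIMESTAMP = re.compile(rb"(?:/CreationDate|/M)\(D:\d{14}\+02'00'\)")


def strip_timestamps(data: bytes) -> bytes:
    """Delete every /CreationDate(...) and /M(...) timestamp entry."""
    return TIMESTAMP.sub(b"", data)


def strip_file(src: str, dst: str) -> None:
    """Hypothetical helper: read src, strip the timestamps, write dst."""
    Path(dst).write_bytes(strip_timestamps(Path(src).read_bytes()))
```

A call such as `strip_file("comments.fdf", "comments_clean.fdf")` (both file names are placeholders) would write a cleaned copy and leave the exported file untouched, which makes it easy to retry if the import fails.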

            • guy038

              Hello, @paul-smithers and All,

              A regex search/replacement could be :

              SEARCH (?-i)(?:(/CreationDate)|M)\(D:\d{14}\+02'00'\)/

              REPLACE \x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20(?1\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20/)

              And here are the changes :

              BEFORE : <</C[1.0 0.819611 0.0]/CreationDate(D:20200606114426+02'00')/F 28/M(D:20200606114426+02'00')/NM...
              AFTER  : <</C[1.0 0.819611 0.0]                                      /F 28/                           NM...
              

              As @alan-kilborn said, if the string +02'00' is not constant, change the search regex as below :

              SEARCH (?-i)(?:(/CreationDate)|M)\(D:\d{14}.{7}\)/
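
Outside Notepad++, the same space-padding replacement can be sketched in Python, where a replacement callback can derive the number of spaces from the length of each match instead of hard-coding it. This is only an illustration under assumptions: the names `pad_match` and `blank_timestamps` are hypothetical, and the pattern is the variable-offset search regex above without the `(?-i)` prefix, since Python's `re` is case-sensitive by default. Note that the original poster later reports that Acrobat rejected files edited with the space-padding variants.

```python
import re

# Same search as above: group 1 is set only for the /CreationDate branch.
PATTERN = re.compile(r"(?:(/CreationDate)|M)\(D:\d{14}.{7}\)/")


def pad_match(m: re.Match) -> str:
    """Return as many spaces as the match is long.

    For the /CreationDate branch the trailing "/" is kept, mirroring
    the (?1...) conditional in the Notepad++ replacement.
    """
    width = len(m.group(0))
    if m.group(1) is not None:
        return " " * (width - 1) + "/"
    return " " * width


def blank_timestamps(text: str) -> str:
    """Overwrite both timestamp entries with spaces, keeping the line length."""
    return PATTERN.sub(pad_match, text)
```

Because the width comes from `len(m.group(0))`, the replacement does not need a fixed run of `\x20` characters.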

              Best Regards,

              guy038

              • Alan Kilborn @guy038

                @guy038

                Hi Guy,
                Is there a way to get the number of spaces to use for the replacement, from the length of the original match?

                • guy038

                  Hi, @paul-smithers, @Alan-kilborn and All,

                  Yeaaaah ! Indeed, there is a method ;-))

                  I thought about the very basic replacement of each single standard char( . ) with a space char ( \x20 )

                  But we need to replace text with spaces, in some zones only, not everywhere ! To achieve such a task, we’ll use a feature available in our regex engine since Notepad++ v7.7 : the backtracking control verbs ! Why did this idea come to my mind ? Well, just because I’m preparing some documentation on these control verbs !

                  Fundamentally, the goal is to use this generic regex, below :

                  ^What we do NOT want to match(*SKIP)(*F)|what we WANT to match, delimited with a LOOK-AHEAD|Again, what we do NOT want to match(*SKIP)(*F)|Again, what we WANT to match, delimited by another LOOK-AHEAD|.... and so on

                  Alan, could you be patient until I write and post this documentation about the backtracking control verbs ?

                  Meanwhile, you’ll find some hints, here :

                  https://www.rexegg.com/backtracking-control-verbs.html#skipfail


                  A little practice :

                  Assuming the initial and final text, desired by @paul-smithers

                  BEFORE : <</C[1.0 0.819611 0.0]/CreationDate(D:20200606114426+02'00')/F 28/M(D:20200606114426+02'00')/NM...
                  AFTER  : <</C[1.0 0.819611 0.0]                                      /F 28/                           NM...
                  

                  We can tell that :

                  • First, text, from beginning of line till a ] is unwanted

                  • Then, text, till the string /F, is wanted and, for each single char in this zone, we want to replace it with a space char

                  • Now, the text /F 28/ is unwanted

                  • Finally, text till the string NM is also wanted and again, for each single char in this zone, we want to replace it with a space char


                  So, look how easy it is to build up the search regex, from the points above ! In addition, I’ll use the free spacing mode for a better readability

                  SEARCH (?x-s) ^.+\] (*SKIP)(*F) | (?=.+/F) . | /F\x2028/ (*SKIP)(*F) | (?=.+NM) .

                  REPLACE \x20

                  We get :

                  
                  Text of @paul-smithers :
                  
                  BEFORE : <</C[1.0 0.819611 0.0]/CreationDate(D:20200606114426+02'00')/F 28/M(D:20200606114426+02'00')/NM...
                  AFTER  : <</C[1.0 0.819611 0.0]                                      /F 28/                           NM...
                  
                  Other TESTS :
                  
                  BEFORE : [1.0 0.819611 0.0]/CreationDate(D:20200606114426)/F 28/M(D:20200606+02'00')/NM...
                  AFTER  : [1.0 0.819611 0.0]                               /F 28/                     NM...
                  
                  
                  BEFORE : [1.0 0.819611]/CreationDate(+02'00')/F 28/M(D:114426+02)/NM...
                  AFTER  : [1.0 0.819611]                      /F 28/               NM...
                  
                  

                  Magic, isn’t it ;-))


                  Notes :

                  • Beware of the final dot, after the two positive look-aheads !

                  • Of course, in the case of a huge file, performance problems may occur, as each single character is replaced with a space !

                  • Note, also, that the use of the \K feature would not give the same behavior. Indeed, in that case, the part after \K ( the . ) must come, necessarily, right after \K, because this regex contains 2 alternatives only, unlike the 4 alternatives of the former regex ! Just try it :

                  SEARCH (?-s)^.+\]\K(?=.+/F).|/F 28/\K(?=.+NM).

                  Cheers,

                  guy038

                  • guy038

                    @paul-smithers, @Alan-kilborn and All,

                    I guess I must have been influenced by my upcoming documentation on Backtracking control verbs !

                    In fact, be reassured, there is still a classical solution, which does not use this new feature. Here is this second solution, written in the free-spacing mode (?x) :

                    SEARCH (?x-s) (^.+\]) | (?=.+/F) (.) | (/F\x2028/) | (?=.+NM) (.)

                    REPLACE (?1$0)(?2\x20)(?3$0)(?4\x20)

                    As you can see :

                    • Any part that we do not want to change is simply rewritten as is ( $0 )

                    • In the zones that we do care about, each single standard character ( . ) is replaced with a space char ( \x20 )
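
The same idea can be sketched in Python, whose built-in `re` module has no `(*SKIP)(*F)` verbs but handles this classical form fine. The conditional replacement `(?1$0)(?2\x20)...` is replaced here by a callback, because that syntax belongs to the Notepad++ engine. The names `ZONES`, `keep_or_blank` and `blank_zones` are hypothetical, and the sketch assumes the line layout of the sample (a `]`, then `/F 28/`, then `NM`).

```python
import re

# Groups 1 and 3 are the parts to keep, groups 2 and 4 are single
# characters inside the zones to blank out. MULTILINE lets ^ match at
# the start of every line; "." never crosses a line break, like (?-s).
ZONES = re.compile(
    r"(^.+\])|(?=.+/F)(.)|(/F\x2028/)|(?=.+NM)(.)",
    re.MULTILINE,
)


def keep_or_blank(m: re.Match) -> str:
    """Rewrite kept parts unchanged; turn every zone character into a space."""
    if m.group(1) is not None or m.group(3) is not None:
        return m.group(0)
    return " "


def blank_zones(text: str) -> str:
    """Blank the two timestamp zones character by character."""
    return ZONES.sub(keep_or_blank, text)
```

As in the regex above, every character of a zone is a separate match, so on a very large file this approach is likely to be slower than a single match per timestamp.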

                    BR

                    guy038

                    • Paul smithers

                      Hello, first of all thanks everyone for the help.

                      I have tried the first proposal, (?:/CreationDate|/M)\(D:\d{14}\+02'00'\) from @Alan-kilborn, and it works perfectly since I don’t need the \x20 space char.

                      For some reason, if I use the other proposals, Acrobat refuses to import the modified FDF file because of an unspecified error.

                      Anyway, my problem is resolved now. I use this script to remove the author name https://adobe.ly/3emVRkC and the search regex for the timestamp.

                      Thanks again.

                      The Community of users of the Notepad++ text editor.