    RegEx command to delete string with variable numbers

    • Paul smithers

      Hello everyone,

      I am looking for the proper regex to delete a recurring string that contains variable numbers.

      I want to delete the timestamps on a Comment Summary page from a PDF. For now I do it manually, one file at a time: export the FDF data, rename the file to .xml, open it with Notepad++ and search for these strings:

      code_text
      /CreationDate(D:20200605015359+02'00') 
      /M(D:20200605015359+02'00')
      code_text
      

      The numbers are the timestamp, which varies.

      Finally, I rename the file back to .fdf.

      Since I don't know how to code, I'm also looking for a script that does the same thing without opening Notepad++, if someone is kind enough to help.

      Thanks in advance

      • Alan Kilborn @Paul smithers

        @Paul-smithers

        You’re probably going to want to show some “after” text sample as well.
        The way I read what you want is that you’d end up with:

        code_text
        code_text
        

        which I’m 99% certain isn’t what you want.

        • Paul smithers

          Thanks for your quick response.

          Let me clarify what I need.

          In Adobe DC it is possible to create a file with all the comments written on a PDF document. It is called a Comment Summary:
          https://helpx.adobe.com/acrobat/kb/print-comments-acrobat-reader.html

          What I want is to delete the comment timestamps in that file. This is possible by exporting the comment metadata as an .fdf file.

          Change the extension to .xml and it looks something like this:

          timestamptest3.jpg

          Now, for every comment, delete these two strings:

          /CreationDate(D:20200605015359+02'00')
          /M(D:20200605015359+02'00')

          The numbers are the timestamp, which changes every time.

          The result should be like this:

          timestamptest4.jpg

          For now I do it manually, so I am asking for the proper regex search expression to find both strings and replace them with nothing, i.e. delete them.

          Since I have many PDFs with hundreds of comments, it would be nice if someone could help me write a script that does the same job without my having to replace them in Notepad++.

          Thanks again

          • Alan Kilborn @Paul smithers

            @Paul-smithers

            I’d say this regex could match your situation:

            (?:/CreationDate|/M)\(D:\d{14}\+02'00'\)

            It seems like the +02'00' is constant, but if it is variable, we can deal with that as well.
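For the script the original poster asked about, a minimal Python sketch could apply the same pattern directly to the exported FDF file, without opening Notepad++. It assumes the offset is always +02'00', as in the sample, and it treats the file as bytes because FDF content is not guaranteed to be plain text. The names `strip_timestamps` and `strip_file` are made up for this illustration.

```python
import re
from pathlib import Path

# The search expression from this post, compiled as a bytes pattern.
TIMESTAMP = re.compile(rb"(?:/CreationDate|/M)\(D:\d{14}\+02'00'\)")


def strip_timestamps(data: bytes) -> bytes:
    """Delete every /CreationDate(...) and /M(...) entry from FDF content."""
    return TIMESTAMP.sub(b"", data)


def strip_file(path: Path) -> int:
    """Rewrite one FDF file in place and return how many entries were removed."""
    cleaned, count = TIMESTAMP.subn(b"", path.read_bytes())
    path.write_bytes(cleaned)
    return count
```

Calling `strip_file(Path("comments.fdf"))` on an exported file (the file name here is only a placeholder) would rewrite it in place, so it is worth trying on a copy first.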

            • guy038

              Hello, @paul-smithers and All,

              A regex search/replacement could be :

              SEARCH (?-i)(?:(/CreationDate)|M)\(D:\d{14}\+02'00'\)/

              REPLACE \x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20(?1\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20/)

              And here are the changes :

              BEFORE : <</C[1.0 0.819611 0.0]/CreationDate(D:20200606114426+02'00')/F 28/M(D:20200606114426+02'00')/NM...
              AFTER  : <</C[1.0 0.819611 0.0]                                      /F 28/                           NM...
              

              As @alan-kilborn said, if the string +02'00' is not constant, change the search regex as below :

              SEARCH (?-i)(?:(/CreationDate)|M)\(D:\d{14}.{7}\)/
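Outside Notepad++, the padding does not have to be typed out by hand: a replacement callback can derive it from the length of each match. The sketch below reuses the search expression above in Python's `re` module, working on bytes. The names `blank_timestamps` and `_pad` are illustrative only, and whether Acrobat accepts the padded result has not been checked here.

```python
import re

# Same search expression as above; group 1 is set only for /CreationDate.
STAMP = re.compile(rb"(?:(/CreationDate)|M)\(D:\d{14}.{7}\)/")


def _pad(match: re.Match) -> bytes:
    """Return spaces as wide as the match, keeping the trailing '/' for group 1."""
    width = len(match.group(0))
    if match.group(1):
        return b" " * (width - 1) + b"/"
    return b" " * width


def blank_timestamps(data: bytes) -> bytes:
    """Overwrite both timestamp entries with spaces, preserving total length."""
    return STAMP.sub(_pad, data)
```

Because every match is replaced by a string of the same length, the byte offsets of everything else in the file stay where they were.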

              Best Regards,

              guy038

              • Alan Kilborn @guy038

                @guy038

                Hi Guy,
                Is there a way to get the number of spaces to use for the replacement, from the length of the original match?

                • guy038

                  Hi, @paul-smithers, @Alan-kilborn and All,

                  Yeaaaah ! Indeed, there is a method ;-))

                  I thought about the very basic replacement of each single standard char( . ) with a space char ( \x20 )

                  But we need to replace text with spaces, in some zones only, not everywhere ! To achieve such a task, we’ll use a new feature of our regex engine, available since Notepad++ v7.7 : the backtracking control verbs ! Why did this idea come to my mind ? Well, just because I’m preparing a documentation on these zero-width assertions !

                  Fundamentally, the goal is to use this generic regex, below :

                  ^What we do NOT want to match(*SKIP)(*F)|what we WANT to match, delimited with a LOOK-AHEAD|Again, what we do NOT want to match(*SKIP)(*F)|Again, what we WANT to match, delimited by another LOOK-AHEAD|.... and so on

                  Alan, could you be patient till I build up and post this documentation about these backtracking control verbs ?

                  Meanwhile, you’ll find some hints, here :

                  https://www.rexegg.com/backtracking-control-verbs.html#skipfail


                  A little practice :

                  Assuming the initial and final text, desired by @paul-smithers

                  BEFORE : <</C[1.0 0.819611 0.0]/CreationDate(D:20200606114426+02'00')/F 28/M(D:20200606114426+02'00')/NM...
                  AFTER  : <</C[1.0 0.819611 0.0]                                      /F 28/                           NM...
                  

                  We can tell that :

                  • First, text, from beginning of line till a ] is unwanted

                  • Then, text, till the string /F, is wanted and, for each single char in this zone, we want to replace it with a space char

                  • Now, the text /F 28/ is unwanted

                  • Finally, text till the string NM is also wanted and again, for each single char in this zone, we want to replace it with a space char


                  So, look how easy it is to build up the search regex, from the points above ! In addition, I’ll use the free spacing mode for a better readability

                  SEARCH (?x-s) ^.+\] (*SKIP)(*F) | (?=.+/F) . | /F\x2028/ (*SKIP)(*F) | (?=.+NM) .

                  REPLACE \x20

                  We get :

                  
                  Text of @paul-smithers :
                  
                  BEFORE : <</C[1.0 0.819611 0.0]/CreationDate(D:20200606114426+02'00')/F 28/M(D:20200606114426+02'00')/NM...
                  AFTER  : <</C[1.0 0.819611 0.0]                                      /F 28/                           NM...
                  
                  Other TESTS :
                  
                  BEFORE : [1.0 0.819611 0.0]/CreationDate(D:20200606114426)/F 28/M(D:20200606+02'00')/NM...
                  AFTER  : [1.0 0.819611 0.0]                               /F 28/                     NM...
                  
                  
                  BEFORE : [1.0 0.819611]/CreationDate(+02'00')/F 28/M(D:114426+02)/NM...
                  AFTER  : [1.0 0.819611]                      /F 28/               NM...
                  
                  

                  Magic, isn’t it ;-))


                  Notes :

                  • Beware of the final dot, after the two positive look-aheads !

                  • Of course, with a huge file, performance problems may occur, as each single character is replaced with a space !

                  • Note, also, that the use of the \K feature would not give the same behavior. Indeed, in that case, the part after \K ( the . ) must come, necessarily, right after \K, because this regex contains 2 alternatives only, unlike the 4 alternatives of the former regex ! Just try it :

                  SEARCH (?-s)^.+\]\K(?=.+/F).|/F 28/\K(?=.+NM).

                  Cheers,

                  guy038

                  • guy038

                    @paul-smithers, @Alan-kilborn and All,

                    I guess I must have been influenced by my upcoming documentation on Backtracking control verbs !

                    In fact, be reassured, there is still a classical solution, which does not use this new feature. Here is this second solution, written in free-spacing mode (?x) :

                    SEARCH (?x-s) (^.+\]) | (?=.+/F) (.) | (/F\x2028/) | (?=.+NM) (.)

                    REPLACE (?1$0)(?2\x20)(?3$0)(?4\x20)

                    As you can see :

                    • Any part, that we do not want to match, is simply rewritten ( $0 )

                    • In the zones that we do care about, each single standard character ( . ) is replaced with a space char ( \x20 )
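The same four-alternative idea can be sketched in Python. The standard `re` module has no backtracking control verbs, so only this classical form carries over directly. A callback plays the role of the conditional replacement: alternatives that capture a kept zone return the match unchanged, and the single-character alternatives return a space. The name `blank_zones` is made up for this illustration, and `re.MULTILINE` is assumed so that `^` behaves per line as it does in Notepad++.

```python
import re

PATTERN = re.compile(
    r"""
      (^.+\])          # group 1: line start up to the last ']', kept as is
    | (?=.+/F) (.)     # group 2: one character that still has '/F' ahead
    | (/F\x2028/)      # group 3: the literal '/F 28/', kept as is
    | (?=.+NM) (.)     # group 4: one character that still has 'NM' ahead
    """,
    re.VERBOSE | re.MULTILINE,
)


def _keep_or_blank(match: re.Match) -> str:
    """Blank the single characters of groups 2 and 4; rewrite everything else."""
    if match.group(2) is not None or match.group(4) is not None:
        return " "
    return match.group(0)


def blank_zones(text: str) -> str:
    """Replace each character of the two timestamp zones with one space."""
    return PATTERN.sub(_keep_or_blank, text)
```

As with the Notepad++ version, every character in a wanted zone is a separate match, so the callback runs once per character.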

                    BR

                    guy038

                    • Paul smithers

                      Hello, first of all thanks everyone for the help.

                      I have tried the first proposal, (?:/CreationDate|/M)\(D:\d{14}\+02'00'\) from @Alan-kilborn, and it works perfectly, since I don't need the \x20 space characters.

                      For some reason, if I use the other proposals, Acrobat refuses to import the modified FDF file because of an unspecified error.

                      Anyway, my problem is resolved now. I use this script to remove the author name, https://adobe.ly/3emVRkC, and the regex search for the timestamps.

                      Thanks again.
