RegEx command to delete string with variable numbers

  • P
    Paul smithers
    last edited by Jun 5, 2020, 4:08 PM

    Hello everyone,

    I am looking for the right regex to delete a recurring string that contains variable numbers.

    I want to delete the timestamps on a Summary Comment page from a PDF. For now I do it manually, one file at a time: export the FDF data, rename the file to .xml, open it in Notepad++ and search for these strings:

    code_text
    /CreationDate(D:20200605015359+02'00') 
    /M(D:20200605015359+02'00')
    code_text
    

    The numbers are the timestamp, which varies.

    Finally, I rename the file back to .fdf.

    Since I don't know how to code, if someone is kind enough, I'm also looking for a script that does the same thing without opening Notepad++.

    Thanks in advance

    • A
      Alan Kilborn @Paul smithers
      last edited by Jun 5, 2020, 5:02 PM

      @Paul-smithers

      You’re probably going to want to show some “after” text sample as well.
      The way I read what you want is that you’d end up with:

      code_text
      code_text
      

      which I’m 99% certain isn’t what you want.

      • P
        Paul smithers
        last edited by Paul smithers Jun 6, 2020, 10:52 AM Jun 6, 2020, 10:50 AM

        Thanks for your quick response.

        Let me clarify what I need.

        In Adobe DC it is possible to create a file with all the comments written on a PDF document. It is called a Comment summary:
        https://helpx.adobe.com/acrobat/kb/print-comments-acrobat-reader.html

        What I want is to delete the comment timestamps in that file. This is possible by exporting the comment metadata as an .fdf file.

        Change the extension to .xml and it looks something like this:

        timestamptest3.jpg

        Now, for every comment, delete these two strings:

        /CreationDate(D:20200605015359+02'00')
        /M(D:20200605015359+02'00')

        The numbers are the timestamp, which changes every time.

        The result should be like this:

        timestamptest4.jpg

        For now I do it manually, and I would like the proper regex search expression to find all occurrences of those two strings and replace them with nothing, i.e. delete them.

        Since I have many PDFs with hundreds of comments, it would be nice if someone could help me write a script that does the same job without doing the replacement in Notepad++.

        Thanks again

        • A
          Alan Kilborn @Paul smithers
          last edited by Jun 6, 2020, 11:23 AM

          @Paul-smithers

          I’d say this regex could match your situation:

          (?:/CreationDate|/M)\(D:\d{14}\+02'00'\)

          It seems like the +02'00' is constant, but if it is variable, we can deal with that as well.
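The same search can be run outside Notepad++. Below is a minimal Python sketch of that idea, not a tested tool for real FDF files. The function name `strip_timestamps` is made up for illustration, and the pattern assumes that a variable offset always looks like a sign followed by HH'mm' (for example -05'30').

```python
import re

# Alan's pattern, with the fixed "+02'00'" generalized to any signed
# HH'mm' offset (an assumption about how the offset may vary).
TIMESTAMP = re.compile(r"(?:/CreationDate|/M)\(D:\d{14}[+-]\d{2}'\d{2}'\)")


def strip_timestamps(text: str) -> str:
    """Delete every /CreationDate(D:...) and /M(D:...) entry from the text."""
    return TIMESTAMP.sub("", text)


sample = ("<</C[1.0 0.819611 0.0]/CreationDate(D:20200606114426+02'00')"
          "/F 28/M(D:20200606114426-05'30')/NM")
print(strip_timestamps(sample))  # <</C[1.0 0.819611 0.0]/F 28/NM
```

Replacing with an empty string shortens the line, which matches the "replace with nothing" request rather than the space-padded variants.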

          • G
            guy038
            last edited by guy038 Jun 6, 2020, 1:08 PM Jun 6, 2020, 1:07 PM

            Hello, @paul-smithers and All,

            A regex search/replacement could be :

            SEARCH (?-i)(?:(/CreationDate)|M)\(D:\d{14}\+02'00'\)/

            REPLACE \x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20(?1\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20/)

            And here are the changes :

            BEFORE : <</C[1.0 0.819611 0.0]/CreationDate(D:20200606114426+02'00')/F 28/M(D:20200606114426+02'00')/NM...
            AFTER  : <</C[1.0 0.819611 0.0]                                      /F 28/                           NM...
            

            As @alan-kilborn said, if the string +02'00' is not constant, change the search regex as below :

            SEARCH (?-i)(?:(/CreationDate)|M)\(D:\d{14}.{7}\)/
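The REPLACE string above hard-codes the number of spaces. In a scripting language, the padding can instead be derived from the length of each match. The following Python sketch reuses the second SEARCH pattern; the names `pad` and `blank_timestamps` are hypothetical, and the callback stands in for the `(?1...)` conditional, which Python's `re` module does not offer in replacement strings.

```python
import re

# guy038's second SEARCH pattern; Python's re is case-sensitive by default,
# so the leading (?-i) is not needed.
PATTERN = re.compile(r"(?:(/CreationDate)|M)\(D:\d{14}.{7}\)/")


def pad(match):
    """Return spaces of the same width as the match."""
    width = len(match.group(0))
    if match.group(1) is not None:
        # /CreationDate branch: keep the trailing "/", like (?1 ... /)
        return " " * (width - 1) + "/"
    # M branch: blank everything, including the trailing "/"
    return " " * width


def blank_timestamps(line: str) -> str:
    return PATTERN.sub(pad, line)


before = ("<</C[1.0 0.819611 0.0]/CreationDate(D:20200606114426+02'00')"
          "/F 28/M(D:20200606114426+02'00')/NM...")
after = blank_timestamps(before)
print(after)
print(len(before) == len(after))  # True: the line length is preserved
```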

            Best Regards,

            guy038

            • A
              Alan Kilborn @guy038
              last edited by Alan Kilborn Jun 6, 2020, 2:07 PM Jun 6, 2020, 2:07 PM

              @guy038

              Hi Guy,
              Is there a way to get the number of spaces to use for the replacement, from the length of the original match?

              • G
                guy038
                last edited by guy038 Jun 6, 2020, 10:10 PM Jun 6, 2020, 9:31 PM

                Hi, @paul-smithers, @Alan-kilborn and All,

                Yeaaaah ! Indeed, there is a method ;-))

                I thought about the very basic replacement of each single standard char( . ) with a space char ( \x20 )

                But we need to replace text with spaces in some zones only, not everywhere! To achieve this, we'll use a feature that our regex engine has had since Notepad++ v7.7: the backtracking control verbs! Why did this idea come to mind? Well, just because I'm preparing documentation on these zero-width assertions!

                Fundamentally, the goal is to use this generic regex, below :

                ^What we do NOT want to match(*SKIP)(*F)|what we WANT to match, delimited with a LOOK-AHEAD|Again, what we do NOT want to match(*SKIP)(*F)|Again, what we WANT to match, delimited by another LOOK-AHEAD|.... and so on

                Alan, could you be patient till I build up and post this documentation about these backtracking control verbs ?

                Meanwhile, you’ll find some hints, here :

                https://www.rexegg.com/backtracking-control-verbs.html#skipfail


                A little practice :

                Assuming the initial and final text, desired by @paul-smithers

                BEFORE : <</C[1.0 0.819611 0.0]/CreationDate(D:20200606114426+02'00')/F 28/M(D:20200606114426+02'00')/NM...
                AFTER  : <</C[1.0 0.819611 0.0]                                      /F 28/                           NM...
                

                We can tell that :

                • First, text, from beginning of line till a ] is unwanted

                • Then, text, till the string /F, is wanted and, for each single char in this zone, we want to replace it with a space char

                • Now, the text /F 28/ is unwanted

                • Finally, text till the string NM is also wanted and again, for each single char in this zone, we want to replace it with a space char
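The four zones listed above can be made concrete with plain string slicing. This is only an illustrative Python sketch, not the regex itself: the function name `blank_zones` is hypothetical, and it assumes that `]`, `/F 28/` and `NM` each occur once per line, in that order.

```python
def blank_zones(line: str) -> str:
    """Blank the two wanted zones and copy the unwanted ones unchanged."""
    keep = "/F 28/"
    zone1_start = line.index("]") + 1           # end of the first unwanted part
    keep_start = line.index(keep, zone1_start)  # wanted zone 1 ends here
    zone2_start = keep_start + len(keep)        # wanted zone 2 starts after /F 28/
    zone2_end = line.index("NM", zone2_start)   # wanted zone 2 ends before NM
    return (
        line[:zone1_start]
        + " " * (keep_start - zone1_start)
        + keep
        + " " * (zone2_end - zone2_start)
        + line[zone2_end:]
    )


line = "[1.0 0.819611]/CreationDate(+02'00')/F 28/M(D:114426+02)/NM..."
print(blank_zones(line))
```

Each character in a wanted zone becomes exactly one space, so the output line has the same length as the input.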


                So, look how easy it is to build up the search regex, from the points above ! In addition, I’ll use the free spacing mode for a better readability

                SEARCH (?x-s) ^.+\] (*SKIP)(*F) | (?=.+/F) . | /F\x2028/ (*SKIP)(*F) | (?=.+NM) .

                REPLACE \x20

                We get :

                
                Text of @paul-smithers :
                
                BEFORE : <</C[1.0 0.819611 0.0]/CreationDate(D:20200606114426+02'00')/F 28/M(D:20200606114426+02'00')/NM...
                AFTER  : <</C[1.0 0.819611 0.0]                                      /F 28/                           NM...
                
                Other TESTS :
                
                BEFORE : [1.0 0.819611 0.0]/CreationDate(D:20200606114426)/F 28/M(D:20200606+02'00')/NM...
                AFTER  : [1.0 0.819611 0.0]                               /F 28/                     NM...
                
                
                BEFORE : [1.0 0.819611]/CreationDate(+02'00')/F 28/M(D:114426+02)/NM...
                AFTER  : [1.0 0.819611]                      /F 28/               NM...
                
                

                Magic, isn’t it ;-))


                Notes :

                • Beware of the final dot, after the two positive look-aheads !

                • Of course, with a huge file, performance problems may occur, as each single character is replaced with a space!

                • Note, also, that the \K feature would not give the same behavior. With \K, the single character matched by . must come immediately after the text that precedes \K, so each alternative can only blank the one character right after ] or right after /F 28/. That regex contains only 2 alternatives, unlike the 4 alternatives of the former regex! Just try it:

                SEARCH (?-s)^.+\]\K(?=.+/F).|/F 28/\K(?=.+NM).

                Cheers,

                guy038

                • G
                  guy038
                  last edited by Jun 7, 2020, 11:01 AM

                  @paul-smithers, @Alan-kilborn and All,

                  I guess I must have been influenced by my upcoming documentation on Backtracking control verbs !

                  In fact, rest assured, there is still a classical solution which does not use this new feature. Here is this second solution, written in free-spacing mode (?x) :

                  SEARCH (?x-s) (^.+\]) | (?=.+/F) (.) | (/F\x2028/) | (?=.+NM) (.)

                  REPLACE (?1$0)(?2\x20)(?3$0)(?4\x20)

                  As you can see :

                  • Any part that we do not want to change is simply rewritten as is ( $0 )

                  • In the zones that we do care about, each single standard character ( . ) is replaced with a space char ( \x20 )
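The classical solution ports to Python's built-in `re` module, which has no `(*SKIP)(*F)` verbs and no conditional replacement syntax. In the sketch below, a replacement callback plays the role of `(?1$0)(?2\x20)(?3$0)(?4\x20)`. The names `rewrite` and `blank_line` are hypothetical, and the function is meant to be called on one line at a time, because `^` is compiled without the multiline flag.

```python
import re

# Same four alternatives as the SEARCH above, written without (?x).
# Groups 1 and 3 are the parts to keep; groups 2 and 4 are single characters to blank.
PATTERN = re.compile(r"(^.+\])|(?=.+/F)(.)|(/F\x2028/)|(?=.+NM)(.)")


def rewrite(match):
    if match.group(1) is not None or match.group(3) is not None:
        return match.group(0)  # unwanted part: rewritten unchanged, like $0
    return " "                 # wanted zone: one character becomes one space


def blank_line(line: str) -> str:
    return PATTERN.sub(rewrite, line)


before = ("<</C[1.0 0.819611 0.0]/CreationDate(D:20200606114426+02'00')"
          "/F 28/M(D:20200606114426+02'00')/NM...")
print(blank_line(before))
```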

                  BR

                  guy038

                  • P
                    Paul smithers
                    last edited by Jun 21, 2020, 10:16 AM

                    Hello, first of all, thanks everyone for the help.

                    I have tried the first proposal, (?:/CreationDate|/M)\(D:\d{14}\+02'00'\) from @Alan-kilborn, and it works perfectly, since I don't need the \x20 space characters.

                    For some reason, if I use the other proposals, Acrobat refuses to import the modified FDF file because of an unspecified error.

                    Anyway, my problem is resolved now. I use this script to remove the author name https://adobe.ly/3emVRkC and the search regex for the timestamps.
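For many files, the regex that worked here could be applied in a batch, without renaming files or opening Notepad++. The Python sketch below is only one possible approach: the function name `strip_fdf_folder` and the `_clean` output suffix are hypothetical choices, and it has not been checked against Acrobat's import. It works on raw bytes so that no text encoding has to be assumed for the FDF file.

```python
import re
from pathlib import Path

# The regex confirmed above, as a bytes pattern (offset fixed at +02'00').
TIMESTAMP = re.compile(rb"(?:/CreationDate|/M)\(D:\d{14}\+02'00'\)")


def strip_fdf_folder(folder: str) -> int:
    """Write a '<name>_clean.fdf' copy of each .fdf file; return the number written."""
    written = 0
    # sorted() lists the folder up front, before any output file is created
    for path in sorted(Path(folder).glob("*.fdf")):
        if path.stem.endswith("_clean"):
            continue  # skip output from an earlier run
        cleaned = TIMESTAMP.sub(b"", path.read_bytes())
        path.with_name(path.stem + "_clean.fdf").write_bytes(cleaned)
        written += 1
    return written
```

Calling `strip_fdf_folder("some/folder")` leaves the original files untouched, so a file that Acrobat rejects can still be compared with its source.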

                    Thanks again.

                    The Community of users of the Notepad++ text editor.