    Replacing variable length file paths in a GEDCOM file

    44 Posts 5 Posters 3.1k Views
    • Alan KilbornA
      Alan Kilborn @Web Master
      last edited by Alan Kilborn

      @web-master

      My 30ms reaction to this data: The ~ is a new, never mentioned before feature of your data. Of course the expressions provided before don’t match it. But you already know this.

      • Web MasterW
        Web Master @Alan Kilborn
        last edited by

        @alan-kilborn I mentioned in my OP - bullet 2

        • Alan KilbornA
          Alan Kilborn @Web Master
          last edited by Alan Kilborn

          @web-master said in Replacing variable length file paths in a GEDCOM file:

          I mentioned in my OP - bullet 2

          It’s true; you did:

          The old file path always starts at char 8 in the string but may be: \ / ~ or a drive letter, so position 8 is the most reliable start point

          But everyone missed it. :-)
          Because we are more focused on sample data and what it looks like.
          I guarantee it wouldn’t have been missed if it was provided as part of the original sample data.
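For readers who prefer to see the "position 8" rule as code: a minimal Python sketch, assuming (as the OP states) that the path always begins at character 8, i.e. index 7. The function name `split_file_line` and the choice to treat either slash as the final separator are illustrative assumptions, not something from the thread.

```python
def split_file_line(line):
    """Split a GEDCOM 'n FILE <path>' line into (prefix, directory, filename).

    Assumes the path starts at character 8 (index 7), per the original post.
    Returns None for lines that are not FILE records.
    """
    if len(line) < 8 or not line[0].isdigit() or line[1:7] != " FILE ":
        return None
    prefix, path = line[:7], line[7:]
    # Assumption: the last separator of either kind ends the directory part.
    cut = max(path.rfind("/"), path.rfind("\\"))
    if cut < 0:
        return prefix, "", path
    return prefix, path[:cut], path[cut + 1:]
```

Because the start position is fixed, it does not matter whether the path opens with \, /, ~ or a drive letter; only the last separator is searched for.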

          • Web MasterW
            Web Master @Alan Kilborn
            last edited by

            @alan-kilborn I shall break out the sackcloth and ashes. :)

            • Alan KilbornA
              Alan Kilborn @Web Master
              last edited by

              @web-master

              Possibly the expression you now want (OK, you wanted it from the start!) is:

              (?-s)(?<=^\d FILE )~?/.+(?=/)

              BTW, here’s the English explanation of it, for maximum learning:

              • Use these options for the whole regular expression (?-s)
                • (hyphen inverts the meaning of the letters that follow) -
                • Dot doesn’t match line breaks s
              • Assert that the regex below can be matched ending at this position (positive lookbehind) (?<=^\d FILE )
                • Assert position at the beginning of a line (at beginning of the string or after a line break character) (carriage return and line feed, form feed, next line, line separator, paragraph separator) ^
                • Match a single character that is a “digit” (any decimal number in any Unicode script, plus any symbol with a decimal value in the active code page) \d
                • Match the character string “ FILE ” literally (case sensitive) FILE
              • Match the character “~” literally ~?
                • Between zero and one times, as many times as possible, giving back as needed (greedy) ?
              • Match the character “/” literally /
              • Match any single character that is NOT a line break character (line feed, carriage return, form feed, next line, line separator, paragraph separator) .+
                • Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
              • Assert that the regex below can be matched starting at this position (positive lookahead) (?=/)
                • Match the character “/” literally /

              Created with RegexBuddy
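To see what this expression does and does not pick up, here is a small sketch using Python's standard `re` module rather than Notepad++'s own regex engine, so treat it as an approximation. The sample lines are hypothetical, not the OP's data. The `(?-s)` prefix is dropped because the dot in Python's `re` already excludes line breaks by default (and, as far as I know, `re` only accepts flag removal in the scoped `(?-s:...)` form); `re.MULTILINE` gives `^` its per-line meaning.

```python
import re

# Alan's expression without the (?-s) prefix; MULTILINE lets ^ match
# at the start of every line rather than only at the start of the text.
ALAN = re.compile(r"(?<=^\d FILE )~?/.+(?=/)", re.MULTILINE)

# Hypothetical sample lines, not taken from the thread.
sample = "\n".join([
    "1 FILE /home/me/pics/a.jpg",
    "1 FILE ~/pics/b.jpg",
    "1 FILE C:\\pics/c.jpg",
    "1 NAME John /Smith/",
])

matches = [m.group(0) for m in ALAN.finditer(sample)]
print(matches)
```

On this sample only the first two lines match: the path must begin with `/` or `~/`, so the drive-letter line is skipped, and the NAME line fails the lookbehind.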

              • Neil SchipperN
                Neil Schipper @Alan Kilborn
                last edited by

                @alan-kilborn said in Replacing variable length file paths in a GEDCOM file:

                ~?

                New spec says we need to handle both kinds of slash, tilde, and drive letter, so it needs to be more complex than that, right?

                Annoyingly, since the behind text is now varlen, this demolishes the glory of replace text being pure user-text.

                Instead of mucking about with all the variations of drive letters and slashes and tildes, I’m gonna propose a looser spec:

                • following <space after “FILE”>, match any run of text, minimum length 1, up to (but not including) the last forward slash on the line

                Hence:

                Find (?-s)(?<=^\d FILE ).+(?=/)
                Replace: /Smith_1234
                So replacement text now needs the starting slash.
                To keep the user experience elegant, this would be incorporated into the script:

                editor.rereplace(find_regex, '/' + replacement_text)
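Outside Notepad++, the same find/replace can be tried with Python's standard `re` module. This is only a sketch under assumptions: `replace_paths` is a made-up helper name, the `(?-s)` prefix is omitted because Python's dot excludes newlines by default, and Python's engine is not the one Notepad++ uses, so edge cases may differ.

```python
import re

# Neil's looser expression; MULTILINE makes ^ match at each line start.
FIND = re.compile(r"(?<=^\d FILE ).+(?=/)", re.MULTILINE)

def replace_paths(text, replacement_text):
    """Replace the directory part of every 'n FILE <path>' line."""
    new_path = "/" + replacement_text  # the user types only the bare name
    # A function replacement inserts new_path literally, so backslashes
    # in user input are never read as escape sequences.
    return FIND.sub(lambda m: new_path, text)
```

For example, `replace_paths("1 FILE ~/pics/b.jpg", "Smith_1234")` gives `1 FILE /Smith_1234/b.jpg`. A path containing no forward slash at all is left untouched, since the lookahead needs a `/` to stop at.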
                
                • Web MasterW
                  Web Master @Alan Kilborn
                  last edited by Web Master

                  @alan-kilborn Thanks Alan. I think I need to read a regex primer before I have another read of that explanation.

                  But to go back to my OP and second bullet, the ~ was not the only char to put a spanner in the works. It’s possible it could also be:

                  \
                  Drive letter (C: D: etc., case insensitive)

                  … and these appear to trip up Neil’s enhanced version (EDIT: Ah, this message crosses with Neil’s latest, where he identifies the same issue)

                  But your original query handles them all.

                  I think it’s time to be realistic and say that we have solved it, and that the effort to make it completely foolproof is perhaps a stretch farther than we NEED to go.

                  Meanwhile I’ll go off and explore Python.

                  Thank you everyone. It’s only 24 hours since my initial request for help and you’ve all been fab!

                  • Alan KilbornA
                    Alan Kilborn @Neil Schipper
                    last edited by Alan Kilborn

                    @neil-schipper said in Replacing variable length file paths in a GEDCOM file:

                    New spec says we need to handle both kinds of slash, tilde, and drive letter, so it needs to be more complex than that, right?

                    Yea.
                    It’s actually old-spec, though. Maybe let’s call it we-didn’t-read-spec.
                    :-)

                    And TBH, my “exactness” tends to fall off the longer these types of threads go on…

                    However, I think we’ve given the OP enough to get his task carried out, so that’s a good thing.

                    • Web MasterW
                      Web Master @Neil Schipper
                      last edited by

                      @neil-schipper said in Replacing variable length file paths in a GEDCOM file:

                      Find (?-s)(?<=^\d FILE ).+(?=/)
                      Replace: /Smith_1234
                      So replacement text now needs the starting slash.

                      That’s the one! And to echo Alan’s last comment, you’ve given me all I asked for and more than enough to be going on with. I’m good to go now, thank you.

                      • Neil SchipperN
                        Neil Schipper @Alan Kilborn
                        last edited by

                        @alan-kilborn said in Replacing variable length file paths in a GEDCOM file:

                        It’s actually old-spec, though.

                        True.

                        It’s actually old-spec, though. Maybe let’s call it we-didn’t-read-spec.

                        I’m not so sure about that: Your original expression did pick up the total requirement. My “new and improved” one introduced the non-compliance (and my most recent is in essence the same as your original).

                        • Alan KilbornA
                          Alan Kilborn @Neil Schipper
                          last edited by

                          @neil-schipper said in Replacing variable length file paths in a GEDCOM file:

                          Your original expression did pick up the total requirement.

                          I think that might have been blind luck. :-)

                          • Paul WormerP
                            Paul Wormer @Web Master
                            last edited by Paul Wormer

                            @web-master said in Replacing variable length file paths in a GEDCOM file:

                            It took long enough on a file with 4K lines but some of the files have 1million lines!!!

                            I am sorry to butt in, but I noticed that your question is about manually typing in replacement strings. Are you certain you want to do this for 1 million lines? Or is the number of lines starting with 1 FILE small enough to do the typing in a finite time span? Just curious.

                            • Web MasterW
                              Web Master @Paul Wormer
                              last edited by

                              @paul-wormer The replacement string is constant so it’s a one-time entry into the expression the guys have devised for me.
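Since the replacement is a single constant, a very large file could also be processed outside the editor, one line at a time. The following is a rough standalone Python sketch, not something proposed in the thread: the names `rewrite_lines` and `rewrite_file` are hypothetical, and the UTF-8 default is an assumption (GEDCOM files come in several encodings, so check yours first).

```python
import re

# Same expression as Neil's; no MULTILINE needed when matching per line.
FIND = re.compile(r"(?<=^\d FILE ).+(?=/)")

def rewrite_lines(lines, new_dir):
    """Yield each line with the directory part of FILE records replaced."""
    for line in lines:
        # Function replacement keeps new_dir literal (no escape handling).
        yield FIND.sub(lambda m: new_dir, line, count=1)

def rewrite_file(src_path, dst_path, new_dir, encoding="utf-8"):
    """Stream src_path to dst_path so the whole file is never held in memory."""
    # newline="" preserves the file's original line endings untouched.
    with open(src_path, encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        dst.writelines(rewrite_lines(src, new_dir))
```

Here `new_dir` is passed with its leading slash, e.g. `rewrite_file("in.ged", "out.ged", "/Smith_1234")`, where both file names are placeholders.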

                              The Community of users of the Notepad++ text editor.