Replacing variable length file paths in a GEDCOM file

44 Posts 5 Posters 3.1k Views
  • W
    Web Master @Neil Schipper
    last edited by Mar 4, 2022, 10:44 AM

    @neil-schipper said in Replacing variable length file paths in a GEDCOM file:

    (?<=^\d FILE /).*(?=/)

    Forgive me Neil, but I’m not clear how to use this modified example?

    • N
      Neil Schipper @Web Master
      last edited by Mar 4, 2022, 11:06 AM

      @web-master I could have been more clear.

      The idea is that you perform a Regex search in the same way as the earlier recipe, except the Find expression is the one I offered, and, the Replace expression is simply and completely the text you want to substitute in.

      Thus, instead of your people having to carefully modify the Replace expression from ${1}/Smith_1234/ to ${1}/Jones_4567/ as they proceed from one task to the next, they can more comfortably change Smith_1234 to Jones_4567 – no funny confusing text in the Replace text box.

      Also, if you want the solution to be robust against users inadvertently checking that option box to the right of Reg Exp, prefix the Find expression with (?-s) as per Alan’s earlier expression.

      Also, if you want to make it impossible to match a completely empty path (nothing between two slashes: //), in my Find expression, we’d replace * with +, hence: (?-s)(?<=^\d FILE /).+(?=/)
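A minimal sketch for checking the `*` versus `+` difference outside Notepad++. It assumes Python's `re` module as a stand-in for Notepad++'s Boost engine, so `(?-s)` is dropped (in `re` the dot already stops at line breaks) and `re.MULTILINE` is added so that `^` matches at each line start. Both sample lines are made up for illustration.

```python
import re

# Python's `re` stands in for Notepad++'s Boost regex engine (an assumption).
# `(?-s)` is omitted: in `re`, "." does not match newlines unless re.DOTALL is set.
STAR = re.compile(r"(?<=^\d FILE /).*(?=/)", re.MULTILINE)
PLUS = re.compile(r"(?<=^\d FILE /).+(?=/)", re.MULTILINE)

normal_line = "1 FILE /Smith_1234/photo.jpg"  # hypothetical sample line
empty_path = "1 FILE //photo.jpg"             # nothing between the two slashes

# On an ordinary line the two expressions select the same text.
star_normal = STAR.search(normal_line).group()
plus_normal = PLUS.search(normal_line).group()

# On the empty path, `*` matches a zero-length string; `+` finds nothing.
star_empty = STAR.search(empty_path)
plus_empty = PLUS.search(empty_path)

print(repr(star_normal), repr(plus_normal))
print(star_empty is not None, plus_empty is not None)
```

With `*`, a Replace All would therefore insert the replacement text between the two slashes of an empty path; with `+`, such a line is left alone.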

      • W
        Web Master @Neil Schipper
        last edited by Mar 4, 2022, 12:05 PM

        @neil-schipper Hi Neil. OK, that’s what I thought you meant, but when I do that, Find and Replace does nothing

        • A
          Alan Kilborn @Neil Schipper
          last edited by Mar 4, 2022, 12:16 PM

          @neil-schipper said in Replacing variable length file paths in a GEDCOM file:

          Also, if you want to make it impossible to match a completely empty path (nothing between two slashes: //), in my Find expression, we’d replace * with +

          I’m surprised you changed (my) original usage of .+ to .* in your version, then backed it out with “we’d replace * with +”. :-)

          • N
            Neil Schipper @Web Master
            last edited by Mar 4, 2022, 12:17 PM

            @web-master

            Just to be clear:
            Find: (?-s)(?<=^\d FILE /).+(?=/)
            Replace: Smith_1234
            Mode=regex; checkbox to the right doesn’t matter anymore

            Then, Find Next to observe expected matching, Replace to perform the substitution on the currently selected match, Replace All to perform the substitution on entire file.
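The recipe can be rehearsed outside the editor with a rough Python equivalent. This is only a sketch: it assumes Python's `re` module behaves like Notepad++'s engine for this expression (with `re.MULTILINE` standing in for the editor's line-by-line `^`, and `(?-s)` dropped because it is Python's default), and the three sample lines are invented.

```python
import re

# Same Find expression as in the recipe, minus `(?-s)` (Python's default behaviour).
FIND = re.compile(r"(?<=^\d FILE /).+(?=/)", re.MULTILINE)

# Hypothetical GEDCOM fragment: two FILE lines and one unrelated line.
sample = (
    "1 FILE /Old Folder/Sub Folder/portrait.jpg\n"
    "2 FILE /Another/photo 1.jpg\n"
    "1 NAME John /Smith/\n"
)

# subn() plays the role of Replace All and also reports how many matches it replaced.
result, count = FIND.subn("Smith_1234", sample)

print(count)
print(result)
```

Only the two FILE lines change, and everything between the first slash and the last slash on each becomes `Smith_1234`. The NAME line is untouched even though it contains slashes.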

            I have tested this. Are you sure you aren’t acting on a file that has already had the substitutions done?

            • N
              Neil Schipper @Alan Kilborn
              last edited by Mar 4, 2022, 12:33 PM

              @alan-kilborn said:

              you changed (my) original usage of .+ to .*

              That’s not how it actually went. I devised my solution independently from reading the OP (deciding early that it should have a look-behind and a look-ahead). Then I saw you had a solution that was satisfactory, and I saw no reason to pipe in.

              Much later I noticed in the follow-up convo that @web-master was trying to reduce complexity for his non-technical volunteers, and realized my solution was favorable to that. And it was after that that I realized our solutions also differed in the modifier, and that the asterisk was maybe a bit too loose (although, dollars to doughnuts, they perform equally).

              • A
                Alan Kilborn @Web Master
                last edited by Alan Kilborn Mar 4, 2022, 12:46 PM Mar 4, 2022, 12:44 PM

                @web-master said in Replacing variable length file paths in a GEDCOM file:

                Is there a way to put this into a macro so that at runtime the user is prompted for the replacement string?

                As @PeterJones mentioned, not possible with a macro.

                But as a PythonScript it is rather simple:

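                from Npp import editor, notepad  # PythonScript's editor and application objects

                # Ask for the new folder name; prompt() returns None if the dialog is cancelled
                replacement_text = notepad.prompt('Replacement text:', '', '')
                if replacement_text:
                    # Match everything between the slash after "FILE " and the last slash on the line
                    find_regex = r'(?-s)(?<=^\d FILE /).+(?=/)'
                    editor.rereplace(find_regex, replacement_text)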
                

                You’d run the script and then get prompted with:

                [Screenshot: the PythonScript prompt dialog asking for “Replacement text:”]

                After pressing OK the replacements would be made.

                • W
                  Web Master @Neil Schipper
                  last edited by Mar 4, 2022, 1:58 PM

                  @neil-schipper Yes, I copied and pasted the expression direct from here and the file had def not been processed already.

                  I don’t know what the protocol is here. May I PM you the file so you can try it?

                  • A
                    Alan Kilborn @Web Master
                    last edited by Mar 4, 2022, 2:11 PM

                    @web-master

                    Why don’t you just post a line – or a few lines – here that you think it should match, but it isn’t? That may be more instructive (for those reading along) than taking it offline into some private discussion.

                    While initially impressed with your initiative, I’m now losing faith in your debugging skills. :-(

                    • N
                      Neil Schipper @Web Master
                      last edited by Mar 4, 2022, 2:17 PM

                      @web-master Does PM mean the chat? Sure go ahead, but I’m not sure if it supports file transfer. Another option is to put file on a hosting site so we can all access it.

                      An easy thing to do is to gather up at least a few screens’ worth of “interesting” data (stressful/challenging to the algorithm) and toss it into a “literal text” box as you’ve already learned to do (I’m not sure of the upper size limit). (Here’s a trick: to defeat inappropriate colorization, add txt immediately after the opening line’s triple-backquotes.)

                      But over and above all of this, do notice that Alan’s post above provides a solution that is everything you dreamed of. You’ll need to install the PythonScript plugin and learn a bit about making / saving / running a script (I’ve seen step-by-step recipes; I expect someone will link you to a reliable up-to-date one).

                      Have fun.

                      • W
                        Web Master @Alan Kilborn
                        last edited by Mar 4, 2022, 2:27 PM

                        @alan-kilborn Hi Alan

                        I have to be circumspect about what I put in the public domain because a GEDCOM file contains personal info of living people, so there’s a GDPR issue. But sure, I can cut out some of the lines that are giving trouble and post them here. One moment…

                        • W
                          Web Master @Web Master
                          last edited by Mar 4, 2022, 2:34 PM

                          @web-master OK here are a few lines that are not following Neil’s rule. Note NONE of the lines in the 4,000 line file were changed. It said there were zero occurrences. I copied and pasted the query so it shouldn’t be operator error, but PICNIC problems can’t be ruled out! :)

                          2 FILE ~/Pictures/Reunion Pictures/Imported Media/HOBDAY, Audrey Lilian.jpg
                          2 FILE ~/Pictures/Reunion Pictures/Imported Media/HOBDAY, Audrey Lilian - 1.jpg
                          2 FILE ~/Pictures/Reunion Pictures/Imported Media/HOBDAY, Audrey Lilian - 2.jpg
                          2 FILE ~/Pictures/Reunion Pictures/Imported Media/Robert Frederic & Audrey Lilian - 3.jpg
                          2 FILE ~/Pictures/CUMBERLAND BMD/Audrey Cumberland Death Certificate.jpeg
                          2 FILE ~/Pictures/HOBDAY/Mums School Report.jpg
                          2 FILE ~/Pictures/CUMBERLAND Parish Registers/Frederick Charles Cumberland Baptism 1901.jpg
                          2 FILE ~/Pictures/CUMBERLAND BMD/Frederick Cumberland Death 1985.jpeg
                          2 FILE ~/Pictures/CUMBERLAND DAVIS PHOTOS/Cumberland L0005.jpg
                          2 FILE ~/Pictures/CUMBERLAND DAVIS PHOTOS/Cumberland L0332.jpg
                          2 FILE ~/Pictures/CUMBERLAND DAVIS PHOTOS/Cumberland L0370.jpg
                          2 FILE ~/Pictures/CUMBERLAND DAVIS PHOTOS/Cumberland L0003.jpg
                          2 FILE ~/Pictures/CUMBERLAND DAVIS PHOTOS/Cumberland L0150.jpg
                          2 FILE ~/Pictures/CUMBERLAND DAVIS PHOTOS/Cumberland L0014.jpg
                          2 FILE ~/Pictures/CUMBERLAND DAVIS PHOTOS/Cumberland L0080.jpg
                          2 FILE ~/Pictures/CUMBERLAND DAVIS PHOTOS/Cumberland L0141.jpg
                          2 FILE ~/Pictures/CUMBERLAND DAVIS PHOTOS/Cumberland L0172.jpg
                          2 FILE ~/Pictures/CUMBERLAND DAVIS PHOTOS/Cumberland L0254 (2).jpg
                          2 FILE ~/Pictures/CUMBERLAND DAVIS PHOTOS/Dance Programme 1.jpg
                          2 FILE ~/Pictures/CUMBERLAND DAVIS PHOTOS/Dance Programme 2.jpg
                          2 FILE ~/Pictures/CUMBERLAND DAVIS PHOTOS/Dance Programme 3.jpg
                          2 FILE ~/Pictures/CUMBERLAND BMD/Frederick Cumberland Cremation 1985.jpg
                          2 FILE ~/Pictures/CUMBERLAND DAVIS PHOTOS/Dance Programme 4.jpg
                          2 FILE ~/Pictures/CUMBERLAND DAVIS PHOTOS/DSCN0101.jpg
                          2 FILE ~/Pictures/CUMBERLAND DAVIS PHOTOS/DSCN0102.jpg
                          2 FILE ~/Pictures/CUMBERLAND CENSUS/Frederick Cumberland 1921 Census.jpg
                          2 FILE ~/Pictures/CUMBERLAND CENSUS/Frederick Cumberland 1921 Census (1).jpg
                          2 FILE ~/Pictures/CUMBERLAND DAVIS PHOTOS/Cumberland L0108.jpg
                          
                          • A
                            Alan Kilborn @Web Master
                            last edited by Alan Kilborn Mar 4, 2022, 2:37 PM Mar 4, 2022, 2:36 PM

                            @web-master

                            My 30ms reaction to this data: The ~ is a new, never mentioned before feature of your data. Of course the expressions provided before don’t match it. But you already know this.
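The zero-match result can be reproduced with a small sketch, again assuming Python's `re` module as a rough stand-in for Notepad++'s engine. The first line is taken from the posted data; the second is a hypothetical variant with the tilde removed.

```python
import re

# The earlier Find expression: the lookbehind insists on "/" directly after "FILE ".
ORIGINAL = re.compile(r"(?<=^\d FILE /).+(?=/)", re.MULTILINE)

lines = [
    "2 FILE ~/Pictures/HOBDAY/Mums School Report.jpg",  # from the posted data
    "2 FILE /Pictures/HOBDAY/Mums School Report.jpg",   # hypothetical: no "~"
]

matches = [ORIGINAL.search(line) for line in lines]

for line, match in zip(lines, matches):
    # Show the matched text, or None when the expression finds nothing.
    print(line, "->", match.group() if match else None)
```

On the first line the character after `FILE ` is `~`, not `/`, so the lookbehind can never succeed and the search reports no occurrences.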

                            • W
                              Web Master @Alan Kilborn
                              last edited by Mar 4, 2022, 2:37 PM

                              @alan-kilborn I mentioned in my OP - bullet 2

                              • A
                                Alan Kilborn @Web Master
                                last edited by Alan Kilborn Mar 4, 2022, 2:39 PM Mar 4, 2022, 2:39 PM

                                @web-master said in Replacing variable length file paths in a GEDCOM file:

                                I mentioned in my OP - bullet 2

                                It’s true; you did:

                                The old file path always starts at char 8 in the string but may be: \ / ~ or a drive letter, so position 8 is the most reliable start point

                                But everyone missed it. :-)
                                Because we are more focused on sample data and what it looks like.
                                I guarantee it wouldn’t have been missed if it was provided as part of the original sample data.

                                • W
                                  Web Master @Alan Kilborn
                                  last edited by Mar 4, 2022, 2:41 PM

                                  @alan-kilborn I shall break out the sackcloth and ashes. :)

                                  • A
                                    Alan Kilborn @Web Master
                                    last edited by Mar 4, 2022, 2:49 PM

                                    @web-master

                                    Possibly the expression you now want (OK, you wanted it from the start!) is:

                                    (?-s)(?<=^\d FILE )~?/.+(?=/)

                                    BTW, here’s the English explanation of it, for maximum learning:

                                    • Use these options for the whole regular expression (?-s)
                                      • (hyphen inverts the meaning of the letters that follow) -
                                      • Dot doesn’t match line breaks s
                                    • Assert that the regex below can be matched ending at this position (positive lookbehind) (?<=^\d FILE )
                                      • Assert position at the beginning of a line (at beginning of the string or after a line break character) (carriage return and line feed, form feed, next line, line separator, paragraph separator) ^
                                      • Match a single character that is a “digit” (any decimal number in any Unicode script, plus any symbol with a decimal value in the active code page) \d
                                      • Match the character string “ FILE ” literally (case sensitive) FILE
                                    • Match the character “~” literally ~?
                                      • Between zero and one times, as many times as possible, giving back as needed (greedy) ?
                                    • Match the character “/” literally /
                                    • Match any single character that is NOT a line break character (line feed, carriage return, form feed, next line, line separator, paragraph separator) .+
                                      • Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
                                    • Assert that the regex below can be matched starting at this position (positive lookahead) (?=/)
                                      • Match the character “/” literally /

                                    Created with RegexBuddy
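To see what the tilde-aware expression actually selects, here is a hedged sketch in Python's `re` (assumed to agree with Notepad++'s engine for this pattern; `(?-s)` is omitted because it is Python's default). The sample line comes from the data posted earlier in the thread.

```python
import re

# Tilde-aware Find expression; the optional "~" and the first "/" are now part of the match.
TILDE_AWARE = re.compile(r"(?<=^\d FILE )~?/.+(?=/)", re.MULTILINE)

line = "2 FILE ~/Pictures/CUMBERLAND BMD/Audrey Cumberland Death Certificate.jpeg"

# The match covers the tilde, the leading slash, and every folder up to the last slash.
matched_text = TILDE_AWARE.search(line).group()

# A bare folder name as replacement loses the leading slash...
replaced = TILDE_AWARE.sub("Smith_1234", line)
# ...so the replacement text has to supply it.
replaced_with_slash = TILDE_AWARE.sub("/Smith_1234", line)

print(matched_text)
print(replaced)
print(replaced_with_slash)
```

Because the leading `~/` is consumed by the match rather than sitting inside the lookbehind, the replacement text decides whether the new path starts with a slash.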

                                    • N
                                      Neil Schipper @Alan Kilborn
                                      last edited by Mar 4, 2022, 3:35 PM

                                      @alan-kilborn said in Replacing variable length file paths in a GEDCOM file:

                                      ~?

                                      New spec says we need to handle both kinds of slash, tilde, and drive letter, so it needs to be more complex than that, right?

                                      Annoyingly, since the behind text is now varlen, this demolishes the glory of replace text being pure user-text.

                                      Instead of mucking about with all the variations of drive letters and slashes and tildes, I’m gonna propose a looser spec:

                                      • following <space after “FILE”>, match any run of text, min length 1, until the trailing forward slash

                                      Hence:

                                      Find: (?-s)(?<=^\d FILE ).+(?=/)
                                      Replace: /Smith_1234
                                      So replacement text now needs the starting slash.
                                      To keep the user experience elegant, this would be incorp’d into the script:

                                      editor.rereplace(find_regex, '/' + replacement_text)
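A sketch of how the looser expression behaves on the path styles listed in the original post, using Python's `re` as an assumed stand-in for Notepad++'s engine. The helper name `replace_folder` is hypothetical. The first sample line is from the posted data; the other three are invented variants.

```python
import re

# Looser Find expression: anything after "FILE " up to the last forward slash.
LOOSE = re.compile(r"(?<=^\d FILE ).+(?=/)", re.MULTILINE)


def replace_folder(gedcom_text, folder_name):
    """Hypothetical helper: swap the folder part of each FILE line for /folder_name."""
    # A function replacement keeps folder_name literal, even if it contains backslashes.
    return LOOSE.sub(lambda match: "/" + folder_name, gedcom_text)


sample = "\n".join([
    "2 FILE ~/Pictures/HOBDAY/Mums School Report.jpg",    # from the posted data
    "2 FILE /Pictures/HOBDAY/Mums School Report.jpg",     # hypothetical variant
    "2 FILE C:/Pictures/HOBDAY/Mums School Report.jpg",   # hypothetical variant
    r"2 FILE C:\Pictures\HOBDAY\Mums School Report.jpg",  # hypothetical, backslashes only
])

updated = replace_folder(sample, "Smith_1234").splitlines()

for line in updated:
    print(line)
```

The first three lines all come out as `2 FILE /Smith_1234/Mums School Report.jpg`. The last line contains no forward slash at all, so the lookahead never succeeds and that line is left unchanged.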
                                      
                                      • W
                                        Web Master @Alan Kilborn
                                        last edited by Web Master Mar 4, 2022, 3:46 PM Mar 4, 2022, 3:43 PM

                                        @alan-kilborn Thanks Alan. I think I need to read a regex primer before I have another read of that explanation.

                                        But to go back to my OP and second bullet, the ~ was not the only char to put a spanner in the works. It’s possible it could also be:

                                        \
                                        Drive letter (C: D: etc., case insensitive)

                                        … and these appear to trip up Neil’s enhanced version (EDIT: Ah, this message crosses with Neil’s latest, where he identifies the same issue)

                                        But your original query handles them all.

                                        I think it’s time to be realistic and say that we have solved it and that the effort to make it completely foolproof is perhaps a stretch farther than we NEED to go.

                                        Meanwhile I’ll go off and explore Python

                                        Thank you everyone. It’s only 24 hours since my initial request for help and you’ve all been fab!

                                        • A
                                          Alan Kilborn @Neil Schipper
                                          last edited by Alan Kilborn Mar 4, 2022, 3:47 PM Mar 4, 2022, 3:47 PM

                                          @neil-schipper said in Replacing variable length file paths in a GEDCOM file:

                                          New spec says we need to handle both kinds of slash, tilde, and drive letter, so it needs to be more complex than that, right?

                                          Yea.
                                          It’s actually old-spec, though. Maybe let’s call it we-didn’t-read-spec.
                                          :-)

                                          And TBH, my “exactness” tends to fall off the longer these types of threads go on…

                                          However, I think we’ve given the OP enough to get his task carried out, so that’s a good thing.
