    Replacing variable length file paths in a GEDCOM file

    • Web Master

      Fellow Notepad++ Users,

      Could you please help me with the following search-and-replace problem I am having?

      We use a text format file called GEDCOM (file ext = “.ged”) to import family trees to our online service. Because it’s a text file it cannot include media, but it can include links to media. But these links are to the local folders on the user’s computer and we have to convert those to the file path on the server where the images reside.

      Here is the data I currently have (“before” data):

      A typical line will look something like this:
      
      1 FILE /Users/johnssmith/Documents/FamilyHistory/Heredis/TestMedia.hmw/Media/#2/HEDGCOCK James Ernest (1865-1954).jpg
      
      
      • There will be multiple lines in the file and they all commence "1 FILE "

      • The old file path always starts at char 8 in the string but may be: \ / ~ or a drive letter, so position 8 is the most reliable start point

      • The length of the file path can be infinitely variable within a single file but it always ends with a final “/”

      • The objective is to replace everything from and including position 8 to the final “/” with a parameter that is, for example “/Smith_1234/”

      Here is how I would like that data to look (“after” data):

      1 FILE /Smith_1234/HEDGCOCK James Ernest (1865-1954).jpg
      
      

      To accomplish this, I have tried using the following Find/Replace expressions and settings

      Find What = `highlighted occurrence`
      Replace With = `/smith_1234/`
      Search Mode = NORMAL
      Dot Matches Newline = NOT CHECKED
      

      Then repeat for next variation of the file path

      It works but it’s tedious because I have to work through the file looking for the different variations. It took long enough on a file with 4K lines but some of the files have 1million lines!!!

      I know little about regex but I have toyed with macros. However, this is completely outside my competence. Could you please help me find the solution?

      What I’d like to do is give one of our volunteers a macro to run on an open file, where they enter the bit between the “/ /” e.g. “smith_1234” and hey presto!!

      Thank you.

      Paul Barrett
      Berkshire Family History Society

      • Alan Kilborn @Web Master

        @web-master said in Replacing variable length file paths in a GEDCOM file:

        Something like this:

        Find: (?-s)^(1 FILE ).+/
        Replace: ${1}/Smith_1234/
        Search mode: Regular expression
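
For anyone who wants to try the expression outside Notepad++, here is a minimal Python sketch of the same substitution. It assumes Python's `re` module rather than the Notepad++ regex engine, so the `(?-s)` prefix is left out (a `.` in Python does not match a line ending unless asked to) and the group reference is written `\g<1>` instead of `${1}`. The sample line is the one from the original question.

```python
import re

# Same idea as the Find expression above: capture "1 FILE ", then greedily
# consume everything up to and including the last "/" on the line.
# re.MULTILINE makes ^ match at the start of every line, not just the first.
PATTERN = re.compile(r"^(1 FILE ).+/", re.MULTILINE)

before = (
    "1 FILE /Users/johnssmith/Documents/FamilyHistory/Heredis/"
    "TestMedia.hmw/Media/#2/HEDGCOCK James Ernest (1865-1954).jpg"
)

# \g<1> puts the captured "1 FILE " back in front of the new folder.
after = PATTERN.sub(r"\g<1>/Smith_1234/", before)
print(after)
```

Because `.+` is greedy, it backs off only as far as the final `/`, which is why the file name itself survives.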

        but some of the files have 1million lines!!!

        With that much data, before any important transformation operation, please back up your data!

        • Alan Kilborn @Web Master

          @web-master

          BTW, these settings are meaningless together:

          Search Mode = NORMAL
          Dot Matches Newline = NOT CHECKED

          “Dot Matches Newline” only matters when Search Mode = Regular expression

          • Web Master @Alan Kilborn

            @alan-kilborn To be honest I had no idea of the relevance of the dot line

            Thanks for the speed of response and the suggestion. I will let that rip on a test file later!

            Paul

            • Alan Kilborn @Web Master

              @web-master said in Replacing variable length file paths in a GEDCOM file:

              no idea of the relevance of the dot line

              Ticking that box is usually not a good idea unless you really know what you are doing.

              In the expression I gave you, the tick state of the box is irrelevant because I lead off my expression with (?-s) which is secret-code for “untick the box”.

              The box controls what a . in the expression can match. If unticked, it can’t match a line-ending; if ticked, it can. Since .+ means to match “any character as many times as possible”, it really does matter if you want your search to possibly match across many lines or stay on one line. From your description in your problem, you just want it on one line. Hence the use of the unticked box equivalent (?-s)
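
The difference is easy to see in a small Python sketch. This is an analogy only: it assumes Python's `re` module, where the `re.DOTALL` flag plays the role of the ticked box, and the two sample lines are made up for illustration.

```python
import re

# Two hypothetical FILE lines, each ending with a line ending.
text = "1 FILE /old/path/a.jpg\n1 FILE /other/b.jpg\n"

# "Box unticked": "." stops at the line ending, so each line matches on its own.
per_line = re.findall(r"^1 FILE .+/", text, flags=re.MULTILINE)

# "Box ticked" (re.DOTALL): ".+" runs across the line ending and only stops
# at the last "/" in the whole text, swallowing the first file name.
across = re.findall(r"^1 FILE .+/", text, flags=re.MULTILINE | re.DOTALL)

print(per_line)
print(across)
```

In the second case a replacement would wipe out everything between the first `1 FILE ` and the last `/` in the file, which is exactly what the one-line restriction prevents.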

              • PeterJones @Alan Kilborn

                @alan-kilborn

                Given that the “Dot Matches Newline” info is part of the question template, and @Web-Master obviously made use of that template, I am just happy to see evidence that some users are actually able to follow directions. ;-) I’d much rather have users give us that info when it’s not needed than vice versa.

                @Web-Master, so that you know: this puts you in good stead relative to many question-askers here. So kudos. :-)

                • Web Master @Alan Kilborn

                  @alan-kilborn The file could contain thousands of lines that all start "FILE 1 ", does that change things? They will be randomly scattered (well not actually random, but for these purposes they can be considered to be so)

                  • Web Master @PeterJones

                    @peterjones As webmaster for a society where our demographic is 55 - 95, having to deal with questions like “Why doesn’t it work?” and “What do you mean by right-click?”, I frequently have to tell people to ‘follow the friggin’ instructions’ (expletives deleted.)

                    I’m a complete numpty about regex though, so I’m sure I will tax your patience in other ways!

                    • Alan Kilborn @Web Master

                      @web-master said in Replacing variable length file paths in a GEDCOM file:

                      The file could contain thousands of lines that all start "FILE 1 ", does that change things?

                      Nope, that’s what was anticipated from your problem description.

                      I’m sure with your attitude you could become adept at regex; there are resources on the FAQ page that you should check out.

                      • Web Master @Alan Kilborn

                        @alan-kilborn said in Replacing variable length file paths in a GEDCOM file:

                        @web-master said in Replacing variable length file paths in a GEDCOM file:

                        Something like this:

                        Find: (?-s)^(1 FILE ).+/
                        Replace: ${1}/Smith_1234/
                        Search mode: Regular expression

                        but some of the files have 1million lines!!!

                        With that much data, before any important transformation operation, please back up your data!

                        Wow! Got it in one! It worked.

                        Mind you, the file I chose at random to test it on has a variation on the theme because instead of each line starting "1 FILE " they started "2 FILE ". But tweaking the expression fixed that. What would the syntax be for any line starting 0 - 9 then FILE, please?

                        • PeterJones @Web Master

                          @web-master said in Replacing variable length file paths in a GEDCOM file:

                          What would the syntax be for any line starting 0 - 9 then FILE, please?

                          \d means match a single digit (0-9); \d+ means match one or more digits (0-99999…)
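
A quick way to check the difference, sketched here in Python on the assumption that `\d` and `\d+` behave the same way for plain ASCII digits as they do in Notepad++:

```python
import re

# \d matches exactly one digit, so a two-digit level number is rejected.
one_digit_ok = re.fullmatch(r"\d FILE", "2 FILE") is not None
two_digits_with_single = re.fullmatch(r"\d FILE", "12 FILE") is not None

# \d+ matches one or more digits, so "12 FILE" is accepted too.
two_digits_with_plus = re.fullmatch(r"\d+ FILE", "12 FILE") is not None

print(one_digit_ok, two_digits_with_single, two_digits_with_plus)
```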

                          • Web Master @PeterJones

                            @peterjones said in Replacing variable length file paths in a GEDCOM file:

                            @web-master said in Replacing variable length file paths in a GEDCOM file:

                            What would the syntax be for any line starting 0 - 9 then FILE, please?

                            \d means match a single digit (0-9); \d+ means match one or more digits (0-99999…)

                            OK so if the Find what becomes…

                            (?-s)^\d( FILE ).+/

                            … the replace with gets SNAFU’d and loses the initial number

                            FILE /Smith_1234/Robert Frederic & Audrey Lilian - 3.jpg

                            instead of

                            2 FILE /Smith_1234/Robert Frederic & Audrey Lilian - 3.jpg

                            How do I pass the initial number from the find to the replace please?

                            • PeterJones @Web Master

                              @web-master ,

                              Why did you take it out of the parentheses? You had (?-s)^(1 FILE ).+/, and I suggested that you use \d instead of 1, which would have been (?-s)^(\d FILE ).+/ … if the \d is in the parentheses, it will be included in the replacement using the $1 that’s already there.

                              • Web Master @PeterJones

                                @peterjones I figured that 1 FILE inside the parentheses was a literal.

                                • PeterJones @Web Master

                                  @web-master ,

                                  I figured that 1 FILE inside the parentheses was a literal.

                                  The parentheses don’t make it literal. The parentheses make it a group, so that the text matched inside them is captured, and you can then reference that group during the replacement to get back pieces of what was in your match. By taking the \d outside the parentheses, the digit wasn’t kept as part of the group, and thus wasn’t available to be inserted into the replacement.

                                  • capture groups (see Numbered Capture Group in that section)
                                  • replacement expressions (see $ℕ in that section)
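
Here is the same point as a small Python sketch. It assumes Python's `re` module, where the group reference is written `\g<1>` rather than `$1`, and the folder `/Users/someone/Pictures/` is a made-up path for illustration; only the file name comes from the thread.

```python
import re

line = "2 FILE /Users/someone/Pictures/Robert Frederic & Audrey Lilian - 3.jpg"

# Digit OUTSIDE the parentheses: group 1 holds only " FILE ", so the digit
# is matched (and therefore replaced) but never put back.
lost = re.sub(r"^\d( FILE ).+/", r"\g<1>/Smith_1234/", line)

# Digit INSIDE the parentheses: group 1 holds "2 FILE ", so the digit
# comes back with the rest of the captured text.
kept = re.sub(r"^(\d FILE ).+/", r"\g<1>/Smith_1234/", line)

print(repr(lost))
print(repr(kept))
```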

                                  • Web Master @PeterJones

                                    @peterjones Got it! Thank you

                                    This works beautifully. Is there a way to put this into a macro so that at runtime the user is prompted for the replacement string? That would be the cherry on top of the icing/frosting on top of the cake.

                                    No biggie if it’s too big an ask. I’m sure I can coach the user to do it properly

                                    Paul

                                    • PeterJones @Web Master

                                      @web-master ,

                                      The macro language used by Notepad++ has no concept of “prompt”, so the simple answer is “no”.

                                      “Or something”, OTOH: there is a PythonScript plugin, which allows you to write a script that is run inside of Notepad++ which can automate the Notepad++ GUI and editor contents, which could run that search/replace; but you’d have to install the plugin, and learn how to write such a script. It really depends on how much effort you want to go to now, to make it easier for your users to do this in the future. I think Alan’s posted at least one script in the past that pops up a user input box and asks for text, which then gets populated into a regex that is automatically run on the current file… searching the forum for posts by @Alan-Kilborn that contain prompt might get you there eventually (though you’ll probably have to wade through quite a few false hits as well… prompt isn’t that uncommon a word).
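
As a rough idea of what the core of such a script might look like, here is a plain Python sketch. It is not a PythonScript plugin script and not Alan's: the function name `replace_media_paths` and the sample text are hypothetical, and the folder name is passed in as an argument where a real script would take it from an input box.

```python
import re

# Any single-digit level number followed by " FILE ", then the old path
# up to and including its last "/".
FILE_LINE = re.compile(r"^(\d FILE ).+/", re.MULTILINE)


def replace_media_paths(text, folder):
    """Return (new_text, count) with every FILE path replaced by /folder/."""
    # Tolerate input such as " /smith_1234/ " by trimming spaces and slashes.
    folder = folder.strip().strip("/")
    if not folder:
        raise ValueError("folder name must not be empty")
    new_prefix = "/" + folder + "/"
    # A function replacement inserts new_prefix literally, so a backslash or
    # "\1" typed by the user is not interpreted as a regex escape.
    return FILE_LINE.subn(lambda m: m.group(1) + new_prefix, text)


# Hypothetical GEDCOM fragment for illustration.
sample = "0 @M1@ OBJE\n1 FILE C:/Users/x/Media/a.jpg\n2 FILE ~/pics/b.jpg\n"
new_text, count = replace_media_paths(sample, "smith_1234")
print(count)
print(new_text)
```

Returning the number of replacements gives the volunteer a quick sanity check against the number of FILE lines they expected to change.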

                                      • Web Master @PeterJones

                                        @peterjones Thanks. I’ll try coaching the user first. :)

                                        I am so impressed with the speed and quality of response to my post. Thank you so much Alan and Peter

                                        • Neil Schipper @Web Master

                                          @web-master Hi. Here’s a variation on Alan’s solution which has the feature that users enter the replacement text as is (so they won’t have to be so careful about entering text into a field containing cryptic regex codes) : (?<=^\d FILE /).*(?=/)

                                          (Edit: slightly simplified)
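
To illustrate how an expression of this shape behaves, here is a Python sketch. It assumes Python's `re` module accepts the same lookbehind and lookahead syntax for this pattern, and the sample path is made up. The lookbehind `(?<=...)` and lookahead `(?=...)` only check what surrounds the match without being part of it, so the replacement is just the bare folder name with no `${1}` or slashes. Note that, as written, the pattern expects the old path to begin with `/`.

```python
import re

# Only the text between the first "/" after "FILE " and the last "/" on the
# line is matched; the surrounding "2 FILE /" and final "/" are left alone.
PATTERN = re.compile(r"(?<=^\d FILE /).*(?=/)", re.MULTILINE)

line = "2 FILE /Users/someone/Pictures/photo.jpg"  # hypothetical sample line

# The replacement is plain text: just the new folder name.
result = PATTERN.sub("Smith_1234", line)
print(result)
```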

                                          • Web Master @Neil Schipper

                                            @neil-schipper said in Replacing variable length file paths in a GEDCOM file:

                                            (?<=^\d FILE /).*(?=/)

                                            Forgive me Neil, but I’m not clear how to use this modified example?
