    Regex - replace subexpression in a repeated subexpression

    • Mohammad Hussain

      Hello,

      Before you start looking into this, kindly note that I found a program that does what I need. So right now I just want to learn, but this isn't urgent/important.

      I don’t know if I’m using the correct terminology here, or if this is possible at all, but here it is:

      I am converting lists of full paths of HTML files to XML, as input for a program. Example:

      P:\113603\Topic 8-Part 1 Industrial Relations  Legislation\data\resources\docwrap.htm
      P:\113603\Topic 8-Part 1 Industrial Relations  Legislation\data\resources\swfwrap.htm
      P:\113604\1st e-lecture was conducted via MS Teams.html
      P:\113604\Additional Help to View Lecture Video.html
      P:\113604\Cover Page for Submission of Assignment.html
      P:\113604\E-learning Task 3 Orientation\lms\blank.html
      P:\113604\E-learning Task 3 Orientation\lms\goodbye.html
      P:\113604\E-learning Task 3 Orientation\presentation_content\blank.html
      P:\113604\E-learning Task 4  Industrial Relations (Part 2)\index.htm
      P:\113604\E-learning Task 4  Industrial Relations (Part 2)\SCORM.htm
      P:\113604\E-learning Task 4  Industrial Relations (Part 2)\data\resources\docwrap.htm
      P:\113604\E-learning Task 4  Industrial Relations (Part 2)\data\resources\swfwrap.htm
      P:\113604\Team Management Part 2\SCORM.htm
      P:\113605\1. Introduction to Organisational Behaviour.html
      P:\113605\Accessing Newspaper Articles.html
      
      

      The expected final output is (includes indentation, which is not a problem for me with regex or with XML tools)

      				<folder name="113603">
      					<folder name="Topic 8-Part 1 Industrial Relations  Legislation">
      						<folder name="data">
      							<folder name="resources">
      								<folder name="docwrap.htm" mark="marked"/>
      								<folder name="swfwrap.htm" mark="marked"/>
      							</folder>
      						</folder>
      					</folder>
      				</folder>
      				<folder name="113604">
      					<folder name="1st e-lecture was conducted via MS Teams.html" mark="marked"/>
      					<folder name="Additional Help to View Lecture Video.html" mark="marked"/>
      					<folder name="Cover Page for Submission of Assignment.html" mark="marked"/>
      					<folder name="E-learning Task 3 Orientation">
      						<folder name="lms">
      							<folder name="blank.html" mark="marked"/>
      							<folder name="goodbye.html" mark="marked"/>
      						</folder>
      						<folder name="presentation_content">
      							<folder name="blank.html" mark="marked"/>
      						</folder>
      					</folder>
      					<folder name="E-learning Task 4  Industrial Relations (Part 2)">
      						<folder name="data">
      							<folder name="resources">
      								<folder name="docwrap.htm" mark="marked"/>
      								<folder name="swfwrap.htm" mark="marked"/>
      							</folder>
      						</folder>
      						<folder name="index.htm" mark="marked"/>
      						<folder name="SCORM.htm" mark="marked"/>
      					</folder>
      					<folder name="Team Management Part 2">
      						<folder name="SCORM.htm" mark="marked"/>
      					</folder>
      				</folder>
      				<folder name="113605">
      					<folder name="1. Introduction to Organisational Behaviour.html" mark="marked"/>
      					<folder name="Accessing Newspaper Articles.html" mark="marked"/>
      				</folder>
      

      From the input list of full paths, using the following steps:
      1. Search (simple):
      P:
      Replace:
      Nothing/Empty/null
      2. Search (regex):
      (\\)([^\\]{1,}\.htm[l]{0,1})
      Replace:
      <file name="\2" mark="marked"/>
      3. Search (regex):
      (\\)([^\\\r\n\<]{1,})(<.*>)
      Replace (keep replacing till no more search matches are found):
      <folder name="\2">\3</folder>

      I managed to convert the list of paths to XML, looking like this:

      <folder name="113603"><folder name="Topic 8-Part 1 Industrial Relations  Legislation"><folder name="data"><folder name="resources"><file name="docwrap.htm" mark="marked"/></folder></folder></folder></folder>
      <folder name="113603"><folder name="Topic 8-Part 1 Industrial Relations  Legislation"><folder name="data"><folder name="resources"><file name="swfwrap.htm" mark="marked"/></folder></folder></folder></folder>
      <folder name="113604"><file name="1st e-lecture was conducted via MS Teams.html" mark="marked"/></folder>
      <folder name="113604"><file name="Additional Help to View Lecture Video.html" mark="marked"/></folder>
      <folder name="113604"><file name="Cover Page for Submission of Assignment.html" mark="marked"/></folder>
      <folder name="113604"><folder name="E-learning Task 3 Orientation"><folder name="lms"><file name="blank.html" mark="marked"/></folder></folder></folder>
      <folder name="113604"><folder name="E-learning Task 3 Orientation"><folder name="lms"><file name="goodbye.html" mark="marked"/></folder></folder></folder>
      <folder name="113604"><folder name="E-learning Task 3 Orientation"><folder name="presentation_content"><file name="blank.html" mark="marked"/></folder></folder></folder>
      <folder name="113604"><folder name="E-learning Task 4  Industrial Relations (Part 2)"><file name="index.htm" mark="marked"/></folder></folder>
      <folder name="113604"><folder name="E-learning Task 4  Industrial Relations (Part 2)"><file name="SCORM.htm" mark="marked"/></folder></folder>
      <folder name="113604"><folder name="E-learning Task 4  Industrial Relations (Part 2)"><folder name="data"><folder name="resources"><file name="docwrap.htm" mark="marked"/></folder></folder></folder></folder>
      <folder name="113604"><folder name="E-learning Task 4  Industrial Relations (Part 2)"><folder name="data"><folder name="resources"><file name="swfwrap.htm" mark="marked"/></folder></folder></folder></folder>
      <folder name="113604"><folder name="Team Management Part 2"><file name="SCORM.htm" mark="marked"/></folder></folder>
      <folder name="113605"><file name="1. Introduction to Organisational Behaviour.html" mark="marked"/></folder>
      <folder name="113605"><file name="Accessing Newspaper Articles.html" mark="marked"/></folder>
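
For readers who would rather script the three steps than click through the Replace dialog, here is a minimal Python sketch of the same procedure. It reuses the search patterns from steps 2 and 3 as posted; the function name `paths_to_flat_xml` is made up for illustration, and Python's `re` module stands in for the Notepad++ regex engine, so treat it as an approximation rather than an exact replay.

```python
import re

# Step 2 and step 3 search patterns, copied from the post
# (the "\<" inside the character class is written as a plain "<").
FILE_RE = re.compile(r'(\\)([^\\]{1,}\.htm[l]{0,1})')
FOLDER_RE = re.compile(r'(\\)([^\\\r\n<]{1,})(<.*>)')


def paths_to_flat_xml(text):
    """Hypothetical helper: turn each full path into one nested-XML line."""
    # Step 1: drop the drive prefix.
    text = text.replace('P:', '')
    # Step 2: turn the trailing "\name.htm(l)" into a <file/> tag.
    text = FILE_RE.sub(r'<file name="\2" mark="marked"/>', text)
    # Step 3: wrap the innermost remaining folder, and repeat
    # until a pass makes no replacement.
    while True:
        text, count = FOLDER_RE.subn(r'<folder name="\2">\3</folder>', text)
        if count == 0:
            return text
```

Each pass of the loop wraps one more folder level per line, so the number of passes equals the depth of the deepest path.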
      
      

      However, the tags for common parents are repeated. I want to eliminate that (not sure what this is called, minimize? grouping? something else?)

      I tried to create a regex search and replace for the highest level of folders/parents, with the intent of running the replace part multiple times for subfolders, until no more matches are found, removing all redundant tags:

      Search (regex, this part works):
      (^<folder name="[^"]{1,}">)(.*?)(</folder>\R)(((\1)(.*?)(\3))*)?(\1)
      Replace (doesn’t fully work; just showing you my progress and where I’m stuck):
      \1\2\r\n\4

      The search captures what I’m looking for, and the replace lets me remove the last occurrence of </folder> in the first matching line and the parent folder tag in the last matching line. But how do I remove the redundant tags at the beginning and at the end of the lines in between?

      Right now, all the lines in-between are captured in group/subexpression \4. That way I can keep them when I do a replace. But to actually modify them in the same action, I need to remove groups 6 and 8 while keeping group 7, for each line, but my replace only works for the last line of multiple lines, instead of doing this for all the lines in-between.

      Is it even possible to make this work for all lines in between, targeting an individual group (group 7) inside a repeated group/subexpression (group 4)? If yes, how?

      Thank you :)

      • Neil Schipper @Mohammad Hussain

        @mohammad-hussain

        Hi. I don’t think you can capture a not-known-in-advance number of lines in such a way that you can write them back with substitutions.
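
To illustrate the limitation, here is a small Python sketch. It assumes Python's `re` module behaves like the Notepad++ engine on this point, which I believe is the usual behaviour for backtracking engines: a capture group inside a repeated group keeps only the text of its last iteration. The sample string and the helper name `rewrite_run` are invented for the demonstration.

```python
import re

text = 'k=1;k=2;k=3;'

# The inner group (\d) sits inside a repeated group, so after the match
# it only remembers the value from the final repetition.
m = re.fullmatch(r'(?:k=(\d);)+', text)
last_only = m.group(1)


def rewrite_run(match):
    # Workaround available in a scripting language: capture the whole run,
    # then apply a second, simpler substitution to each item inside it.
    return re.sub(r'k=(\d);', r'\1,', match.group(0))


rewritten = re.sub(r'(?:k=\d;)+', rewrite_run, text)
```

Here `last_only` holds only the digit from the third item, while `rewritten` has every item transformed. A plain search and replace has no equivalent of the callback, hence the multi-phase approach.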

        So an idiom that would help you move forward is to have a phase that captures groups of lines sharing the same left-side text and writes them back as a record, with the desired record-start and record-end lines, thus:

        Fi: ((^<[^>]+>)(.*\R)((\2)((.*\R)))*)
        Re: \2\r\n\1</folder>\r\n

        (Note: above doesn’t handle case where final text line doesn’t terminate with a newline, so add it manually first.)

        and then a phase to clean out that left side text, and, the trailing tag:

        Fi: ^<[^>]+>(.*?)</folder>\R
        Re: \1\r\n
        or maybe insert tabs (suggested by your sample output):
        Re: \t\t\t\t\1\r\n
        or maybe some number of spaces, as you prefer.

        Here’s output using 4 tabs:

        <folder name="113603">
        				<folder name="Topic 8-Part 1 Industrial Relations  Legislation"><folder name="data"><folder name="resources"><file name="docwrap.htm" mark="marked"/></folder></folder></folder>
        				<folder name="Topic 8-Part 1 Industrial Relations  Legislation"><folder name="data"><folder name="resources"><file name="swfwrap.htm" mark="marked"/></folder></folder></folder>
        </folder>
        <folder name="113604">
        				<file name="1st e-lecture was conducted via MS Teams.html" mark="marked"/>
        				<file name="Additional Help to View Lecture Video.html" mark="marked"/>
        				<file name="Cover Page for Submission of Assignment.html" mark="marked"/>
        				<folder name="E-learning Task 3 Orientation"><folder name="lms"><file name="blank.html" mark="marked"/></folder></folder>
        				<folder name="E-learning Task 3 Orientation"><folder name="lms"><file name="goodbye.html" mark="marked"/></folder></folder>
        				<folder name="E-learning Task 3 Orientation"><folder name="presentation_content"><file name="blank.html" mark="marked"/></folder></folder>
        				<folder name="E-learning Task 4  Industrial Relations (Part 2)"><file name="index.htm" mark="marked"/></folder>
        				<folder name="E-learning Task 4  Industrial Relations (Part 2)"><file name="SCORM.htm" mark="marked"/></folder>
        				<folder name="E-learning Task 4  Industrial Relations (Part 2)"><folder name="data"><folder name="resources"><file name="docwrap.htm" mark="marked"/></folder></folder></folder>
        				<folder name="E-learning Task 4  Industrial Relations (Part 2)"><folder name="data"><folder name="resources"><file name="swfwrap.htm" mark="marked"/></folder></folder></folder>
        				<folder name="Team Management Part 2"><file name="SCORM.htm" mark="marked"/></folder>
        </folder>
        <folder name="113605">
        				<file name="1. Introduction to Organisational Behaviour.html" mark="marked"/>
        				<file name="Accessing Newspaper Articles.html" mark="marked"/>
        </folder>
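
The two phases can also be sketched in Python, as a rough equivalent rather than the exact Notepad++ behaviour. Assumptions: Python's `re` has no `\R`, so `\n` is used and the text is expected to use `\n` line endings with a final newline; one tab is inserted instead of four; the function name `group_top_level` is made up.

```python
import re

# Phase 1: a run of consecutive lines that start with the same opening tag.
RUN_RE = re.compile(r'((^<[^>]+>)(?:.*\n)(?:\2.*\n)*)', re.M)
# Phase 2: a line that still carries the repeated tag and its closing tag.
LINE_RE = re.compile(r'^<[^>]+>(.*?)</folder>\n', re.M)


def group_top_level(text):
    """Hypothetical helper: factor out the shared top-level folder tag."""
    # Write each run back as: opening tag, the untouched lines, closing tag.
    grouped = RUN_RE.sub(r'\2\n\1</folder>\n', text)
    # Strip the now-redundant leading tag and trailing </folder> per line.
    return LINE_RE.sub(r'\t\1\n', grouped)
```

The record-start and record-end lines added in phase 1 are left alone by phase 2 because they have no second tag on the same line.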
        
        

        You can now do something similar at the next level of nesting, matching your preferred whitespace after start-of-line.

        You also could take an approach like this on your original data, doing the factoring and maybe indenting early, and adding in the xml eye candy at a later phase.

        I won’t be surprised if there are more clever approaches, but I think this gets you unstuck.

        • Mohammad Hussain @Neil Schipper

          @neil-schipper That is a smart approach! I guess I was trying to solve it in one go, when I could have done the same (or more) in more steps. Adding/repeating in order to isolate the parts to target next is a very good idea! Thank you :)

          I was kind of hoping you could reference groups inside repeating groups, though. But at least there are other solutions.

          Thank you again :)

          • guy038

            Hello, @mohammad-hussain, @neil-schipper and All,

            First of all, @mohammad-hussain, I think that the expected final OUTPUT shown in your post is slightly wrong ! Indeed, you label files as folders, for instance :

            <folder name="docwrap.htm" mark="marked"/>
            

            To my mind, this should be labeled :

            <file name="docwrap.htm" mark="marked"/>
            

            as mentioned in your own regex S/R 2 !


            Now, I found out a way, with macros, to solve your problem ;-)) Briefly, for instance, from this INPUT text, below :

            P:\113603\Topic 8-Part 1 Industrial Relations  Legislation\data\resources\docwrap.htm
            
            
            P:\113603\Topic 8-Part 1 Industrial Relations  Legislation\data\resources\swfwrap.htm
            P:\113604\1st e-lecture was conducted via MS Teams.html
            
            
            P:\113604\Additional Help to View Lecture Video.html
            P:\113604\Cover Page for Submission of Assignment.html
            P:\113604\E-learning Task 3 Orientation\lms\blank.html
            P:\113604\E-learning Task 3 Orientation\lms\goodbye.html
            
            
            
            P:\113604\E-learning Task 3 Orientation\presentation_content\blank.html
            P:\113604\E-learning Task 4  Industrial Relations (Part 2)\index.htm
            
            P:\113604\E-learning Task 4  Industrial Relations (Part 2)\SCORM.htm
            P:\113604\E-learning Task 4  Industrial Relations (Part 2)\data\resources\docwrap.htm
            P:\113604\E-learning Task 4  Industrial Relations (Part 2)\data\resources\swfwrap.htm
            P:\113604\Team Management Part 2\SCORM.htm
            P:\113605\1. Introduction to Organisational Behaviour.html
            
            
            
            P:\113605\Accessing Newspaper Articles.html
            

            You would get this OUTPUT text :

            <folder name="113603">
            <folder name="Topic 8-Part 1 Industrial Relations  Legislation">
            <folder name="data">
            <folder name="resources">
            <file name="docwrap.htm" mark="marked"/>
            <file name="swfwrap.htm" mark="marked"/>
            </folder>
            </folder>
            </folder>
            </folder>
            <folder name="113604">
            <file name="1st e-lecture was conducted via MS Teams.html" mark="marked"/>
            <file name="Additional Help to View Lecture Video.html" mark="marked"/>
            <file name="Cover Page for Submission of Assignment.html" mark="marked"/>
            <folder name="E-learning Task 3 Orientation">
            <folder name="lms">
            <file name="blank.html" mark="marked"/>
            <file name="goodbye.html" mark="marked"/>
            </folder>
            <folder name="presentation_content">
            <file name="blank.html" mark="marked"/>
            </folder>
            </folder>
            <folder name="E-learning Task 4  Industrial Relations (Part 2)">
            <file name="index.htm" mark="marked"/>
            <file name="SCORM.htm" mark="marked"/>
            <folder name="data">
            <folder name="resources">
            <file name="docwrap.htm" mark="marked"/>
            <file name="swfwrap.htm" mark="marked"/>
            </folder>
            </folder>
            </folder>
            <folder name="Team Management Part 2">
            <file name="SCORM.htm" mark="marked"/>
            </folder>
            </folder>
            <folder name="113605">
            <file name="1. Introduction to Organisational Behaviour.html" mark="marked"/>
            <file name="Accessing Newspaper Articles.html" mark="marked"/>
            </folder>
            

            Note there will be a small difference between this OUTPUT and your expected output, as some lines, in the middle, are interchanged !

            With my solution, I have this partial layout :

            <folder name="E-learning Task 4  Industrial Relations (Part 2)">
            <file name="index.htm" mark="marked"/>
            <file name="SCORM.htm" mark="marked"/>
            <folder name="data">
            <folder name="resources">
            <file name="docwrap.htm" mark="marked"/>
            <file name="swfwrap.htm" mark="marked"/>
            </folder>
            </folder>
            </folder>
            

            In your expected final output, you have this partial layout :

            					<folder name="E-learning Task 4  Industrial Relations (Part 2)">
            						<folder name="data">
            							<folder name="resources">
            								<file name="docwrap.htm" mark="marked"/>
            								<file name="swfwrap.htm" mark="marked"/>
            							</folder>
            						</folder>
            						<file name="index.htm" mark="marked"/>
            						<file name="SCORM.htm" mark="marked"/>
            					</folder>
            

            However, these two syntaxes seem quite equivalent !


            Notes :

            • In this second part of my post, you don’t have to remember all the regex S/R described here. In my next post, I’ll give you the exact text to insert in the <Macros> node of your active shortcuts.xml file to automate the whole process ! However, in order to understand the principle used :

            • Open the Replace dialog ( Ctrl + H )

            • Then, for each regex S/R to run :

              • Fill in the search and replace regexes

              • Tick the Wrap around option

              • Select the Regular expression search mode

              • Click on the Replace All button

            Well, let’s go. So, given this initial text :

            P:\113603\Topic 8-Part 1 Industrial Relations  Legislation\data\resources\docwrap.htm
            
            
            P:\113603\Topic 8-Part 1 Industrial Relations  Legislation\data\resources\swfwrap.htm
            P:\113604\1st e-lecture was conducted via MS Teams.html
            
            
            P:\113604\Additional Help to View Lecture Video.html
            P:\113604\Cover Page for Submission of Assignment.html
            P:\113604\E-learning Task 3 Orientation\lms\blank.html
            P:\113604\E-learning Task 3 Orientation\lms\goodbye.html
            
            
            
            P:\113604\E-learning Task 3 Orientation\presentation_content\blank.html
            P:\113604\E-learning Task 4  Industrial Relations (Part 2)\index.htm
            
            P:\113604\E-learning Task 4  Industrial Relations (Part 2)\SCORM.htm
            P:\113604\E-learning Task 4  Industrial Relations (Part 2)\data\resources\docwrap.htm
            P:\113604\E-learning Task 4  Industrial Relations (Part 2)\data\resources\swfwrap.htm
            P:\113604\Team Management Part 2\SCORM.htm
            P:\113605\1. Introduction to Organisational Behaviour.html
            
            
            
            P:\113605\Accessing Newspaper Articles.html
            

            The regex S/R A :

            SEARCH (?-i)(?<=\\)[^\\]+\.html?|(^P:\\|^\h*\R)

            REPLACE (?1:<file name="$0" mark="marked"/>)

            gives :

            113603\Topic 8-Part 1 Industrial Relations  Legislation\data\resources\<file name="docwrap.htm" mark="marked"/>
            113603\Topic 8-Part 1 Industrial Relations  Legislation\data\resources\<file name="swfwrap.htm" mark="marked"/>
            113604\<file name="1st e-lecture was conducted via MS Teams.html" mark="marked"/>
            113604\<file name="Additional Help to View Lecture Video.html" mark="marked"/>
            113604\<file name="Cover Page for Submission of Assignment.html" mark="marked"/>
            113604\E-learning Task 3 Orientation\lms\<file name="blank.html" mark="marked"/>
            113604\E-learning Task 3 Orientation\lms\<file name="goodbye.html" mark="marked"/>
            113604\E-learning Task 3 Orientation\presentation_content\<file name="blank.html" mark="marked"/>
            113604\E-learning Task 4  Industrial Relations (Part 2)\<file name="index.htm" mark="marked"/>
            113604\E-learning Task 4  Industrial Relations (Part 2)\<file name="SCORM.htm" mark="marked"/>
            113604\E-learning Task 4  Industrial Relations (Part 2)\data\resources\<file name="docwrap.htm" mark="marked"/>
            113604\E-learning Task 4  Industrial Relations (Part 2)\data\resources\<file name="swfwrap.htm" mark="marked"/>
            113604\Team Management Part 2\<file name="SCORM.htm" mark="marked"/>
            113605\<file name="1. Introduction to Organisational Behaviour.html" mark="marked"/>
            113605\<file name="Accessing Newspaper Articles.html" mark="marked"/>
            

            Now, run the following two regex S/R, referred to together as S/R B. First :

            SEARCH ^(?-is)(.+?)\\.+\R(?:(?:(?!</).)+\R)*\1\\.+\R|^(.+?)\\<file name.+\R
            REPLACE <folder name="\1\2">\r\n$0</folder>\r\n

            then :

            SEARCH (?-s)^(?!<).+?\\(.+)
            REPLACE \1

            Which should give :

            <folder name="113603">
            Topic 8-Part 1 Industrial Relations  Legislation\data\resources\<file name="docwrap.htm" mark="marked"/>
            Topic 8-Part 1 Industrial Relations  Legislation\data\resources\<file name="swfwrap.htm" mark="marked"/>
            </folder>
            <folder name="113604">
            <file name="1st e-lecture was conducted via MS Teams.html" mark="marked"/>
            <file name="Additional Help to View Lecture Video.html" mark="marked"/>
            <file name="Cover Page for Submission of Assignment.html" mark="marked"/>
            E-learning Task 3 Orientation\lms\<file name="blank.html" mark="marked"/>
            E-learning Task 3 Orientation\lms\<file name="goodbye.html" mark="marked"/>
            E-learning Task 3 Orientation\presentation_content\<file name="blank.html" mark="marked"/>
            E-learning Task 4  Industrial Relations (Part 2)\<file name="index.htm" mark="marked"/>
            E-learning Task 4  Industrial Relations (Part 2)\<file name="SCORM.htm" mark="marked"/>
            E-learning Task 4  Industrial Relations (Part 2)\data\resources\<file name="docwrap.htm" mark="marked"/>
            E-learning Task 4  Industrial Relations (Part 2)\data\resources\<file name="swfwrap.htm" mark="marked"/>
            Team Management Part 2\<file name="SCORM.htm" mark="marked"/>
            </folder>
            <folder name="113605">
            <file name="1. Introduction to Organisational Behaviour.html" mark="marked"/>
            <file name="Accessing Newspaper Articles.html" mark="marked"/>
            </folder>
            

            Then, again, run these two regex S/R B, successively, in order to get :

            <folder name="113603">
            <folder name="Topic 8-Part 1 Industrial Relations  Legislation">
            data\resources\<file name="docwrap.htm" mark="marked"/>
            data\resources\<file name="swfwrap.htm" mark="marked"/>
            </folder>
            </folder>
            <folder name="113604">
            <file name="1st e-lecture was conducted via MS Teams.html" mark="marked"/>
            <file name="Additional Help to View Lecture Video.html" mark="marked"/>
            <file name="Cover Page for Submission of Assignment.html" mark="marked"/>
            <folder name="E-learning Task 3 Orientation">
            lms\<file name="blank.html" mark="marked"/>
            lms\<file name="goodbye.html" mark="marked"/>
            presentation_content\<file name="blank.html" mark="marked"/>
            </folder>
            <folder name="E-learning Task 4  Industrial Relations (Part 2)">
            <file name="index.htm" mark="marked"/>
            <file name="SCORM.htm" mark="marked"/>
            data\resources\<file name="docwrap.htm" mark="marked"/>
            data\resources\<file name="swfwrap.htm" mark="marked"/>
            </folder>
            <folder name="Team Management Part 2">
            <file name="SCORM.htm" mark="marked"/>
            </folder>
            </folder>
            <folder name="113605">
            <file name="1. Introduction to Organisational Behaviour.html" mark="marked"/>
            <file name="Accessing Newspaper Articles.html" mark="marked"/>
            </folder>
            

            A third time, run these two regex S/R B, successively which results in :

            <folder name="113603">
            <folder name="Topic 8-Part 1 Industrial Relations  Legislation">
            <folder name="data">
            resources\<file name="docwrap.htm" mark="marked"/>
            resources\<file name="swfwrap.htm" mark="marked"/>
            </folder>
            </folder>
            </folder>
            <folder name="113604">
            <file name="1st e-lecture was conducted via MS Teams.html" mark="marked"/>
            <file name="Additional Help to View Lecture Video.html" mark="marked"/>
            <file name="Cover Page for Submission of Assignment.html" mark="marked"/>
            <folder name="E-learning Task 3 Orientation">
            <folder name="lms">
            <file name="blank.html" mark="marked"/>
            <file name="goodbye.html" mark="marked"/>
            </folder>
            <folder name="presentation_content">
            <file name="blank.html" mark="marked"/>
            </folder>
            </folder>
            <folder name="E-learning Task 4  Industrial Relations (Part 2)">
            <file name="index.htm" mark="marked"/>
            <file name="SCORM.htm" mark="marked"/>
            <folder name="data">
            resources\<file name="docwrap.htm" mark="marked"/>
            resources\<file name="swfwrap.htm" mark="marked"/>
            </folder>
            </folder>
            <folder name="Team Management Part 2">
            <file name="SCORM.htm" mark="marked"/>
            </folder>
            </folder>
            <folder name="113605">
            <file name="1. Introduction to Organisational Behaviour.html" mark="marked"/>
            <file name="Accessing Newspaper Articles.html" mark="marked"/>
            </folder>
            

            A fourth time, run these two regex S/R B, successively which gives :

            <folder name="113603">
            <folder name="Topic 8-Part 1 Industrial Relations  Legislation">
            <folder name="data">
            <folder name="resources">
            <file name="docwrap.htm" mark="marked"/>
            <file name="swfwrap.htm" mark="marked"/>
            </folder>
            </folder>
            </folder>
            </folder>
            <folder name="113604">
            <file name="1st e-lecture was conducted via MS Teams.html" mark="marked"/>
            <file name="Additional Help to View Lecture Video.html" mark="marked"/>
            <file name="Cover Page for Submission of Assignment.html" mark="marked"/>
            <folder name="E-learning Task 3 Orientation">
            <folder name="lms">
            <file name="blank.html" mark="marked"/>
            <file name="goodbye.html" mark="marked"/>
            </folder>
            <folder name="presentation_content">
            <file name="blank.html" mark="marked"/>
            </folder>
            </folder>
            <folder name="E-learning Task 4  Industrial Relations (Part 2)">
            <file name="index.htm" mark="marked"/>
            <file name="SCORM.htm" mark="marked"/>
            <folder name="data">
            <folder name="resources">
            <file name="docwrap.htm" mark="marked"/>
            <file name="swfwrap.htm" mark="marked"/>
            </folder>
            </folder>
            </folder>
            <folder name="Team Management Part 2">
            <file name="SCORM.htm" mark="marked"/>
            </folder>
            </folder>
            <folder name="113605">
            <file name="1. Introduction to Organisational Behaviour.html" mark="marked"/>
            <file name="Accessing Newspaper Articles.html" mark="marked"/>
            </folder>
            

            Well, you could ask : how do I know when the process is finished ? Easy : it’s when all the lines of the OUTPUT begin with a < character ! This is the case after four executions of the pair of regexes B, in this example. To check easily whether the process is complete and only the final leading indentation remains to be added :

            • Select the Find tab

              • SEARCH (?-s)^(?!<).+ ( Regex C )

              • Tick the Wrap around option

              • Click on the Count button

              • IF you get the message Count: 0 matches in entire file, the process is finished

              • ELSE you need to execute the two regex B some more times !
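
As a cross-check outside Notepad++, the whole A / B / C cycle can be approximated in Python. This is only a sketch under stated assumptions: Python's `re` supports neither `\R` nor `\h` nor the `(?1:...)` conditional replacement, so `\n`, `[ \t]` and a callback are swapped in; the input is assumed to use `\n` line endings and to end with a newline; the names `paths_to_xml` and `max_passes` are invented here.

```python
import re

# Regex A: file name after the last backslash, or (group 1) text to delete.
RE_A = re.compile(r'(?<=\\)[^\\]+\.html?|(^P:\\|^[ \t]*\n)', re.M)
# Regex B, first S/R: a block of lines sharing the same leading folder,
# or a single line whose only remaining folder is followed by a <file tag.
RE_B1 = re.compile(
    r'^(.+?)\\.+\n(?:(?:(?!</).)+\n)*\1\\.+\n|^(.+?)\\<file name.+\n', re.M)
# Regex B, second S/R: strip the leading folder from lines not yet tagged.
RE_B2 = re.compile(r'^(?!<).+?\\(.+)', re.M)
# Regex C: any non-empty line that does not start with "<".
RE_C = re.compile(r'^(?!<).+', re.M)


def paths_to_xml(text, max_passes=50):
    """Hypothetical helper: run A once, then B until C finds nothing."""
    # Callback stands in for the conditional replacement: delete when
    # group 1 matched, otherwise wrap the file name in a <file/> tag.
    text = RE_A.sub(
        lambda m: '' if m.group(1) is not None
        else f'<file name="{m.group(0)}" mark="marked"/>',
        text)
    for _ in range(max_passes):
        if not RE_C.search(text):      # the Regex C check
            break
        text = RE_B1.sub(
            lambda m: f'<folder name="{m.group(1) or m.group(2)}">\n'
                      f'{m.group(0)}</folder>\n',
            text)
        text = RE_B2.sub(r'\1', text)
    return text
```

The `max_passes` cap is just a safety net for the sketch; with well-formed input the loop stops as soon as every line starts with `<`.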

            In my next post, I will explain how to automate all this process with macros

            Best Regards

            guy038

            • guy038

              Hi, @mohammad-hussain, @neil-schipper and All,

              Your active shortcuts.xml file should be located, either :

              • In the %appData%\Notepad++ folder, for a standard installation with the installer

              • In the Notepad++.exe folder, for a local installation

              Then :

              • Start Microsoft notepad.exe

              • Open the right shortcuts.xml file

              • Insert, at the end of the <Macros>...</Macros> node, the following macros Macro_A, Macro_B and Macro_C :

                      <Macro name="Macro_A" Ctrl="no" Alt="no" Shift="no" Key="0">
                          <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                          <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?-i)(?&lt;=\\)[^\\]+\.html?|(^P:\\|^\h*\R)" />
                          <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                          <Action type="3" message="1602" wParam="0" lParam="0" sParam='(?1:&lt;file name=&quot;$0&quot; mark=&quot;marked&quot;/&gt;)' />
                          <Action type="3" message="1702" wParam="0" lParam="768" sParam="" />
                          <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
                      </Macro>
                      <Macro name="Macro_B" Ctrl="no" Alt="no" Shift="no" Key="0">
                          <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                          <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?-is)^(.+?)\\.+\R(?:(?:(?!&lt;/).)+\R)*\1\\.+\R|^(.+?)\\&lt;file name.+\R" />
                          <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                          <Action type="3" message="1602" wParam="0" lParam="0" sParam='&lt;folder name=&quot;\1\2&quot;&gt;\r\n$0&lt;/folder&gt;\r\n' />
                          <Action type="3" message="1702" wParam="0" lParam="768" sParam="" />
                          <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
                          <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                          <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?-s)^(?!&lt;).+?\\(.+)" />
                          <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                          <Action type="3" message="1602" wParam="0" lParam="0" sParam="\1" />
                          <Action type="3" message="1702" wParam="0" lParam="768" sParam="" />
                          <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
                      </Macro>
                      <Macro name="Macro_C" Ctrl="no" Alt="no" Shift="no" Key="0">
                          <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                          <Action type="3" message="1601" wParam="0" lParam="0" sParam="^(?!&lt;).+" />
                          <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                          <Action type="3" message="1702" wParam="0" lParam="768" sParam="" />
                          <Action type="3" message="1701" wParam="0" lParam="1614" sParam="" />
                      </Macro>
              
              • Save the modifications

              • Now, stop and restart Notepad++

              • Open your INPUT file

              • First, run, once only, the Macro_A macro

              • Then, run several times the Macro_B macro

              • From time to time, run the Macro_C macro to check whether the process has finished. Note that the Find dialog must be open to see the result !


              Remark :

              You may assign a shortcut to each of these macros :

              • Use the Macro > Modify Shortcut/Delete Macro option

              • Double-click on each macro and enter your preferred shortcut

              Best Regards,

              guy038
