Community

    Regex - replace subexpression in a repeated subexpression

    Help wanted · · · – – – · · ·
    • Mohammad Hussain

      Hello,

      Before you start looking into this, please note that I have already found a program that does what I need. So right now I just want to learn; this isn't urgent or important.

      I don’t know if I’m using the correct terminology here, or if this is possible at all, but here it is:

      I am converting lists of full paths of HTML files to XML, as input for a program. Example:

      P:\113603\Topic 8-Part 1 Industrial Relations  Legislation\data\resources\docwrap.htm
      P:\113603\Topic 8-Part 1 Industrial Relations  Legislation\data\resources\swfwrap.htm
      P:\113604\1st e-lecture was conducted via MS Teams.html
      P:\113604\Additional Help to View Lecture Video.html
      P:\113604\Cover Page for Submission of Assignment.html
      P:\113604\E-learning Task 3 Orientation\lms\blank.html
      P:\113604\E-learning Task 3 Orientation\lms\goodbye.html
      P:\113604\E-learning Task 3 Orientation\presentation_content\blank.html
      P:\113604\E-learning Task 4  Industrial Relations (Part 2)\index.htm
      P:\113604\E-learning Task 4  Industrial Relations (Part 2)\SCORM.htm
      P:\113604\E-learning Task 4  Industrial Relations (Part 2)\data\resources\docwrap.htm
      P:\113604\E-learning Task 4  Industrial Relations (Part 2)\data\resources\swfwrap.htm
      P:\113604\Team Management Part 2\SCORM.htm
      P:\113605\1. Introduction to Organisational Behaviour.html
      P:\113605\Accessing Newspaper Articles.html
      
      

      The expected final output is below (it includes indentation, which is not a problem for me with regex or with XML tools):

      				<folder name="113603">
      					<folder name="Topic 8-Part 1 Industrial Relations  Legislation">
      						<folder name="data">
      							<folder name="resources">
      								<folder name="docwrap.htm" mark="marked"/>
      								<folder name="swfwrap.htm" mark="marked"/>
      							</folder>
      						</folder>
      					</folder>
      				</folder>
      				<folder name="113604">
      					<folder name="1st e-lecture was conducted via MS Teams.html" mark="marked"/>
      					<folder name="Additional Help to View Lecture Video.html" mark="marked"/>
      					<folder name="Cover Page for Submission of Assignment.html" mark="marked"/>
      					<folder name="E-learning Task 3 Orientation">
      						<folder name="lms">
      							<folder name="blank.html" mark="marked"/>
      							<folder name="goodbye.html" mark="marked"/>
      						</folder>
      						<folder name="presentation_content">
      							<folder name="blank.html" mark="marked"/>
      						</folder>
      					</folder>
      					<folder name="E-learning Task 4  Industrial Relations (Part 2)">
      						<folder name="data">
      							<folder name="resources">
      								<folder name="docwrap.htm" mark="marked"/>
      								<folder name="swfwrap.htm" mark="marked"/>
      							</folder>
      						</folder>
      						<folder name="index.htm" mark="marked"/>
      						<folder name="SCORM.htm" mark="marked"/>
      					</folder>
      					<folder name="Team Management Part 2">
      						<folder name="SCORM.htm" mark="marked"/>
      					</folder>
      				</folder>
      				<folder name="113605">
      					<folder name="1. Introduction to Organisational Behaviour.html" mark="marked"/>
      					<folder name="Accessing Newspaper Articles.html" mark="marked"/>
      				</folder>
      

      Starting from the input list of full paths, I used the following steps:
      1. Search (simple):
      P:
      Replace:
      Nothing/Empty/null
      2. Search (regex):
      (\\)([^\\]{1,}\.htm[l]{0,1})
      Replace:
      <file name="\2" mark="marked"/>
      3. Search (regex):
      (\\)([^\\\r\n\<]{1,})(<.*>)
      Replace (keep replacing till no more search matches are found):
      <folder name="\2">\3</folder>
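Here is a rough Python sketch of these three steps, for testing them outside Notepad++. It is an illustration only: it uses Python's standard `re` module rather than Notepad++'s Boost engine, the function name `paths_to_nested_xml` is hypothetical, and `\n` was added to the negated class of step 2 so that a file name cannot be matched across lines.

```python
import re


def paths_to_nested_xml(paths: str) -> str:
    # Step 1 (plain search): remove the "P:" drive prefix.
    text = paths.replace("P:", "")
    # Step 2: "\name.htm" or "\name.html" becomes a <file .../> element.
    # The backslash before the name is consumed, as in the original S/R.
    text = re.sub(r"(\\)([^\\\n]+\.html?)", r'<file name="\2" mark="marked"/>', text)
    # Step 3: wrap the folder name just before the "<" around the rest of the line,
    # repeating (like clicking Replace All again) until nothing matches any more.
    step3 = re.compile(r"(\\)([^\\\r\n<]+)(<.*>)")
    while True:
        text, count = step3.subn(r'<folder name="\2">\3</folder>', text)
        if count == 0:
            return text


print(paths_to_nested_xml("P:\\113605\\Accessing Newspaper Articles.html\n"))
# <folder name="113605"><file name="Accessing Newspaper Articles.html" mark="marked"/></folder>
```

Each pass of step 3 adds one folder level per line, so the loop runs once per level of the deepest path, plus one final pass that finds nothing.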

      I managed to convert the list of paths to XML that looks like this:

      <folder name="113603"><folder name="Topic 8-Part 1 Industrial Relations  Legislation"><folder name="data"><folder name="resources"><file name="docwrap.htm" mark="marked"/></folder></folder></folder></folder>
      <folder name="113603"><folder name="Topic 8-Part 1 Industrial Relations  Legislation"><folder name="data"><folder name="resources"><file name="swfwrap.htm" mark="marked"/></folder></folder></folder></folder>
      <folder name="113604"><file name="1st e-lecture was conducted via MS Teams.html" mark="marked"/></folder>
      <folder name="113604"><file name="Additional Help to View Lecture Video.html" mark="marked"/></folder>
      <folder name="113604"><file name="Cover Page for Submission of Assignment.html" mark="marked"/></folder>
      <folder name="113604"><folder name="E-learning Task 3 Orientation"><folder name="lms"><file name="blank.html" mark="marked"/></folder></folder></folder>
      <folder name="113604"><folder name="E-learning Task 3 Orientation"><folder name="lms"><file name="goodbye.html" mark="marked"/></folder></folder></folder>
      <folder name="113604"><folder name="E-learning Task 3 Orientation"><folder name="presentation_content"><file name="blank.html" mark="marked"/></folder></folder></folder>
      <folder name="113604"><folder name="E-learning Task 4  Industrial Relations (Part 2)"><file name="index.htm" mark="marked"/></folder></folder>
      <folder name="113604"><folder name="E-learning Task 4  Industrial Relations (Part 2)"><file name="SCORM.htm" mark="marked"/></folder></folder>
      <folder name="113604"><folder name="E-learning Task 4  Industrial Relations (Part 2)"><folder name="data"><folder name="resources"><file name="docwrap.htm" mark="marked"/></folder></folder></folder></folder>
      <folder name="113604"><folder name="E-learning Task 4  Industrial Relations (Part 2)"><folder name="data"><folder name="resources"><file name="swfwrap.htm" mark="marked"/></folder></folder></folder></folder>
      <folder name="113604"><folder name="Team Management Part 2"><file name="SCORM.htm" mark="marked"/></folder></folder>
      <folder name="113605"><file name="1. Introduction to Organisational Behaviour.html" mark="marked"/></folder>
      <folder name="113605"><file name="Accessing Newspaper Articles.html" mark="marked"/></folder>
      
      

      However, the tags for common parents are repeated. I want to eliminate this repetition (I'm not sure what it is called: minimizing? grouping? something else?)

      I tried to create a regex search and replace for the highest level of folders/parents, with the intent of running the replace part multiple times for subfolders, until no more matches are found, removing all redundant tags:

      Search (regex, this part works):
      (^<folder name="[^"]{1,}">)(.*?)(</folder>\R)(((\1)(.*?)(\3))*)?(\1)
      Replace (doesn’t fully work. just showing you my progress and where I’m stuck):
      \1\2\r\n\4

      The search captures what I’m looking for, and the replace lets me remove the last occurrence of </folder> in the first matching line and the parent folder tag in the last matching line. But how do I remove the redundant tags at the beginning and at the end of the lines in between?

      Right now, all the lines in between are captured in group/subexpression \4, so I can keep them when I do a replace. But to actually modify them in the same action, I need to remove groups 6 and 8 while keeping group 7 for each line, and my replace only does this for the last of those lines instead of for all the lines in between.

      Is it even possible to make this work for all the lines in between, targeting individual groups (group 7) inside a repeated group/subexpression (group 4)? If yes, how?

      Thank you :)

      • Neil Schipper, in reply to @Mohammad Hussain

        @mohammad-hussain

        Hi. I don’t think you can capture a not-known-in-advance number of lines in such a way that you can write them back with substitutions.
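A small Python sketch shows why. It is an illustration with Python's `re` module, and as far as I know Notepad++'s Boost engine behaves the same way here. It uses a simplified version of the question's regex, so the numbering shifts by one: group 6 below plays the role of the question's group 7. A group inside a repeated subexpression is still a single numbered group, so after the match it only holds the text of its last repetition:

```python
import re

text = "<f a>x</f>\n<f a>y</f>\n<f a>z</f>\n"

# Same shape as the regex in the question: a first line (groups 1-3), then a
# repeated group 4 of lines that start with \1 and end with \3 (groups 5-7).
m = re.match(r"(<[^>]+>)(.*?)(</f>\n)((\1)(.*?)(\3))*", text)

print(m.group(0) == text)  # True: all three lines take part in the match
print(m.group(6))          # z: group 6 only keeps its LAST repetition, "y" is lost
print(m.group(4))          # the whole last repetition only: <f a>z</f> plus its newline
```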

        So an idiom that would help you move forward is a phase that captures groups of lines matching some left-side text and writes them back as a record, with the desired record start and record end lines, thus:

        Fi: ((^<[^>]+>)(.*\R)((\2)((.*\R)))*)
        Re: \2\r\n\1</folder>\r\n

        (Note: the above doesn’t handle the case where the final line of text doesn’t end with a newline, so add one manually first.)

        and then a phase to clean out that left-side text and the trailing tag:

        Fi: ^<[^>]+>(.*?)</folder>\R
        Re: \1\r\n
        or maybe insert tabs (suggested by your sample output):
        Re: \t\t\t\t\1\r\n
        or maybe some number of spaces, as you prefer.

        Here’s output using 4 tabs:

        <folder name="113603">
        				<folder name="Topic 8-Part 1 Industrial Relations  Legislation"><folder name="data"><folder name="resources"><file name="docwrap.htm" mark="marked"/></folder></folder></folder>
        				<folder name="Topic 8-Part 1 Industrial Relations  Legislation"><folder name="data"><folder name="resources"><file name="swfwrap.htm" mark="marked"/></folder></folder></folder>
        </folder>
        <folder name="113604">
        				<file name="1st e-lecture was conducted via MS Teams.html" mark="marked"/>
        				<file name="Additional Help to View Lecture Video.html" mark="marked"/>
        				<file name="Cover Page for Submission of Assignment.html" mark="marked"/>
        				<folder name="E-learning Task 3 Orientation"><folder name="lms"><file name="blank.html" mark="marked"/></folder></folder>
        				<folder name="E-learning Task 3 Orientation"><folder name="lms"><file name="goodbye.html" mark="marked"/></folder></folder>
        				<folder name="E-learning Task 3 Orientation"><folder name="presentation_content"><file name="blank.html" mark="marked"/></folder></folder>
        				<folder name="E-learning Task 4  Industrial Relations (Part 2)"><file name="index.htm" mark="marked"/></folder>
        				<folder name="E-learning Task 4  Industrial Relations (Part 2)"><file name="SCORM.htm" mark="marked"/></folder>
        				<folder name="E-learning Task 4  Industrial Relations (Part 2)"><folder name="data"><folder name="resources"><file name="docwrap.htm" mark="marked"/></folder></folder></folder>
        				<folder name="E-learning Task 4  Industrial Relations (Part 2)"><folder name="data"><folder name="resources"><file name="swfwrap.htm" mark="marked"/></folder></folder></folder>
        				<folder name="Team Management Part 2"><file name="SCORM.htm" mark="marked"/></folder>
        </folder>
        <folder name="113605">
        				<file name="1. Introduction to Organisational Behaviour.html" mark="marked"/>
        				<file name="Accessing Newspaper Articles.html" mark="marked"/>
        </folder>
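The two phases above can be sketched in Python. This is an illustration, not Neil's exact S/Rs: `\R` becomes `\n`, the function name `factor_first_level` is hypothetical, and the `indent` parameter stands for whatever leading whitespace you prefer (four tabs by default, as in the output above).

```python
import re


def factor_first_level(text: str, indent: str = "\t\t\t\t") -> str:
    # As noted above, the idiom needs the last line to end with a newline.
    if not text.endswith("\n"):
        text += "\n"
    # Phase 1: a run of consecutive lines starting with the same tag (\1) is
    # rewritten as that tag on its own line, the unchanged run, then </folder>.
    phase1 = re.compile(r"^(<[^>]+>).*\n(?:\1.*\n)*", re.MULTILINE)
    text = phase1.sub(lambda m: m.group(1) + "\n" + m.group(0) + "</folder>\n", text)
    # Phase 2: on each member line, drop the leading tag and the final </folder>.
    # The lazy (.*?) stops at the last </folder>, the only one followed by "\n".
    phase2 = re.compile(r"^<[^>]+>(.*?)</folder>\n", re.MULTILINE)
    return phase2.sub(lambda m: indent + m.group(1) + "\n", text)


print(factor_first_level(
    '<folder name="113605"><file name="Accessing Newspaper Articles.html" mark="marked"/></folder>\n',
    indent="    ",
))
```

Using a function as the replacement avoids having to escape backslashes or `\g<...>` references in the inserted text.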
        
        

        You can now do something similar at the next level of nesting, matching your preferred whitespace after start-of-line.

        You could also take a similar approach on your original data: do the factoring (and maybe the indenting) early, and add the XML eye candy in a later phase.

        I wouldn’t be surprised if there are cleverer approaches, but I think this gets you unstuck.

        • Mohammad Hussain, in reply to @Neil Schipper

          @neil-schipper That is a smart approach! I guess I was trying to solve it in one go, when I could have done the same (or more) in more steps. Adding/repeating in order to isolate the parts to target next is a very good idea! Thank you :)

          I was kind of hoping you could reference groups inside repeated groups, though. But at least there are other solutions.

          Thank you again :)

          • guy038

            Hello, @mohammad-hussain, @neil-schipper and All,

            First of all, @mohammad-hussain, I think the expected final OUTPUT shown in your post is slightly wrong! Indeed, you refer to files as, for instance:

            <folder name="docwrap.htm" mark="marked"/>
            

            To my mind, this should be labeled :

            <file name="docwrap.htm" mark="marked"/>
            

            as mentioned in your own regex S/R 2 !


            Now, I found a way to solve your problem with macros ;-)) In short, from this INPUT text:

            P:\113603\Topic 8-Part 1 Industrial Relations  Legislation\data\resources\docwrap.htm
            
            
            P:\113603\Topic 8-Part 1 Industrial Relations  Legislation\data\resources\swfwrap.htm
            P:\113604\1st e-lecture was conducted via MS Teams.html
            
            
            P:\113604\Additional Help to View Lecture Video.html
            P:\113604\Cover Page for Submission of Assignment.html
            P:\113604\E-learning Task 3 Orientation\lms\blank.html
            P:\113604\E-learning Task 3 Orientation\lms\goodbye.html
            
            
            
            P:\113604\E-learning Task 3 Orientation\presentation_content\blank.html
            P:\113604\E-learning Task 4  Industrial Relations (Part 2)\index.htm
            
            P:\113604\E-learning Task 4  Industrial Relations (Part 2)\SCORM.htm
            P:\113604\E-learning Task 4  Industrial Relations (Part 2)\data\resources\docwrap.htm
            P:\113604\E-learning Task 4  Industrial Relations (Part 2)\data\resources\swfwrap.htm
            P:\113604\Team Management Part 2\SCORM.htm
            P:\113605\1. Introduction to Organisational Behaviour.html
            
            
            
            P:\113605\Accessing Newspaper Articles.html
            

            You would get this OUTPUT text :

            <folder name="113603">
            <folder name="Topic 8-Part 1 Industrial Relations  Legislation">
            <folder name="data">
            <folder name="resources">
            <file name="docwrap.htm" mark="marked"/>
            <file name="swfwrap.htm" mark="marked"/>
            </folder>
            </folder>
            </folder>
            </folder>
            <folder name="113604">
            <file name="1st e-lecture was conducted via MS Teams.html" mark="marked"/>
            <file name="Additional Help to View Lecture Video.html" mark="marked"/>
            <file name="Cover Page for Submission of Assignment.html" mark="marked"/>
            <folder name="E-learning Task 3 Orientation">
            <folder name="lms">
            <file name="blank.html" mark="marked"/>
            <file name="goodbye.html" mark="marked"/>
            </folder>
            <folder name="presentation_content">
            <file name="blank.html" mark="marked"/>
            </folder>
            </folder>
            <folder name="E-learning Task 4  Industrial Relations (Part 2)">
            <file name="index.htm" mark="marked"/>
            <file name="SCORM.htm" mark="marked"/>
            <folder name="data">
            <folder name="resources">
            <file name="docwrap.htm" mark="marked"/>
            <file name="swfwrap.htm" mark="marked"/>
            </folder>
            </folder>
            </folder>
            <folder name="Team Management Part 2">
            <file name="SCORM.htm" mark="marked"/>
            </folder>
            </folder>
            <folder name="113605">
            <file name="1. Introduction to Organisational Behaviour.html" mark="marked"/>
            <file name="Accessing Newspaper Articles.html" mark="marked"/>
            </folder>
            

            Note that there is a small difference between this OUTPUT and your expected output, as some lines in the middle are swapped!

            With my solution, I have this partial layout :

            <folder name="E-learning Task 4  Industrial Relations (Part 2)">
            <file name="index.htm" mark="marked"/>
            <file name="SCORM.htm" mark="marked"/>
            <folder name="data">
            <folder name="resources">
            <file name="docwrap.htm" mark="marked"/>
            <file name="swfwrap.htm" mark="marked"/>
            </folder>
            </folder>
            </folder>
            

            In your expected final output, you have this partial layout :

            					<folder name="E-learning Task 4  Industrial Relations (Part 2)">
            						<folder name="data">
            							<folder name="resources">
            								<file name="docwrap.htm" mark="marked"/>
            								<file name="swfwrap.htm" mark="marked"/>
            							</folder>
            						</folder>
            						<file name="index.htm" mark="marked"/>
            						<file name="SCORM.htm" mark="marked"/>
            					</folder>
            

            However, these two layouts seem equivalent, since only the order of sibling entries differs!


            Notes :

            • In this second part of my post, you don’t have to remember all the regex S/Rs described. Indeed, in my next post, I’ll give you the exact text to insert in the <Macros> section of your active shortcuts.xml file to automate the whole process! However, to understand the principle used:

            • Open the Replace dialog ( Ctrl + H )

            • Then, for each regex S/R to run :

              • Fill in the search and replace regexes

              • Tick the Wrap around option

              • Select the Regular expression search mode

              • Click on the Replace All button

            Well, let’s go. So, given this initial text :

            P:\113603\Topic 8-Part 1 Industrial Relations  Legislation\data\resources\docwrap.htm
            
            
            P:\113603\Topic 8-Part 1 Industrial Relations  Legislation\data\resources\swfwrap.htm
            P:\113604\1st e-lecture was conducted via MS Teams.html
            
            
            P:\113604\Additional Help to View Lecture Video.html
            P:\113604\Cover Page for Submission of Assignment.html
            P:\113604\E-learning Task 3 Orientation\lms\blank.html
            P:\113604\E-learning Task 3 Orientation\lms\goodbye.html
            
            
            
            P:\113604\E-learning Task 3 Orientation\presentation_content\blank.html
            P:\113604\E-learning Task 4  Industrial Relations (Part 2)\index.htm
            
            P:\113604\E-learning Task 4  Industrial Relations (Part 2)\SCORM.htm
            P:\113604\E-learning Task 4  Industrial Relations (Part 2)\data\resources\docwrap.htm
            P:\113604\E-learning Task 4  Industrial Relations (Part 2)\data\resources\swfwrap.htm
            P:\113604\Team Management Part 2\SCORM.htm
            P:\113605\1. Introduction to Organisational Behaviour.html
            
            
            
            P:\113605\Accessing Newspaper Articles.html
            

            The regex S/R A :

            SEARCH (?-i)(?<=\\)[^\\]+\.html?|(^P:\\|^\h*\R)

            REPLACE (?1:<file name="$0" mark="marked"/>)

            gives :

            113603\Topic 8-Part 1 Industrial Relations  Legislation\data\resources\<file name="docwrap.htm" mark="marked"/>
            113603\Topic 8-Part 1 Industrial Relations  Legislation\data\resources\<file name="swfwrap.htm" mark="marked"/>
            113604\<file name="1st e-lecture was conducted via MS Teams.html" mark="marked"/>
            113604\<file name="Additional Help to View Lecture Video.html" mark="marked"/>
            113604\<file name="Cover Page for Submission of Assignment.html" mark="marked"/>
            113604\E-learning Task 3 Orientation\lms\<file name="blank.html" mark="marked"/>
            113604\E-learning Task 3 Orientation\lms\<file name="goodbye.html" mark="marked"/>
            113604\E-learning Task 3 Orientation\presentation_content\<file name="blank.html" mark="marked"/>
            113604\E-learning Task 4  Industrial Relations (Part 2)\<file name="index.htm" mark="marked"/>
            113604\E-learning Task 4  Industrial Relations (Part 2)\<file name="SCORM.htm" mark="marked"/>
            113604\E-learning Task 4  Industrial Relations (Part 2)\data\resources\<file name="docwrap.htm" mark="marked"/>
            113604\E-learning Task 4  Industrial Relations (Part 2)\data\resources\<file name="swfwrap.htm" mark="marked"/>
            113604\Team Management Part 2\<file name="SCORM.htm" mark="marked"/>
            113605\<file name="1. Introduction to Organisational Behaviour.html" mark="marked"/>
            113605\<file name="Accessing Newspaper Articles.html" mark="marked"/>
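A Python sketch of S/R A, for illustration only. Python's `re` has no `\h`, no `\R` and no conditional replacement like `(?1:...)`, so those parts are adapted below, and `sr_a` is a hypothetical name.

```python
import re

# Python port of regex S/R A. Adaptations (assumptions, not Boost syntax):
# \h -> [ \t], \R -> \n, (?-i) dropped (Python is case-sensitive by default),
# and \n added to [^\\] so that a file name cannot run across lines.
SR_A = re.compile(r"(?<=\\)[^\\\n]+\.html?|(^P:\\|^[ \t]*\n)", re.MULTILINE)


def sr_a(text: str) -> str:
    def replace(m):
        # Boost's "(?1:...)" outputs nothing when group 1 took part in the match
        # (a "P:\" prefix or a blank line), and the part after ":" otherwise.
        if m.group(1) is not None:
            return ""
        return f'<file name="{m.group(0)}" mark="marked"/>'

    return SR_A.sub(replace, text)


print(sr_a("P:\\113604\\Team Management Part 2\\SCORM.htm\n\n"))
# 113604\Team Management Part 2\<file name="SCORM.htm" mark="marked"/>
```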
            

            Now, run the following two regex S/Rs B, one after the other:

            SEARCH ^(?-is)(.+?)\\.+\R(?:(?:(?!</).)+\R)*\1\\.+\R|^(.+?)\\<file name.+\R
            REPLACE <folder name="\1\2">\r\n$0</folder>\r\n

            then :

            SEARCH (?-s)^(?!<).+?\\(.+)
            REPLACE \1

            Which should give :

            <folder name="113603">
            Topic 8-Part 1 Industrial Relations  Legislation\data\resources\<file name="docwrap.htm" mark="marked"/>
            Topic 8-Part 1 Industrial Relations  Legislation\data\resources\<file name="swfwrap.htm" mark="marked"/>
            </folder>
            <folder name="113604">
            <file name="1st e-lecture was conducted via MS Teams.html" mark="marked"/>
            <file name="Additional Help to View Lecture Video.html" mark="marked"/>
            <file name="Cover Page for Submission of Assignment.html" mark="marked"/>
            E-learning Task 3 Orientation\lms\<file name="blank.html" mark="marked"/>
            E-learning Task 3 Orientation\lms\<file name="goodbye.html" mark="marked"/>
            E-learning Task 3 Orientation\presentation_content\<file name="blank.html" mark="marked"/>
            E-learning Task 4  Industrial Relations (Part 2)\<file name="index.htm" mark="marked"/>
            E-learning Task 4  Industrial Relations (Part 2)\<file name="SCORM.htm" mark="marked"/>
            E-learning Task 4  Industrial Relations (Part 2)\data\resources\<file name="docwrap.htm" mark="marked"/>
            E-learning Task 4  Industrial Relations (Part 2)\data\resources\<file name="swfwrap.htm" mark="marked"/>
            Team Management Part 2\<file name="SCORM.htm" mark="marked"/>
            </folder>
            <folder name="113605">
            <file name="1. Introduction to Organisational Behaviour.html" mark="marked"/>
            <file name="Accessing Newspaper Articles.html" mark="marked"/>
            </folder>
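One pass of the pair of regex S/Rs B can be sketched in Python the same way. This is a hypothetical port: `\R` becomes `\n`, `(?-is)` is dropped because those are Python's defaults, the names `SR_B1`, `SR_B2` and `run_b_pair` are made up, and, as in Notepad++, the text must end with a line break.

```python
import re

# First S/R B: either a run of lines whose first and last lines start with the
# same "X\" (with no "</" line in between), or a single "X\<file ...>" line.
SR_B1 = re.compile(
    r"^(.+?)\\.+\n(?:(?:(?!</).)+\n)*\1\\.+\n"
    r"|^(.+?)\\<file name.+\n",
    re.MULTILINE,
)
# Second S/R B: on lines not starting with "<", drop the first path component.
SR_B2 = re.compile(r"^(?!<).+?\\(.+)", re.MULTILINE)


def run_b_pair(text: str) -> str:
    def wrap(m):
        # Only one of groups 1 and 2 takes part in a match, as with "\1\2" in the post.
        name = m.group(1) if m.group(1) is not None else m.group(2)
        return f'<folder name="{name}">\n{m.group(0)}</folder>\n'

    text = SR_B1.sub(wrap, text)
    return SR_B2.sub(r"\1", text)
```

If I read the first regex correctly, its second alternative keeps everything up to the last `\` before `<file name`. A lone line with more than one level left (for example `A\B\<file .../>` with no sibling line) would therefore become `<folder name="A\B">`. The sample data never hits that case.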
            

            Then, run these two regex S/Rs B again, one after the other, to get:

            <folder name="113603">
            <folder name="Topic 8-Part 1 Industrial Relations  Legislation">
            data\resources\<file name="docwrap.htm" mark="marked"/>
            data\resources\<file name="swfwrap.htm" mark="marked"/>
            </folder>
            </folder>
            <folder name="113604">
            <file name="1st e-lecture was conducted via MS Teams.html" mark="marked"/>
            <file name="Additional Help to View Lecture Video.html" mark="marked"/>
            <file name="Cover Page for Submission of Assignment.html" mark="marked"/>
            <folder name="E-learning Task 3 Orientation">
            lms\<file name="blank.html" mark="marked"/>
            lms\<file name="goodbye.html" mark="marked"/>
            presentation_content\<file name="blank.html" mark="marked"/>
            </folder>
            <folder name="E-learning Task 4  Industrial Relations (Part 2)">
            <file name="index.htm" mark="marked"/>
            <file name="SCORM.htm" mark="marked"/>
            data\resources\<file name="docwrap.htm" mark="marked"/>
            data\resources\<file name="swfwrap.htm" mark="marked"/>
            </folder>
            <folder name="Team Management Part 2">
            <file name="SCORM.htm" mark="marked"/>
            </folder>
            </folder>
            <folder name="113605">
            <file name="1. Introduction to Organisational Behaviour.html" mark="marked"/>
            <file name="Accessing Newspaper Articles.html" mark="marked"/>
            </folder>
            

            Run these two regex S/Rs B a third time, one after the other, which results in:

            <folder name="113603">
            <folder name="Topic 8-Part 1 Industrial Relations  Legislation">
            <folder name="data">
            resources\<file name="docwrap.htm" mark="marked"/>
            resources\<file name="swfwrap.htm" mark="marked"/>
            </folder>
            </folder>
            </folder>
            <folder name="113604">
            <file name="1st e-lecture was conducted via MS Teams.html" mark="marked"/>
            <file name="Additional Help to View Lecture Video.html" mark="marked"/>
            <file name="Cover Page for Submission of Assignment.html" mark="marked"/>
            <folder name="E-learning Task 3 Orientation">
            <folder name="lms">
            <file name="blank.html" mark="marked"/>
            <file name="goodbye.html" mark="marked"/>
            </folder>
            <folder name="presentation_content">
            <file name="blank.html" mark="marked"/>
            </folder>
            </folder>
            <folder name="E-learning Task 4  Industrial Relations (Part 2)">
            <file name="index.htm" mark="marked"/>
            <file name="SCORM.htm" mark="marked"/>
            <folder name="data">
            resources\<file name="docwrap.htm" mark="marked"/>
            resources\<file name="swfwrap.htm" mark="marked"/>
            </folder>
            </folder>
            <folder name="Team Management Part 2">
            <file name="SCORM.htm" mark="marked"/>
            </folder>
            </folder>
            <folder name="113605">
            <file name="1. Introduction to Organisational Behaviour.html" mark="marked"/>
            <file name="Accessing Newspaper Articles.html" mark="marked"/>
            </folder>
            

            Run these two regex S/Rs B a fourth time, one after the other, which gives:

            <folder name="113603">
            <folder name="Topic 8-Part 1 Industrial Relations  Legislation">
            <folder name="data">
            <folder name="resources">
            <file name="docwrap.htm" mark="marked"/>
            <file name="swfwrap.htm" mark="marked"/>
            </folder>
            </folder>
            </folder>
            </folder>
            <folder name="113604">
            <file name="1st e-lecture was conducted via MS Teams.html" mark="marked"/>
            <file name="Additional Help to View Lecture Video.html" mark="marked"/>
            <file name="Cover Page for Submission of Assignment.html" mark="marked"/>
            <folder name="E-learning Task 3 Orientation">
            <folder name="lms">
            <file name="blank.html" mark="marked"/>
            <file name="goodbye.html" mark="marked"/>
            </folder>
            <folder name="presentation_content">
            <file name="blank.html" mark="marked"/>
            </folder>
            </folder>
            <folder name="E-learning Task 4  Industrial Relations (Part 2)">
            <file name="index.htm" mark="marked"/>
            <file name="SCORM.htm" mark="marked"/>
            <folder name="data">
            <folder name="resources">
            <file name="docwrap.htm" mark="marked"/>
            <file name="swfwrap.htm" mark="marked"/>
            </folder>
            </folder>
            </folder>
            <folder name="Team Management Part 2">
            <file name="SCORM.htm" mark="marked"/>
            </folder>
            </folder>
            <folder name="113605">
            <file name="1. Introduction to Organisational Behaviour.html" mark="marked"/>
            <file name="Accessing Newspaper Articles.html" mark="marked"/>
            </folder>
            

            Well, you might ask: how do I know when the process is finished? Easy: it’s when all the lines of the OUTPUT begin with a < character! In this example, that is the case after four executions of the pair of regex S/Rs B. To easily determine when the process is complete and only the final leading indentation is still needed:

            • Select the Find tab

              • SEARCH (?-s)^(?!<).+ ( Regex C )

              • Tick the Wrap around option

              • Click on the Count button

              • IF you get the message Count: 0 matches in entire file, the process is finished

              • ELSE, you need to execute the two regex S/Rs B a few more times!
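Regex C can be checked the same way in Python. This is a sketch: `count_unfinished_lines` is a hypothetical name, and `(?-s)` is dropped because `.` never matches a line break by default in Python.

```python
import re

# Regex C: any line that does not start with "<" still carries a path prefix.
REGEX_C = re.compile(r"^(?!<).+", re.MULTILINE)


def count_unfinished_lines(text: str) -> int:
    # Same idea as the Count button: 0 means the pair of S/Rs B is no longer needed.
    return len(REGEX_C.findall(text))


print(count_unfinished_lines('<folder name="data">\nresources\\<file name="docwrap.htm" mark="marked"/>\n</folder>\n'))
```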

            In my next post, I will explain how to automate this whole process with macros.

            Best Regards

            guy038

            • guy038

              Hi, @mohammad-hussain, @neil-schipper and All,

              Your active shortcuts.xml file should be located, either :

              • In the %appData%\Notepad++ folder, for a standard installation with the installer

              • In the Notepad++.exe folder, for a local installation

              Then :

              • Start Microsoft notepad.exe

              • Open the right Shortcuts.xml file

              • Insert, at the end of the <Macros>...</Macros> section, the following macros Macro_A, Macro_B and Macro_C:

                      <Macro name="Macro_A" Ctrl="no" Alt="no" Shift="no" Key="0">
                          <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                          <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?-i)(?&lt;=\\)[^\\]+\.html?|(^P:\\|^\h*\R)" />
                          <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                          <Action type="3" message="1602" wParam="0" lParam="0" sParam='(?1:&lt;file name=&quot;$0&quot; mark=&quot;marked&quot;/&gt;)' />
                          <Action type="3" message="1702" wParam="0" lParam="768" sParam="" />
                          <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
                      </Macro>
                      <Macro name="Macro_B" Ctrl="no" Alt="no" Shift="no" Key="0">
                          <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                          <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?-is)^(.+?)\\.+\R(?:(?:(?!&lt;/).)+\R)*\1\\.+\R|^(.+?)\\&lt;file name.+\R" />
                          <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                          <Action type="3" message="1602" wParam="0" lParam="0" sParam='&lt;folder name=&quot;\1\2&quot;&gt;\r\n$0&lt;/folder&gt;\r\n' />
                          <Action type="3" message="1702" wParam="0" lParam="768" sParam="" />
                          <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
                          <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                          <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?-s)^(?!&lt;).+?\\(.+)" />
                          <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                          <Action type="3" message="1602" wParam="0" lParam="0" sParam="\1" />
                          <Action type="3" message="1702" wParam="0" lParam="768" sParam="" />
                          <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
                      </Macro>
                      <Macro name="Macro_C" Ctrl="no" Alt="no" Shift="no" Key="0">
                          <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                          <Action type="3" message="1601" wParam="0" lParam="0" sParam="^(?!&lt;).+" />
                          <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                          <Action type="3" message="1702" wParam="0" lParam="768" sParam="" />
                          <Action type="3" message="1701" wParam="0" lParam="1614" sParam="" />
                      </Macro>
              
              • Save the modifications

              • Now, stop and restart Notepad++

              • Open your INPUT file

              • First, run, once only, the Macro_A macro

              • Then, run the Macro_B macro several times

              • From time to time, run the Macro_C macro to check whether the process is finished. Note that the Find dialog must be open to see the result!
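In shortcuts.xml, the regexes are stored in XML attributes, which is why `<` is written `&lt;` and `"` is written `&quot;` in the macros above, while backslashes are kept as they are. As a quick sanity check, Python's standard `xml.etree.ElementTree` can decode two of Macro_A's actions back to the plain regexes of S/R A. This is a sketch only: the bare `<Macro>` wrapper and the helper name `decoded_sparams` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Two <Action> lines copied from Macro_A, wrapped in a dummy root element.
MACRO_A_ACTIONS = r"""<Macro>
<Action type="3" message="1601" wParam="0" lParam="0" sParam="(?-i)(?&lt;=\\)[^\\]+\.html?|(^P:\\|^\h*\R)" />
<Action type="3" message="1602" wParam="0" lParam="0" sParam='(?1:&lt;file name=&quot;$0&quot; mark=&quot;marked&quot;/&gt;)' />
</Macro>"""


def decoded_sparams(xml_text: str):
    # The XML parser turns &lt; &gt; &quot; back into < > " in attribute values.
    root = ET.fromstring(xml_text)
    return [action.get("sParam") for action in root.iter("Action")]


for value in decoded_sparams(MACRO_A_ACTIONS):
    print(value)
```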


              Remark :

              You may assign a shortcut to each of these macros :

              • Use the Macro > Modify Shortcut/Delete Macro option

              • Double-click on each macro and enter your preferred shortcut

              Best Regards,

              guy038

              The Community of users of the Notepad++ text editor.