
Regex - replace subexpression in a repeated subexpression

5 Posts 3 Posters 530 Views
  • M
    Mohammad Hussain
    last edited by Jun 7, 2022, 3:57 AM

    Hello,

    Before you start looking into this, kindly note that I have already found a program that does what I need. So right now I just want to learn; this isn't urgent or important.

    I don’t know if I’m using the correct terminology here, or if this is possible at all, but here it is:

    I am converting lists of HTML files’ full paths to XML, as input for a program. Example:

    P:\113603\Topic 8-Part 1 Industrial Relations  Legislation\data\resources\docwrap.htm
    P:\113603\Topic 8-Part 1 Industrial Relations  Legislation\data\resources\swfwrap.htm
    P:\113604\1st e-lecture was conducted via MS Teams.html
    P:\113604\Additional Help to View Lecture Video.html
    P:\113604\Cover Page for Submission of Assignment.html
    P:\113604\E-learning Task 3 Orientation\lms\blank.html
    P:\113604\E-learning Task 3 Orientation\lms\goodbye.html
    P:\113604\E-learning Task 3 Orientation\presentation_content\blank.html
    P:\113604\E-learning Task 4  Industrial Relations (Part 2)\index.htm
    P:\113604\E-learning Task 4  Industrial Relations (Part 2)\SCORM.htm
    P:\113604\E-learning Task 4  Industrial Relations (Part 2)\data\resources\docwrap.htm
    P:\113604\E-learning Task 4  Industrial Relations (Part 2)\data\resources\swfwrap.htm
    P:\113604\Team Management Part 2\SCORM.htm
    P:\113605\1. Introduction to Organisational Behaviour.html
    P:\113605\Accessing Newspaper Articles.html
    
    

    The expected final output is below (it includes indentation, which is not a problem for me to produce with regex or with XML tools):

    				<folder name="113603">
    					<folder name="Topic 8-Part 1 Industrial Relations  Legislation">
    						<folder name="data">
    							<folder name="resources">
    								<folder name="docwrap.htm" mark="marked"/>
    								<folder name="swfwrap.htm" mark="marked"/>
    							</folder>
    						</folder>
    					</folder>
    				</folder>
    				<folder name="113604">
    					<folder name="1st e-lecture was conducted via MS Teams.html" mark="marked"/>
    					<folder name="Additional Help to View Lecture Video.html" mark="marked"/>
    					<folder name="Cover Page for Submission of Assignment.html" mark="marked"/>
    					<folder name="E-learning Task 3 Orientation">
    						<folder name="lms">
    							<folder name="blank.html" mark="marked"/>
    							<folder name="goodbye.html" mark="marked"/>
    						</folder>
    						<folder name="presentation_content">
    							<folder name="blank.html" mark="marked"/>
    						</folder>
    					</folder>
    					<folder name="E-learning Task 4  Industrial Relations (Part 2)">
    						<folder name="data">
    							<folder name="resources">
    								<folder name="docwrap.htm" mark="marked"/>
    								<folder name="swfwrap.htm" mark="marked"/>
    							</folder>
    						</folder>
    						<folder name="index.htm" mark="marked"/>
    						<folder name="SCORM.htm" mark="marked"/>
    					</folder>
    					<folder name="Team Management Part 2">
    						<folder name="SCORM.htm" mark="marked"/>
    					</folder>
    				</folder>
    				<folder name="113605">
    					<folder name="1. Introduction to Organisational Behaviour.html" mark="marked"/>
    					<folder name="Accessing Newspaper Articles.html" mark="marked"/>
    				</folder>
    

    Starting from the input list of full paths, I used the following steps:
    1. Search (simple):
    P:
    Replace:
    Nothing/Empty/null
    2. Search (regex):
    (\\)([^\\]{1,}\.htm[l]{0,1})
    Replace:
    <file name="\2" mark="marked"/>
    3. Search (regex):
    (\\)([^\\\r\n\<]{1,})(<.*>)
    Replace (keep replacing till no more search matches are found):
    <folder name="\2">\3</folder>
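For readers who want to see exactly what these three steps do, here is a rough Python sketch of them. It is my translation, not the poster's tool: Python's `re` module stands in for the editor's regex engine, each path is one string, and the helper name `paths_to_flat_xml` is made up. Step 3 is simply repeated until it stops matching, as the post describes.

```python
import re

# Step 2: the last path component ending in .htm/.html becomes a <file .../> tag
FILE_RE = re.compile(r'(\\)([^\\]{1,}\.htm[l]{0,1})')
# Step 3: "\folder<...>" becomes '<folder name="folder"><...></folder>'
FOLDER_RE = re.compile(r'(\\)([^\\\r\n<]{1,})(<.*>)')


def paths_to_flat_xml(paths):
    """Hypothetical helper: apply steps 1-3 from the post to each path."""
    result = []
    for line in paths:
        line = line.replace("P:", "")  # step 1: plain search "P:", replace with nothing
        line = FILE_RE.sub(r'<file name="\2" mark="marked"/>', line)  # step 2
        while True:  # step 3: keep replacing until there are no more matches
            line, count = FOLDER_RE.subn(r'<folder name="\2">\3</folder>', line)
            if count == 0:
                break
        result.append(line)
    return result


for xml_line in paths_to_flat_xml([r"P:\113604\Team Management Part 2\SCORM.htm"]):
    print(xml_line)
```

Each pass of step 3 wraps the innermost remaining folder name around the tags already built, which is why it needs as many passes as the path has folder levels.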

    This way, I managed to convert the list of paths to XML that looks like this:

    <folder name="113603"><folder name="Topic 8-Part 1 Industrial Relations  Legislation"><folder name="data"><folder name="resources"><file name="docwrap.htm" mark="marked"/></folder></folder></folder></folder>
    <folder name="113603"><folder name="Topic 8-Part 1 Industrial Relations  Legislation"><folder name="data"><folder name="resources"><file name="swfwrap.htm" mark="marked"/></folder></folder></folder></folder>
    <folder name="113604"><file name="1st e-lecture was conducted via MS Teams.html" mark="marked"/></folder>
    <folder name="113604"><file name="Additional Help to View Lecture Video.html" mark="marked"/></folder>
    <folder name="113604"><file name="Cover Page for Submission of Assignment.html" mark="marked"/></folder>
    <folder name="113604"><folder name="E-learning Task 3 Orientation"><folder name="lms"><file name="blank.html" mark="marked"/></folder></folder></folder>
    <folder name="113604"><folder name="E-learning Task 3 Orientation"><folder name="lms"><file name="goodbye.html" mark="marked"/></folder></folder></folder>
    <folder name="113604"><folder name="E-learning Task 3 Orientation"><folder name="presentation_content"><file name="blank.html" mark="marked"/></folder></folder></folder>
    <folder name="113604"><folder name="E-learning Task 4  Industrial Relations (Part 2)"><file name="index.htm" mark="marked"/></folder></folder>
    <folder name="113604"><folder name="E-learning Task 4  Industrial Relations (Part 2)"><file name="SCORM.htm" mark="marked"/></folder></folder>
    <folder name="113604"><folder name="E-learning Task 4  Industrial Relations (Part 2)"><folder name="data"><folder name="resources"><file name="docwrap.htm" mark="marked"/></folder></folder></folder></folder>
    <folder name="113604"><folder name="E-learning Task 4  Industrial Relations (Part 2)"><folder name="data"><folder name="resources"><file name="swfwrap.htm" mark="marked"/></folder></folder></folder></folder>
    <folder name="113604"><folder name="Team Management Part 2"><file name="SCORM.htm" mark="marked"/></folder></folder>
    <folder name="113605"><file name="1. Introduction to Organisational Behaviour.html" mark="marked"/></folder>
    <folder name="113605"><file name="Accessing Newspaper Articles.html" mark="marked"/></folder>
    
    

    However, the tags for common parents are repeated. I want to eliminate that (not sure what this is called: minimizing? grouping? something else?).

    I tried to create a regex search and replace for the highest level of folders/parents, with the intent of running the replace part multiple times for subfolders, until no more matches are found, removing all redundant tags:

    Search (regex, this part works):
    (^<folder name="[^"]{1,}">)(.*?)(</folder>\R)(((\1)(.*?)(\3))*)?(\1)
    Replace (doesn’t fully work. just showing you my progress and where I’m stuck):
    \1\2\r\n\4

    The search captures what I’m looking for, and the replace lets me remove the last occurrence of </folder> in the first matching line and the parent folder tag in the last matching line. But how do I remove the redundant tags at the beginning and at the end of the lines in between?

    Right now, all the lines in between are captured in group/subexpression \4, so I can keep them when I do a replace. But to actually modify them in the same action, I would need to remove groups 6 and 8 while keeping group 7 for each of those lines, and my replace only does this for the last of them instead of for all the lines in between.

    Is it even possible to make this work for all the lines in between, targeting an individual group (group 7) inside a repeated group/subexpression (group 4)? If so, how?

    Thank you :)

    • N
      Neil Schipper @Mohammad Hussain
      last edited by Jun 7, 2022, 11:17 AM

      @mohammad-hussain

      Hi. I don’t think you can capture a not-known-in-advance number of lines in such a way that you can write them back with substitutions.
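The limitation behind this answer can be seen in a few lines. The sketch below uses Python's `re` only as a stand-in for Notepad++'s engine, on made-up sample lines; backtracking engines generally behave the same way here. When a capture group sits inside a repeated group, the overall match spans every repetition, but the group keeps only the text of the last one. So a replacement can reuse the whole repeated block, or its last repetition, but not each repetition separately.

```python
import re

# Simplified sample: three lines sharing the same parent tag (file names are made up)
text = (
    '<folder name="113604"><file name="a.html"/></folder>\n'
    '<folder name="113604"><file name="b.html"/></folder>\n'
    '<folder name="113604"><file name="c.html"/></folder>\n'
)

# Group 1 sits inside a repeated (quantified) non-capturing group
m = re.match(r'(?:<folder name="113604">(.*?)</folder>\n)+', text)

print(m.group(0) == text)  # True: the whole run of lines is matched
print(m.group(1))          # <file name="c.html"/>: only the last repetition is kept
```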

      So an idiom that would help you go forward is to have a phase that captures groups of lines sharing the same left-side text and writes them back as a record with the desired record start and record end lines, thus:

      Fi: ((^<[^>]+>)(.*\R)((\2)((.*\R)))*)
      Re: \2\r\n\1</folder>\r\n

      (Note: the above doesn’t handle the case where the final text line isn’t terminated by a newline, so add one manually first.)

      and then a phase to clean out that left-side text and the trailing tag:

      Fi: ^<[^>]+>(.*?)</folder>\R
      Re: \1\r\n
      or maybe insert tabs (suggested by your sample output):
      Re: \t\t\t\t\1\r\n
      or maybe some number of spaces, as you prefer.

      Here’s output using 4 tabs:

      <folder name="113603">
      				<folder name="Topic 8-Part 1 Industrial Relations  Legislation"><folder name="data"><folder name="resources"><file name="docwrap.htm" mark="marked"/></folder></folder></folder>
      				<folder name="Topic 8-Part 1 Industrial Relations  Legislation"><folder name="data"><folder name="resources"><file name="swfwrap.htm" mark="marked"/></folder></folder></folder>
      </folder>
      <folder name="113604">
      				<file name="1st e-lecture was conducted via MS Teams.html" mark="marked"/>
      				<file name="Additional Help to View Lecture Video.html" mark="marked"/>
      				<file name="Cover Page for Submission of Assignment.html" mark="marked"/>
      				<folder name="E-learning Task 3 Orientation"><folder name="lms"><file name="blank.html" mark="marked"/></folder></folder>
      				<folder name="E-learning Task 3 Orientation"><folder name="lms"><file name="goodbye.html" mark="marked"/></folder></folder>
      				<folder name="E-learning Task 3 Orientation"><folder name="presentation_content"><file name="blank.html" mark="marked"/></folder></folder>
      				<folder name="E-learning Task 4  Industrial Relations (Part 2)"><file name="index.htm" mark="marked"/></folder>
      				<folder name="E-learning Task 4  Industrial Relations (Part 2)"><file name="SCORM.htm" mark="marked"/></folder>
      				<folder name="E-learning Task 4  Industrial Relations (Part 2)"><folder name="data"><folder name="resources"><file name="docwrap.htm" mark="marked"/></folder></folder></folder>
      				<folder name="E-learning Task 4  Industrial Relations (Part 2)"><folder name="data"><folder name="resources"><file name="swfwrap.htm" mark="marked"/></folder></folder></folder>
      				<folder name="Team Management Part 2"><file name="SCORM.htm" mark="marked"/></folder>
      </folder>
      <folder name="113605">
      				<file name="1. Introduction to Organisational Behaviour.html" mark="marked"/>
      				<file name="Accessing Newspaper Articles.html" mark="marked"/>
      </folder>
      
      

      You can now do something similar at the next level of nesting, matching your preferred whitespace after start-of-line.

      You could also take an approach like this on your original data, doing the factoring (and maybe the indenting) early and adding the XML eye candy in a later phase.

      I won’t be surprised if there are more clever approaches, but I think this gets you unstuck.
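Neil's two phases can be sketched in Python as follows. This is an approximation under a few assumptions: `\R` is written as `\n` (LF line endings), the text ends with a newline (the same caveat as his note), and the name `factor_one_level` is made up. The phase 1 replacement emits the shared first tag, then the whole run of lines, then `</folder>`, like `\2\r\n\1</folder>\r\n` in his regex.

```python
import re

# Phase 1, after "((^<[^>]+>)(.*\R)((\2)((.*\R)))*)": a run of consecutive
# lines that all start with the same first tag (\R written as \n here).
PHASE1 = re.compile(r'^(<[^>]+>).*\n(?:\1.*\n)*', re.M)

# Phase 2, after "^<[^>]+>(.*?)</folder>\R": an inner line that still starts
# with a tag and ends with </folder>.
PHASE2 = re.compile(r'^<[^>]+>(.*?)</folder>\n', re.M)


def factor_one_level(text, indent="\t\t\t\t"):
    """Hypothetical helper: one round of the two phases described above."""
    # Phase 1: shared tag on its own line, the run itself, then a closing tag
    text = PHASE1.sub(lambda m: m.group(1) + "\n" + m.group(0) + "</folder>\n", text)
    # Phase 2: drop the shared tag and the trailing </folder>, then indent
    return PHASE2.sub(lambda m: indent + m.group(1) + "\n", text)


flat = (
    '<folder name="113605"><file name="1. Introduction to Organisational Behaviour.html" mark="marked"/></folder>\n'
    '<folder name="113605"><file name="Accessing Newspaper Articles.html" mark="marked"/></folder>\n'
)
print(factor_one_level(flat), end="")
```

The header and closing lines produced by phase 1 are left alone by phase 2, because neither has a `</folder>` after its first tag on the same line.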

      • M
        Mohammad Hussain @Neil Schipper
        last edited by Jun 8, 2022, 5:43 AM

        @neil-schipper That is a smart approach! I guess I was trying to solve it in one go, when I could have done the same (or more) in more steps. Adding/repeating in order to isolate the parts to target next is a very good idea! Thank you :)

        I was kind of hoping you could reference groups inside repeating groups, though. But at least there are other solutions.

        Thank you again :)

        • G
          guy038
          last edited by guy038 Jun 12, 2022, 9:56 AM (posted Jun 9, 2022, 1:44 PM)

          Hello, @mohammad-hussain, @neil-schipper and All,

          First of all, @mohammad-hussain, I think that the expected final OUTPUT shown in your post is slightly erroneous ! Indeed, you refer to files as, for instance :

          <folder name="docwrap.htm" mark="marked"/>
          

          To my mind, this should be labeled :

          <file name="docwrap.htm" mark="marked"/>
          

          as mentioned in your own regex S/R 2 !


          Now, I found a way, with macros, to solve your problem ;-)) Briefly, starting from this INPUT text :

          P:\113603\Topic 8-Part 1 Industrial Relations  Legislation\data\resources\docwrap.htm
          
          
          P:\113603\Topic 8-Part 1 Industrial Relations  Legislation\data\resources\swfwrap.htm
          P:\113604\1st e-lecture was conducted via MS Teams.html
          
          
          P:\113604\Additional Help to View Lecture Video.html
          P:\113604\Cover Page for Submission of Assignment.html
          P:\113604\E-learning Task 3 Orientation\lms\blank.html
          P:\113604\E-learning Task 3 Orientation\lms\goodbye.html
          
          
          
          P:\113604\E-learning Task 3 Orientation\presentation_content\blank.html
          P:\113604\E-learning Task 4  Industrial Relations (Part 2)\index.htm
          
          P:\113604\E-learning Task 4  Industrial Relations (Part 2)\SCORM.htm
          P:\113604\E-learning Task 4  Industrial Relations (Part 2)\data\resources\docwrap.htm
          P:\113604\E-learning Task 4  Industrial Relations (Part 2)\data\resources\swfwrap.htm
          P:\113604\Team Management Part 2\SCORM.htm
          P:\113605\1. Introduction to Organisational Behaviour.html
          
          
          
          P:\113605\Accessing Newspaper Articles.html
          

          You would get this OUTPUT text :

          <folder name="113603">
          <folder name="Topic 8-Part 1 Industrial Relations  Legislation">
          <folder name="data">
          <folder name="resources">
          <file name="docwrap.htm" mark="marked"/>
          <file name="swfwrap.htm" mark="marked"/>
          </folder>
          </folder>
          </folder>
          </folder>
          <folder name="113604">
          <file name="1st e-lecture was conducted via MS Teams.html" mark="marked"/>
          <file name="Additional Help to View Lecture Video.html" mark="marked"/>
          <file name="Cover Page for Submission of Assignment.html" mark="marked"/>
          <folder name="E-learning Task 3 Orientation">
          <folder name="lms">
          <file name="blank.html" mark="marked"/>
          <file name="goodbye.html" mark="marked"/>
          </folder>
          <folder name="presentation_content">
          <file name="blank.html" mark="marked"/>
          </folder>
          </folder>
          <folder name="E-learning Task 4  Industrial Relations (Part 2)">
          <file name="index.htm" mark="marked"/>
          <file name="SCORM.htm" mark="marked"/>
          <folder name="data">
          <folder name="resources">
          <file name="docwrap.htm" mark="marked"/>
          <file name="swfwrap.htm" mark="marked"/>
          </folder>
          </folder>
          </folder>
          <folder name="Team Management Part 2">
          <file name="SCORM.htm" mark="marked"/>
          </folder>
          </folder>
          <folder name="113605">
          <file name="1. Introduction to Organisational Behaviour.html" mark="marked"/>
          <file name="Accessing Newspaper Articles.html" mark="marked"/>
          </folder>
          

          Note that there is a small difference between this OUTPUT and your expected output, as some lines in the middle are swapped !

          With my solution, I have this partial layout :

          <folder name="E-learning Task 4  Industrial Relations (Part 2)">
          <file name="index.htm" mark="marked"/>
          <file name="SCORM.htm" mark="marked"/>
          <folder name="data">
          <folder name="resources">
          <file name="docwrap.htm" mark="marked"/>
          <file name="swfwrap.htm" mark="marked"/>
          </folder>
          </folder>
          </folder>
          

          In your expected final output, you have this partial layout :

          					<folder name="E-learning Task 4  Industrial Relations (Part 2)">
          						<folder name="data">
          							<folder name="resources">
          								<file name="docwrap.htm" mark="marked"/>
          								<file name="swfwrap.htm" mark="marked"/>
          							</folder>
          						</folder>
          						<file name="index.htm" mark="marked"/>
          						<file name="SCORM.htm" mark="marked"/>
          					</folder>
          

          However, both layouts describe the same folder tree; only the order of the siblings differs !


          Notes :

          • In this second part of my post, you don’t have to memorize all the regex S/R’s described. In my next post, I’ll give you the exact text to insert in the <Macros> section of your active shortcuts.xml file to automate the whole process ! However, in order to understand the principle used :

          • Open the Replace dialog ( Ctrl + H )

          • Then, for each regex S/R to run :

            • Fill in the search and replace regexes

            • Tick the Wrap around option

            • Select the Regular expression search mode

            • Click on the Replace All button

          Well, let’s go. So, given this initial text :

          P:\113603\Topic 8-Part 1 Industrial Relations  Legislation\data\resources\docwrap.htm
          
          
          P:\113603\Topic 8-Part 1 Industrial Relations  Legislation\data\resources\swfwrap.htm
          P:\113604\1st e-lecture was conducted via MS Teams.html
          
          
          P:\113604\Additional Help to View Lecture Video.html
          P:\113604\Cover Page for Submission of Assignment.html
          P:\113604\E-learning Task 3 Orientation\lms\blank.html
          P:\113604\E-learning Task 3 Orientation\lms\goodbye.html
          
          
          
          P:\113604\E-learning Task 3 Orientation\presentation_content\blank.html
          P:\113604\E-learning Task 4  Industrial Relations (Part 2)\index.htm
          
          P:\113604\E-learning Task 4  Industrial Relations (Part 2)\SCORM.htm
          P:\113604\E-learning Task 4  Industrial Relations (Part 2)\data\resources\docwrap.htm
          P:\113604\E-learning Task 4  Industrial Relations (Part 2)\data\resources\swfwrap.htm
          P:\113604\Team Management Part 2\SCORM.htm
          P:\113605\1. Introduction to Organisational Behaviour.html
          
          
          
          P:\113605\Accessing Newspaper Articles.html
          

          The regex S/R A :

          SEARCH (?-i)(?<=\\)[^\\]+\.html?|(^P:\\|^\h*\R)

          REPLACE (?1:<file name="$0" mark="marked"/>)

          gives :

          113603\Topic 8-Part 1 Industrial Relations  Legislation\data\resources\<file name="docwrap.htm" mark="marked"/>
          113603\Topic 8-Part 1 Industrial Relations  Legislation\data\resources\<file name="swfwrap.htm" mark="marked"/>
          113604\<file name="1st e-lecture was conducted via MS Teams.html" mark="marked"/>
          113604\<file name="Additional Help to View Lecture Video.html" mark="marked"/>
          113604\<file name="Cover Page for Submission of Assignment.html" mark="marked"/>
          113604\E-learning Task 3 Orientation\lms\<file name="blank.html" mark="marked"/>
          113604\E-learning Task 3 Orientation\lms\<file name="goodbye.html" mark="marked"/>
          113604\E-learning Task 3 Orientation\presentation_content\<file name="blank.html" mark="marked"/>
          113604\E-learning Task 4  Industrial Relations (Part 2)\<file name="index.htm" mark="marked"/>
          113604\E-learning Task 4  Industrial Relations (Part 2)\<file name="SCORM.htm" mark="marked"/>
          113604\E-learning Task 4  Industrial Relations (Part 2)\data\resources\<file name="docwrap.htm" mark="marked"/>
          113604\E-learning Task 4  Industrial Relations (Part 2)\data\resources\<file name="swfwrap.htm" mark="marked"/>
          113604\Team Management Part 2\<file name="SCORM.htm" mark="marked"/>
          113605\<file name="1. Introduction to Organisational Behaviour.html" mark="marked"/>
          113605\<file name="Accessing Newspaper Articles.html" mark="marked"/>
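Regex S/R A can be approximated in Python as shown below, assuming LF line endings (`\R` written as `\n`, `\h` as `[ \t]`). Python's replacement strings have no conditional like `(?1:...)`, which in Notepad++ outputs nothing when group 1 took part in the match and the text after the colon otherwise, so a small function plays that role. The helper name `apply_regex_a` is hypothetical.

```python
import re

# Regex A translated to Python's re (assumptions: LF line endings, \h -> [ \t],
# \R -> \n; [^\\\n] also keeps the file-name part on a single line).
REGEX_A = re.compile(r'(?<=\\)[^\\\n]+\.html?|(^P:\\|^[ \t]*\n)', re.M)


def apply_regex_a(text):
    """Emulate the conditional replacement (?1:<file name="$0" mark="marked"/>)."""
    def replace(m):
        if m.group(1) is not None:
            # Group 1 matched: a leading "P:\" or a blank line, so delete it
            return ""
        # Otherwise the match is an .htm/.html file name, so wrap it in a tag
        return f'<file name="{m.group(0)}" mark="marked"/>'
    return REGEX_A.sub(replace, text)


sample = (
    "P:\\113605\\1. Introduction to Organisational Behaviour.html\n"
    "\n"
    "\n"
    "P:\\113605\\Accessing Newspaper Articles.html\n"
)
print(apply_regex_a(sample), end="")
```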
          

          Now, run the following two regex S/R B :

          SEARCH ^(?-is)(.+?)\\.+\R(?:(?:(?!</).)+\R)*\1\\.+\R|^(.+?)\\<file name.+\R
          REPLACE <folder name="\1\2">\r\n$0</folder>\r\n

          then :

          SEARCH (?-s)^(?!<).+?\\(.+)
          REPLACE \1

          Which should give :

          <folder name="113603">
          Topic 8-Part 1 Industrial Relations  Legislation\data\resources\<file name="docwrap.htm" mark="marked"/>
          Topic 8-Part 1 Industrial Relations  Legislation\data\resources\<file name="swfwrap.htm" mark="marked"/>
          </folder>
          <folder name="113604">
          <file name="1st e-lecture was conducted via MS Teams.html" mark="marked"/>
          <file name="Additional Help to View Lecture Video.html" mark="marked"/>
          <file name="Cover Page for Submission of Assignment.html" mark="marked"/>
          E-learning Task 3 Orientation\lms\<file name="blank.html" mark="marked"/>
          E-learning Task 3 Orientation\lms\<file name="goodbye.html" mark="marked"/>
          E-learning Task 3 Orientation\presentation_content\<file name="blank.html" mark="marked"/>
          E-learning Task 4  Industrial Relations (Part 2)\<file name="index.htm" mark="marked"/>
          E-learning Task 4  Industrial Relations (Part 2)\<file name="SCORM.htm" mark="marked"/>
          E-learning Task 4  Industrial Relations (Part 2)\data\resources\<file name="docwrap.htm" mark="marked"/>
          E-learning Task 4  Industrial Relations (Part 2)\data\resources\<file name="swfwrap.htm" mark="marked"/>
          Team Management Part 2\<file name="SCORM.htm" mark="marked"/>
          </folder>
          <folder name="113605">
          <file name="1. Introduction to Organisational Behaviour.html" mark="marked"/>
          <file name="Accessing Newspaper Articles.html" mark="marked"/>
          </folder>
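One run of the pair of regexes B can be sketched in Python like this, again assuming LF line endings, with shortened folder names in the sample and a made-up helper name `run_pair_b`. The `(?:(?!</).)+` part is what stops a group from running past a `</folder>` line, so folders with the same name under different parents (like the two `data` folders in this example) are not merged. As far as I can tell, like the original it expects the paths to be sorted, so that lines sharing a prefix are adjacent.

```python
import re

# First regex of pair B (Python re, \R written as \n):
#   alternative 1: from a line "X\..." up to the last following line "X\...",
#                  never crossing a line that contains "</" (a closing tag)
#   alternative 2: a single line "X\<file name..." (a file directly inside X)
GROUP_B = re.compile(
    r'^(.+?)\\.+\n(?:(?:(?!</).)+\n)*\1\\.+\n|^(.+?)\\<file name.+\n', re.M)

# Second regex of pair B: on lines not starting with "<", drop the leading "X\"
STRIP_B = re.compile(r'^(?!<).+?\\(.+)', re.M)


def run_pair_b(text):
    """Hypothetical helper: one run of the two regex S/R B, in order."""
    def wrap(m):
        # Only one of groups 1 and 2 takes part in a match, like \1\2 in the post
        name = m.group(1) if m.group(1) is not None else m.group(2)
        return f'<folder name="{name}">\n{m.group(0)}</folder>\n'

    text = GROUP_B.sub(wrap, text)
    return STRIP_B.sub(r"\1", text)


after_a = (
    '113603\\Topic 8\\data\\<file name="docwrap.htm" mark="marked"/>\n'
    '113603\\Topic 8\\data\\<file name="swfwrap.htm" mark="marked"/>\n'
    '113605\\<file name="Accessing Newspaper Articles.html" mark="marked"/>\n'
)
print(run_pair_b(after_a), end="")
```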
          

          Then, again, run these two regex S/R B, successively, in order to get :

          <folder name="113603">
          <folder name="Topic 8-Part 1 Industrial Relations  Legislation">
          data\resources\<file name="docwrap.htm" mark="marked"/>
          data\resources\<file name="swfwrap.htm" mark="marked"/>
          </folder>
          </folder>
          <folder name="113604">
          <file name="1st e-lecture was conducted via MS Teams.html" mark="marked"/>
          <file name="Additional Help to View Lecture Video.html" mark="marked"/>
          <file name="Cover Page for Submission of Assignment.html" mark="marked"/>
          <folder name="E-learning Task 3 Orientation">
          lms\<file name="blank.html" mark="marked"/>
          lms\<file name="goodbye.html" mark="marked"/>
          presentation_content\<file name="blank.html" mark="marked"/>
          </folder>
          <folder name="E-learning Task 4  Industrial Relations (Part 2)">
          <file name="index.htm" mark="marked"/>
          <file name="SCORM.htm" mark="marked"/>
          data\resources\<file name="docwrap.htm" mark="marked"/>
          data\resources\<file name="swfwrap.htm" mark="marked"/>
          </folder>
          <folder name="Team Management Part 2">
          <file name="SCORM.htm" mark="marked"/>
          </folder>
          </folder>
          <folder name="113605">
          <file name="1. Introduction to Organisational Behaviour.html" mark="marked"/>
          <file name="Accessing Newspaper Articles.html" mark="marked"/>
          </folder>
          

          A third time, run these two regex S/R B, successively, which results in :

          <folder name="113603">
          <folder name="Topic 8-Part 1 Industrial Relations  Legislation">
          <folder name="data">
          resources\<file name="docwrap.htm" mark="marked"/>
          resources\<file name="swfwrap.htm" mark="marked"/>
          </folder>
          </folder>
          </folder>
          <folder name="113604">
          <file name="1st e-lecture was conducted via MS Teams.html" mark="marked"/>
          <file name="Additional Help to View Lecture Video.html" mark="marked"/>
          <file name="Cover Page for Submission of Assignment.html" mark="marked"/>
          <folder name="E-learning Task 3 Orientation">
          <folder name="lms">
          <file name="blank.html" mark="marked"/>
          <file name="goodbye.html" mark="marked"/>
          </folder>
          <folder name="presentation_content">
          <file name="blank.html" mark="marked"/>
          </folder>
          </folder>
          <folder name="E-learning Task 4  Industrial Relations (Part 2)">
          <file name="index.htm" mark="marked"/>
          <file name="SCORM.htm" mark="marked"/>
          <folder name="data">
          resources\<file name="docwrap.htm" mark="marked"/>
          resources\<file name="swfwrap.htm" mark="marked"/>
          </folder>
          </folder>
          <folder name="Team Management Part 2">
          <file name="SCORM.htm" mark="marked"/>
          </folder>
          </folder>
          <folder name="113605">
          <file name="1. Introduction to Organisational Behaviour.html" mark="marked"/>
          <file name="Accessing Newspaper Articles.html" mark="marked"/>
          </folder>
          

          A fourth time, run these two regex S/R B, successively, which gives :

          <folder name="113603">
          <folder name="Topic 8-Part 1 Industrial Relations  Legislation">
          <folder name="data">
          <folder name="resources">
          <file name="docwrap.htm" mark="marked"/>
          <file name="swfwrap.htm" mark="marked"/>
          </folder>
          </folder>
          </folder>
          </folder>
          <folder name="113604">
          <file name="1st e-lecture was conducted via MS Teams.html" mark="marked"/>
          <file name="Additional Help to View Lecture Video.html" mark="marked"/>
          <file name="Cover Page for Submission of Assignment.html" mark="marked"/>
          <folder name="E-learning Task 3 Orientation">
          <folder name="lms">
          <file name="blank.html" mark="marked"/>
          <file name="goodbye.html" mark="marked"/>
          </folder>
          <folder name="presentation_content">
          <file name="blank.html" mark="marked"/>
          </folder>
          </folder>
          <folder name="E-learning Task 4  Industrial Relations (Part 2)">
          <file name="index.htm" mark="marked"/>
          <file name="SCORM.htm" mark="marked"/>
          <folder name="data">
          <folder name="resources">
          <file name="docwrap.htm" mark="marked"/>
          <file name="swfwrap.htm" mark="marked"/>
          </folder>
          </folder>
          </folder>
          <folder name="Team Management Part 2">
          <file name="SCORM.htm" mark="marked"/>
          </folder>
          </folder>
          <folder name="113605">
          <file name="1. Introduction to Organisational Behaviour.html" mark="marked"/>
          <file name="Accessing Newspaper Articles.html" mark="marked"/>
          </folder>
          

          Well, you could ask : how do I know when the process is finished ? Easy : it’s when all the lines of the OUTPUT begin with a < character ! In this example, this is the case after four executions of the pair of regexes B. To easily determine when the process is complete and only the final leading indentation remains to be added :

          • Select the Find tab

            • SEARCH (?-s)^(?!<).+ ( Regex C )

            • Tick the Wrap around option

            • Click on the Count button

            • IF you get the message Count: 0 matches in entire file, the process is finished

            • ELSE, you need to run the two regexes B a few more times !
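The stopping rule above (keep running B until regex C counts 0 matches) amounts to a simple loop. Here is a Python sketch with made-up function names; `step` stands for one run of the pair of regexes B and is passed in so the block stays self-contained, and `max_runs` is an extra safety limit that the manual procedure doesn't have.

```python
import re

# Regex C: a line that does not start with "<" still holds an unprocessed "X\" prefix
REGEX_C = re.compile(r'^(?!<).+', re.M)


def count_unfinished(text):
    """Mimic the Count button: number of regex C matches in the text."""
    return len(REGEX_C.findall(text))


def repeat_until_done(text, step, max_runs=100):
    """Call `step` (e.g. one run of the pair of regexes B) until regex C finds nothing.

    `step` is a hypothetical callable; `max_runs` guards against a step that
    never makes progress. Returns the final text and the number of runs.
    """
    runs = 0
    while count_unfinished(text) > 0:
        if runs == max_runs:
            raise RuntimeError("still unfinished after max_runs runs")
        text = step(text)
        runs += 1
    return text, runs


partial = '<folder name="data">\nresources\\<file name="docwrap.htm" mark="marked"/>\n</folder>\n'
finished = '<folder name="data">\n<file name="docwrap.htm" mark="marked"/>\n</folder>\n'
print(count_unfinished(partial), count_unfinished(finished))  # 1 0
```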

          In my next post, I will explain how to automate this whole process with macros.

          Best Regards

          guy038

          • G
            guy038
            last edited by guy038 Jun 12, 2022, 9:57 AM (posted Jun 9, 2022, 1:45 PM)

            Hi, @mohammad-hussain, @neil-schipper and All,

            Your active shortcuts.xml file should be located either :

            • In the %appData%\Notepad++ folder, for a standard installation with the installer

            • In the Notepad++.exe folder, for a local installation

            Then :

            • Start Microsoft notepad.exe

            • Open the right shortcuts.xml file

            • Insert the following macros, Macro_A, Macro_B and Macro_C, at the end of the <Macros>...</Macros> section :

                    <Macro name="Macro_A" Ctrl="no" Alt="no" Shift="no" Key="0">
                        <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                        <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?-i)(?&lt;=\\)[^\\]+\.html?|(^P:\\|^\h*\R)" />
                        <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                        <Action type="3" message="1602" wParam="0" lParam="0" sParam='(?1:&lt;file name=&quot;$0&quot; mark=&quot;marked&quot;/&gt;)' />
                        <Action type="3" message="1702" wParam="0" lParam="768" sParam="" />
                        <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
                    </Macro>
                    <Macro name="Macro_B" Ctrl="no" Alt="no" Shift="no" Key="0">
                        <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                        <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?-is)^(.+?)\\.+\R(?:(?:(?!&lt;/).)+\R)*\1\\.+\R|^(.+?)\\&lt;file name.+\R" />
                        <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                        <Action type="3" message="1602" wParam="0" lParam="0" sParam='&lt;folder name=&quot;\1\2&quot;&gt;\r\n$0&lt;/folder&gt;\r\n' />
                        <Action type="3" message="1702" wParam="0" lParam="768" sParam="" />
                        <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
                        <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                        <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?-s)^(?!&lt;).+?\\(.+)" />
                        <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                        <Action type="3" message="1602" wParam="0" lParam="0" sParam="\1" />
                        <Action type="3" message="1702" wParam="0" lParam="768" sParam="" />
                        <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
                    </Macro>
                    <Macro name="Macro_C" Ctrl="no" Alt="no" Shift="no" Key="0">
                        <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                        <Action type="3" message="1601" wParam="0" lParam="0" sParam="^(?!&lt;).+" />
                        <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                        <Action type="3" message="1702" wParam="0" lParam="768" sParam="" />
                        <Action type="3" message="1701" wParam="0" lParam="1614" sParam="" />
                    </Macro>
            
            • Save the modifications

            • Now, stop and restart Notepad++

            • Open your INPUT file

            • First, run, once only, the Macro_A macro

            • Then, run the Macro_B macro several times

            • From time to time, run the Macro_C macro to check whether the process is finished. Note that the Find dialog must be open to see the result !
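If a macro does not behave as expected after hand-editing shortcuts.xml, the XML escaping is one thing worth checking: inside sParam, < has to be written &lt; and a double quote &quot; (or the attribute put in single quotes, as done for the replace strings above). The Python sketch below is my own checking aid, not part of Notepad++. It uses the standard `xml.etree.ElementTree` module to parse a fragment of Macro_A and print the decoded search (message 1601) and replace (message 1602) strings, which should read exactly like regex S/R A in the previous post.

```python
import xml.etree.ElementTree as ET

# The search (message 1601) and replace (message 1602) actions of Macro_A, copied
# from above; the surrounding <Macro> element gives ElementTree a single root.
MACRO_A_FRAGMENT = r"""<Macro name="Macro_A" Ctrl="no" Alt="no" Shift="no" Key="0">
    <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?-i)(?&lt;=\\)[^\\]+\.html?|(^P:\\|^\h*\R)" />
    <Action type="3" message="1602" wParam="0" lParam="0" sParam='(?1:&lt;file name=&quot;$0&quot; mark=&quot;marked&quot;/&gt;)' />
</Macro>"""


def decoded_strings(macro_xml):
    """Return {message: sParam} with the XML entities (&lt; &quot; &gt;) decoded."""
    root = ET.fromstring(macro_xml)
    return {
        action.get("message"): action.get("sParam")
        for action in root.iter("Action")
        if action.get("message") in ("1601", "1602")
    }


for message, text in decoded_strings(MACRO_A_FRAGMENT).items():
    print(message, text)
```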


            Remark :

            You may assign a shortcut to each of these macros :

            • Use the Macro > Modify Shortcut/Delete Macro option

            • Double-click on each macro and enter your preferred shortcut

            Best Regards,

            guy038

            The Community of users of the Notepad++ text editor.
            Powered by NodeBB | Contributors