Using latest version 7.5 64bit. How to remove duplicate Lines

  • R
    rinku singh
    last edited by rinku singh Feb 21, 2019, 2:47 PM Feb 21, 2019, 2:46 PM

    I have submitted a plugin for Plugin Admin (32-bit):
    Remove_dup_lines

    download link

    • M
      Meta Chuh moderator @rinku singh
      last edited by Meta Chuh Feb 21, 2019, 2:56 PM Feb 21, 2019, 2:54 PM

      @gurikbal-singh

      It says “all checks have failed” if you go to your link https://github.com/notepad-plus-plus/nppPluginList/pull/59

      What’s the difference from the built-in “Remove Duplicate Lines” feature added in the following commit?
      https://github.com/notepad-plus-plus/notepad-plus-plus/commit/51f10bdba56a415d42eb829b27a08955cb7db0dd

      • R
        rinku singh @Meta Chuh
        last edited by Feb 21, 2019, 2:59 PM

        @Meta-Chuh said:

        https://github.com/notepad-plus-plus/notepad-plus-plus/commit/51f10bdba56a415d42eb829b27a08955cb7db0dd

        thank you !

        • G
          guy038
          last edited by guy038 Feb 21, 2019, 8:36 PM Feb 21, 2019, 7:47 PM

          Hello @scott-fredrick-smith, @peterjones and All,

          Peter, instead of considering the numbers that start each line, I simply assumed that we need to delete all the consecutive lines which share the same first N characters !

          So, this leads to the following regex S/R, where the number 7 is the count of leading characters ( digits ) which must repeat on the following consecutive lines :

          SEARCH (?-s)^((.{7}).*\R)(?:\2.*\R?)+

          REPLACE \1
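
For anyone who wants to try the same idea outside Notepad++, here is a minimal Python sketch of this S/R. It assumes LF line endings and writes `\R` as `\n`, because Python's `re` module has no `\R`. The function name `dedupe_prefix_runs` and the sample text are made up for illustration.

```python
import re

# Python translation of: (?-s)^((.{7}).*\R)(?:\2.*\R?)+  ->  \1
# Assumption: LF line endings, so \R is written as \n here.
PREFIX_RUN = re.compile(r"^((.{7}).*\n)(?:\2.*\n?)+", re.MULTILINE)


def dedupe_prefix_runs(text):
    """Keep only the first line of each run of consecutive lines
    that share the same first 7 characters."""
    return PREFIX_RUN.sub(r"\1", text)


sample = (
    "0040134 DRAWING TITLE 1\n"
    "0040134 DRAWING TITLE 1A - (SLIGHTLY MODIFIED TITLE)\n"
    "0040134 DRAWING TITLE 1\n"
    "0040999 DRAWING TITLE 3\n"
    "0040135 DRAWING TITLE 2\n"
    "0040135 DRAWING TITLE 2A - (SLIGHTLY MODIFIED TITLE)"  # no final line-break
)
result = dedupe_prefix_runs(sample)
print(result)
```

Because `\n?` is optional, the last line is still removed when it has no trailing line-break.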

          => I got the same results as yours, even if the very last line does not end with a line-break ;-))


          Now, with our regex S/R, we may get some weird results when applying it to the sample texts below :

          A) A slightly modified title is the first line of block 0040134

          0040134 DRAWING TITLE 1A - (SLIGHTLY MODIFIED TITLE)
          0040134 DRAWING TITLE 1
          0040134 DRAWING TITLE 1
          0040999 DRAWING TITLE 3
          0040999 SUPER MODIFIED TITLE
          0040999 THIRD TITLE FOR SAME NUMBER
          0040135 DRAWING TITLE 2
          0040135 DRAWING TITLE 2A - (SLIGHTLY MODIFIED TITLE)
          

          B) A very different title, after number, is the first line of block 0040999

          0040134 DRAWING TITLE 1
          0040134 DRAWING TITLE 1A - (SLIGHTLY MODIFIED TITLE)
          0040134 DRAWING TITLE 1
          0040999 SUPER MODIFIED TITLE
          0040999 DRAWING TITLE 3
          0040999 THIRD TITLE FOR SAME NUMBER
          0040135 DRAWING TITLE 2
          0040135 DRAWING TITLE 2A - (SLIGHTLY MODIFIED TITLE)
          

          Luckily, if we do a “pre” sort operation, cases A and B, after replacement, do give us the expected results :

          0040134 DRAWING TITLE 1
          0040135 DRAWING TITLE 2
          0040999 DRAWING TITLE 3
          

          However the case below, with or without a preliminary sort, will fail !

          C) A line, presently not the first of its block 0040999, would become the first one, after a sort

          0040134 DRAWING TITLE 1
          0040134 DRAWING TITLE 1A - (SLIGHTLY MODIFIED TITLE)
          0040134 DRAWING TITLE 1
          0040999 SUPER MODIFIED TITLE
          0040999 DRAWING TITLE 3
          0040999 3rd TITLE FOR SAME NUMBER
          0040135 DRAWING TITLE 2
          0040135 DRAWING TITLE 2A - (SLIGHTLY MODIFIED TITLE)
          

          After the sort and the S/R, we would get :

          0040134 DRAWING TITLE 1
          0040135 DRAWING TITLE 2
          0040999 3rd TITLE FOR SAME NUMBER
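
Case C can be reproduced with a small Python sketch that sorts first and then keeps the first line of each 7-character prefix run. The helper name `dedupe_prefix_runs` is hypothetical, and Python's `sorted` (plain code-point order) is only assumed to behave like the Notepad++ lexicographic sort for this ASCII sample.

```python
def dedupe_prefix_runs(lines, n=7):
    """Keep the first line of each run of consecutive lines sharing
    the same first n characters (shorter lines are always kept)."""
    kept = []
    prev_prefix = None
    for line in lines:
        prefix = line[:n] if len(line) >= n else None
        if prefix is not None and prefix == prev_prefix:
            continue  # same block as the previous line: drop it
        kept.append(line)
        prev_prefix = prefix
    return kept


case_c = [
    "0040134 DRAWING TITLE 1",
    "0040134 DRAWING TITLE 1A - (SLIGHTLY MODIFIED TITLE)",
    "0040134 DRAWING TITLE 1",
    "0040999 SUPER MODIFIED TITLE",
    "0040999 DRAWING TITLE 3",
    "0040999 3rd TITLE FOR SAME NUMBER",
    "0040135 DRAWING TITLE 2",
    "0040135 DRAWING TITLE 2A - (SLIGHTLY MODIFIED TITLE)",
]

# Sorting moves "3rd TITLE ..." ahead of "DRAWING TITLE 3" because '3' < 'D'.
result = dedupe_prefix_runs(sorted(case_c))
print("\n".join(result))
```

The digit `3` sorts before the letter `D`, so the line kept for block 0040999 is not the one with the expected title.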
          

          So, as usual, the correct answer depends on OP’s needs. As Peter said, we just need additional data !

          Best Regards,

          guy038

          • R
            Rvw Mmrs
            last edited by Feb 22, 2019, 2:32 AM

            hi guy038
            can i ask you in which field did you graduate?

            • G
              guy038 @Rvw Mmrs
              last edited by guy038 Feb 22, 2019, 12:51 PM Feb 22, 2019, 12:51 PM

              Hello, @rvw-mmrs,

              OMG, it has been a long time since I finished my graduate education and got a degree in radio-electricity, electronics and computer science !

              It is my turn to ask you why such a question and how it relates to the current discussion !?

              Best Regards,

              guy038

              • A
                Alan Kilborn @guy038
                last edited by Feb 22, 2019, 1:02 PM

                @guy038

                Maybe he wants to offer you a job…doing REGEX full time!! :)

                • S
                  Scott Fredrick Smith @PeterJones
                  last edited by Feb 22, 2019, 6:58 PM

                  @PeterJones ,

                  I decided after your post yesterday to share what I have been working on. Thank you for your response; your detailed regex works quite nicely! I made a few tweaks and have shared them below.

                  I want to thank everyone who has contributed and responded to the questions I asked earlier in this forum, and I am sure this post will spur further discussion and collaboration.

                  For some time now, I have been trying to create an Indented Drawing Tree from the Multi-Level Bill of Material Export from PTC Windchill. PTC removed their drawing tree capability in their last major Windchill update, so it left me with no way to see a high level view of the product.

                  Drawings describe parts, so it is important from a program management perspective to be able to “see” the entire product structure in an indented view.

                  This is a work in progress, so I would like to accomplish all of the steps below with one hot key. I may try to do that with AutoHotKey, since it supports regular expressions and possibly run that script in Notepad++.

                  Any suggestions are welcome as to how to combine all of these steps into one Regex, or run it in one AutoHotKey script. The result of this will benefit many people who have the same or similar requirements.

                  I will do my best here to format this post with the correct Markdown. Please forgive me if it doesn’t meet all the Forum’s Markdown requirements, since I am new here.

                  ---------------------------------------------------------------------------

                  Objective: Produce an Indented Drawing Tree that is extracted from a Multi-Level Bill of Material.

                  Multi-Level BOM is extracted from PTC Windchill and exported to .XLSX format.
                  The output from PTC Windchill for a Multi-Level BOM looks like this.

                  Watered down example, nothing proprietary here. Does not represent a real product

                  0	AGH111900-1		GENERATOR
                  1	    VA111200G1	GENERATOR ASSEMBLY - TOP LEVEL
                  2			VA111200P1 GENERATOR ASSEMBLY - HOUSING
                  2	        100629-042	ADHESIVE, ANAEROBIC, LIQUID RESIN
                  2	        200-000-111-112	CONNECTOR
                  2	        200-000-112-004	CONNECTOR CONTACT
                  2	        A50GB0013-1	TAPE, PRESSURE SENSITIVE ADHESIVE- POLYIMIDE
                  2	        AS3236-06	BOLT, MACHINE - DOUBLE HEX HEAD
                  2	        MIL-PRF-7808	PERF.SPEC,LUBRICATING OIL,GR3
                  2	        MS16996-10	SCREW, SOCKET HEAD
                  2	        VA112719P1	GEAR RETAINER
                  2	        VA112719P2	GEAR RETAINER
                  2	        VA112799P2	GROMMET - T2
                  2	        VA112799P3	GROMMET
                  2	        VA112817G1	PLATE - IDENTIFICATION
                  3	            VA112817P1	PLATE - IDENTIFICATION
                  3	            3-011-001	INSULATING CMPD,ELE 
                  3	            G11257P6	PLATE-BLANK
                  3	            K34706P1	THINNER, PAINT PRODUCTS
                  4	                S-8	THINNER, PAINT PRODUCTS
                  3	            TD111234	PROCESS SPECIFICATION - SERIALIZATION
                  2	        VA113269P1	COVER - DISCONNECT ASSEMBLY
                  2	        VA113289P1	DISC RETAINER - GEAR
                  2	        VA113448P4	BUSHING - HEATER HOUSING
                  2	        VA113453P1	SHIM - 0.630 OD, 0.200 ID, 0.005 THK
                  2	        VA113453P2	SHIM - 0.630 OD, 0.200 ID, 0.005 Thick
                  

                  ---------------------------------------------------------------------------

                  I am only looking for the Top Level AGH111900-1 and any VA Drawings, so I filtered this down using the grouping and search capability in MS Excel, removing the parts that are not represented by drawings. I then pasted the result from Excel into Notepad++, so this is where I have been starting with regexes in Notepad++. Regex suggestions on removing everything other than VA drawings in Notepad++, rather than in MS Excel, are welcome.

                  AGH111900-1	GENERATOR
                  	VA111200G1	GENERATOR ASSEMBLY - TOP LEVEL
                  		VA111200P1 GENERATOR ASSEMBLY - HOUSING
                  		VA112719P1	GEAR RETAINER
                  		VA112719P2	GEAR RETAINER
                  		VA112799P2	GROMMET - T2
                  		VA112799P3	GROMMET
                  		VA112817G1	PLATE - IDENTIFICATION
                  			VA112817P1	PLATE - IDENTIFICATION
                  		VA113269P1	COVER - DISCONNECT ASSEMBLY
                  		VA113289P1	DISC RETAINER - GEAR
                  		VA113448P4	BUSHING - HEATER HOUSING
                  		VA113453P1	SHIM - 0.630 OD, 0.200 ID, 0.005 THK
                  		VA113453P2	SHIM - 0.630 OD, 0.200 ID, 0.005 Thick
                  

                  ---------------------------------------------------------------------------

                  Starting with the first VA part number, I removed the P’s and G’s and the one to three digits that follow the VA number with a Notepad++ recorded macro.
                  Regex suggestions on how to remove the P’s, G’s, and the numbers in between, with a tab at the end, are welcome!

                  AGH111900-1	GENERATOR
                  	VA111200	GENERATOR ASSEMBLY - TOP LEVEL
                  		VA111200	GENERATOR ASSEMBLY - HOUSING
                  		VA112719	GEAR RETAINER
                  		VA112719	GEAR RETAINER
                  		VA112799	GROMMET - T2
                  		VA112799	GROMMET
                  		VA112817	PLATE - IDENTIFICATION
                  			VA112817	PLATE - IDENTIFICATION
                  		VA113269	COVER - DISCONNECT ASSEMBLY
                  		VA113289	DISC RETAINER - GEAR
                  		VA113448	BUSHING - HEATER HOUSING
                  		VA113453	SHIM - 0.630 OD, 0.200 ID, 0.005 THK
                  		VA113453	SHIM - 0.630 OD, 0.200 ID, 0.005 Thick
                  

                  ---------------------------------------------------------------------------

                  I then ran a replace with this regex, per @guy038’s help, adding a group around the repeated part so that all iterations are matched.

                  I used

                  • Find = ^(.*)(?:\r?\n\1)+$
                  • Replace = \1

                  Options: Case sensitive; Exact spacing; Dot matches line breaks; ^$ match at line breaks; Numbered capture; Allow zero-length matches

                  • Assert position at the beginning of a line (at beginning of the string or after a line break character) (carriage return and line feed, form feed) ^

                  • Match the regex below and capture its match into backreference number 1 (.*)

                    • Match any single character .*
                      • Between zero and unlimited times, as many times as possible, giving back as needed (greedy) *
                  • Match the regular expression below (?:\r?\n\1)+

                    • Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
                    • Match the carriage return character \r?
                      • Between zero and one times, as many times as possible, giving back as needed (greedy) ?
                    • Match the line feed character \n
                    • Match the same text that was most recently matched by capturing group number 1 (case sensitive; fail if the group did not participate in the match so far) \1
                  • Assert position at the end of a line (at the end of the string or before a line break character) (carriage return and line feed, form feed) $

                  • Insert the text that was last matched by capturing group number 1 \1

                  Result: Removed the duplicate VA112719

                  AGH111900-1	GENERATOR
                  	VA111200	GENERATOR ASSEMBLY - TOP LEVEL
                  		VA111200	GENERATOR ASSEMBLY - HOUSING
                  		VA112719	GEAR RETAINER
                  		VA112799	GROMMET - T2
                  		VA112799	GROMMET
                  		VA112817	PLATE - IDENTIFICATION
                  			VA112817	PLATE - IDENTIFICATION
                  		VA113269	COVER - DISCONNECT ASSEMBLY
                  		VA113289	DISC RETAINER - GEAR
                  		VA113448	BUSHING - HEATER HOUSING
                  		VA113453	SHIM - 0.630 OD, 0.200 ID, 0.005 THK
                  		VA113453	SHIM - 0.630 OD, 0.200 ID, 0.005 Thick
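
As a cross-check outside Notepad++, the effect of this first step (collapsing identical consecutive lines) can be sketched in Python with `itertools.groupby`. This is not the regex itself, only an equivalent for whole-line duplicates. The function name and the shortened sample are made up for illustration.

```python
from itertools import groupby


def drop_consecutive_duplicates(lines):
    """Collapse each run of identical consecutive lines into one line."""
    return [line for line, _run in groupby(lines)]


lines = [
    "\t\tVA112719\tGEAR RETAINER",
    "\t\tVA112719\tGEAR RETAINER",              # exact duplicate: removed
    "\t\tVA112799\tGROMMET - T2",
    "\t\tVA112799\tGROMMET",                    # different title: kept
    "\t\tVA112817\tPLATE - IDENTIFICATION",
    "\t\t\tVA112817\tPLATE - IDENTIFICATION",   # different indent: kept
]
result = drop_consecutive_duplicates(lines)
print("\n".join(result))
```

Lines that differ only in indentation or title survive this step, which is why the later steps are still needed.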
                  

                  ---------------------------------------------------------------------------

                  I then used @Terry-R’s recommendation to remove the duplicate indented lines with:

                  • Find = (?-s)^(.+\R)\h{4}\1+
                  • Replace = \1

                  Options: Case sensitive; Exact spacing; Dot matches line breaks; ^$ match at line breaks; Numbered capture; Allow zero-length matches

                  • Use these options for the whole regular expression (?-s)

                    • (hyphen inverts the meaning of the letters that follow) -
                    • Dot doesn’t match line breaks s
                  • Assert position at the beginning of a line (at beginning of the string or after a line break character) (carriage return and line feed, form feed) ^

                  • Match the regex below and capture its match into backreference number 1 (.+\R)

                    • Match any single character that is NOT a line break character (line feed, carriage return, form feed) .+
                      • Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
                    • Match a line break (carriage return and line feed pair, sole line feed, sole carriage return, vertical tab, form feed) \R
                  • Match a single character that is a “horizontal whitespace character” (tab or any space in the active code page) \h{4}

                    • Exactly 4 times {4}
                  • Match the same text that was most recently matched by capturing group number 1 (case sensitive; fail if the group did not participate in the match so far) \1+

                    • Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
                  • Insert the text that was last matched by capturing group number 1 \1

                  With this result: Removed the indented duplicate VA112817

                  AGH111900-1	GENERATOR
                  	VA111200	GENERATOR ASSEMBLY - TOP LEVEL
                  		VA111200	GENERATOR ASSEMBLY - HOUSING
                  		VA112719	GEAR RETAINER
                  		VA112799	GROMMET - T2
                  		VA112799	GROMMET
                  		VA112817	PLATE - IDENTIFICATION
                  		VA113269	COVER - DISCONNECT ASSEMBLY
                  		VA113289	DISC RETAINER - GEAR
                  		VA113448	BUSHING - HEATER HOUSING
                  		VA113453	SHIM - 0.630 OD, 0.200 ID, 0.005 THK
                  		VA113453	SHIM - 0.630 OD, 0.200 ID, 0.005 Thick
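
A rough Python translation of this step, for testing outside Notepad++. It assumes LF line endings (`\R` becomes `\n`), writes `\h` as `[ \t]` because Python's `re` has no `\h`, and assumes each level is indented by four spaces. The sample text is shortened from the listing above.

```python
import re

# Translation of: (?-s)^(.+\R)\h{4}\1+  ->  \1
# Assumptions: \R -> \n, \h -> [ \t], four spaces per indent level.
INDENTED_DUP = re.compile(r"^(.+\n)[ \t]{4}\1+", re.MULTILINE)

text = (
    "        VA112799  GROMMET\n"
    "        VA112817  PLATE - IDENTIFICATION\n"
    "            VA112817  PLATE - IDENTIFICATION\n"  # same line, one level deeper
    "        VA113269  COVER - DISCONNECT ASSEMBLY\n"
)
result = INDENTED_DUP.sub(r"\1", text)
print(result)
```

Since group 1 includes the line-break, the deeper duplicate must also end with a line-break to be matched.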
                  

                  ---------------------------------------------------------------------------

                  I am then running @peterjones’s regex, adding a capturing group around the repeated group to capture all iterations:

                  • Find = (?-s)^(\s+\w+\b)(.*\R)((?:\1.*(?:\Z|\R))+)
                  • Replace = \1\2

                  Options: Case sensitive; Exact spacing; Dot matches line breaks; ^$ match at line breaks; Numbered capture; Allow zero-length matches

                  • Use these options for the whole regular expression (?-s)

                    • (hyphen inverts the meaning of the letters that follow) -
                    • Dot doesn’t match line breaks s
                  • Assert position at the beginning of a line (at beginning of the string or after a line break character) (carriage return and line feed, form feed) ^

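                  • Match the regex below and capture its match into backreference number 1 (\s+\w+\b)

                    • Match a single character that is a “whitespace character” (any space in the active code page, tab, line feed, carriage return, vertical tab, form feed) \s+
                      • Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
                    • Match a single character that is a “word character” (letter, digit, or underscore in the active code page) \w+
                      • Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
                    • Assert position at a word boundary (position preceded or followed, but not both, by a letter, digit, or underscore in the active code page) \b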
                  • Match the regex below and capture its match into backreference number 2 (.*\R)

                    • Match any single character that is NOT a line break character (line feed, carriage return, form feed) .*
                      • Between zero and unlimited times, as many times as possible, giving back as needed (greedy) *
                    • Match a line break (carriage return and line feed pair, sole line feed, sole carriage return, vertical tab, form feed) \R
                  • Match the regex below and capture its match into backreference number 3 ((?:\1.*(?:\Z|\R))+)

                    • Match the regular expression below (?:\1.*(?:\Z|\R))+
                      • Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
                      • Match the same text that was most recently matched by capturing group number 1 (case sensitive; fail if the group did not participate in the match so far) \1
                      • Match any single character that is NOT a line break character (line feed, carriage return, form feed) .*
                        • Between zero and unlimited times, as many times as possible, giving back as needed (greedy) *
                      • Match the regular expression below (?:\Z|\R)
                        • Match this alternative (attempting the next alternative only if this one fails) \Z
                          • Assert position at the end of the string, or before any number of line breaks at the end of the string (carriage return and line feed, form feed) \Z
                        • Or match this alternative (the entire group fails if this one fails to match) \R
                          • Match a line break (carriage return and line feed pair, sole line feed, sole carriage return, vertical tab, form feed) \R
                  • Insert the text that was last matched by capturing group number 1 \1

                  • Insert the text that was last matched by capturing group number 2 \2

                  Result: Removed the duplicate VA112799 with a different title

                  AGH111900-1	GENERATOR
                  	VA111200	GENERATOR ASSEMBLY - TOP LEVEL
                  		VA111200	GENERATOR ASSEMBLY - HOUSING
                  		VA112719	GEAR RETAINER
                  		VA112799	GROMMET - T2
                  		VA112817	PLATE - IDENTIFICATION
                  		VA113269	COVER - DISCONNECT ASSEMBLY
                  		VA113289	DISC RETAINER - GEAR
                  		VA113448	BUSHING - HEATER HOUSING
                  		VA113453	SHIM - 0.630 OD, 0.200 ID, 0.005 THK
                  

                  ---------------------------------------------------------------------------

                  And finally, I am running this modification to @peterjones regex to find the dissimilar indented drawing with different titles:

                  • Find = (?-s)^(\s+\w+\b)(.*\R)\h{4}((?:\1.*(?:\Z|\R))+)
                  • Replace = \1\2

                  Options: Case sensitive; Exact spacing; Dot matches line breaks; ^$ match at line breaks; Numbered capture; Allow zero-length matches

                  • Use these options for the whole regular expression (?-s)

                    • (hyphen inverts the meaning of the letters that follow) -
                    • Dot doesn’t match line breaks s
                  • Assert position at the beginning of a line (at beginning of the string or after a line break character) (carriage return and line feed, form feed) ^

                  • Match the regex below and capture its match into backreference number 1 (\s+\w+\b)

                    • Match a single character that is a “whitespace character” (any space in the active code page, tab, line feed, carriage return, vertical tab, form feed) \s+
                      • Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
                    • Match a single character that is a “word character” (letter, digit, or underscore in the active code page) \w+
                      • Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
                    • Assert position at a word boundary (position preceded or followed—but not both—by a letter, digit, or underscore in the active code page) \b
                  • Match the regex below and capture its match into backreference number 2 (.*\R)

                    • Match any single character that is NOT a line break character (line feed, carriage return, form feed) .*
                      • Between zero and unlimited times, as many times as possible, giving back as needed (greedy) *
                    • Match a line break (carriage return and line feed pair, sole line feed, sole carriage return, vertical tab, form feed) \R
                  • Match a single character that is a “horizontal whitespace character” (tab or any space in the active code page) \h{4}

                    • Exactly 4 times {4}
                  • Match the regex below and capture its match into backreference number 3 ((?:\1.*(?:\Z|\R))+)

                    • Match the regular expression below (?:\1.*(?:\Z|\R))+
                      • Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
                      • Match the same text that was most recently matched by capturing group number 1 (case sensitive; fail if the group did not participate in the match so far) \1
                      • Match any single character that is NOT a line break character (line feed, carriage return, form feed) .*
                        • Between zero and unlimited times, as many times as possible, giving back as needed (greedy) *
                      • Match the regular expression below (?:\Z|\R)
                        • Match this alternative (attempting the next alternative only if this one fails) \Z
                          • Assert position at the end of the string, or before any number of line breaks at the end of the string (carriage return and line feed, form feed) \Z
                        • Or match this alternative (the entire group fails if this one fails to match) \R
                          • Match a line break (carriage return and line feed pair, sole line feed, sole carriage return, vertical tab, form feed) \R
                  • Insert the text that was last matched by capturing group number 1 \1

                  • Insert the text that was last matched by capturing group number 2 \2

                  Result: Removed the VA111200 indented duplicate with a different title

                  AGH111900-1	GENERATOR
                  	VA111200	GENERATOR ASSEMBLY - TOP LEVEL
                  		VA112719	GEAR RETAINER
                  		VA112799	GROMMET - T2
                  		VA112817	PLATE - IDENTIFICATION
                  		VA113269	COVER - DISCONNECT ASSEMBLY
                  		VA113289	DISC RETAINER - GEAR
                  		VA113448	BUSHING - HEATER HOUSING
                  		VA113453	SHIM - 0.630 OD, 0.200 ID, 0.005 THK
                  

                  Voilà! An Indented Drawing Tree from a Multi-Level BOM export out of PTC Windchill!
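
On the question of running all of these steps in one go: one option, sketched here in Python rather than AutoHotKey, is to chain the four replacements in a loop. This is only a sketch under assumptions: LF line endings (`\R` written as `\n`), `\h` written as `[ \t]`, four spaces per indent level, and input in which the P/G suffixes are already stripped. The names `STEPS` and `build_drawing_tree` are made up.

```python
import re

M = re.MULTILINE

# The four find/replace steps from this post, in order.
STEPS = [
    (re.compile(r"^(.*)(?:\r?\n\1)+$", M), r"\1"),                    # exact duplicates
    (re.compile(r"^(.+\n)[ \t]{4}\1+", M), r"\1"),                    # indented duplicate
    (re.compile(r"^(\s+\w+\b)(.*\n)((?:\1.*(?:\Z|\n))+)", M), r"\1\2"),         # same number
    (re.compile(r"^(\s+\w+\b)(.*\n)[ \t]{4}((?:\1.*(?:\Z|\n))+)", M), r"\1\2"), # same, indented
]


def build_drawing_tree(text):
    """Apply every step once, each on the output of the previous one."""
    for pattern, replacement in STEPS:
        text = pattern.sub(replacement, text)
    return text


bom = "\n".join([
    "AGH111900-1     GENERATOR",
    "    VA111200  GENERATOR ASSEMBLY - TOP LEVEL",
    "        VA111200 GENERATOR ASSEMBLY - HOUSING",
    "        VA112719  GEAR RETAINER",
    "        VA112719  GEAR RETAINER",
    "        VA112799  GROMMET - T2",
    "        VA112799  GROMMET",
    "        VA112817  PLATE - IDENTIFICATION",
    "            VA112817  PLATE - IDENTIFICATION",
    "        VA113269  COVER - DISCONNECT ASSEMBLY",
    "        VA113289  DISC RETAINER - GEAR",
    "        VA113448  BUSHING - HEATER HOUSING",
    "        VA113453  SHIM - 0.630 OD, 0.200 ID, 0.005 THK",
    "        VA113453  SHIM - 0.630 OD, 0.200 ID, 0.005 Thick",
]) + "\n"

tree = build_drawing_tree(bom)
print(tree)
```

Keeping the steps in a list preserves their order, which matters: the last step relies on the earlier ones having already removed the same-level duplicates.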

                  • V
                    Vasile Caraus
                    last edited by Feb 22, 2019, 7:13 PM

                    You can try one of these regexes:

                    SEARCH:
                    (?-s)^(.*)\R(?s)(?=.*^\1(?:\R|\z))
                    REPLACE BY:
                    (LEAVE EMPTY)

                    OR

                    SEARCH:
                    (?-s)^(.*)(?:\R)(?s)(?=.*^\1\R)
                    REPLACE BY:
                    (LEAVE EMPTY)
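
Both searches delete a line whenever an identical line appears somewhere later in the file, so the last occurrence of each line is the one that survives. A minimal Python sketch of that behaviour (not a translation of the regex; the function name and sample lines are made up):

```python
def keep_last_occurrence(lines):
    """Drop every line that appears again later; keep the last copy."""
    seen = set()
    kept_reversed = []
    for line in reversed(lines):       # walk from the end of the file
        if line not in seen:
            seen.add(line)
            kept_reversed.append(line)
    return kept_reversed[::-1]         # restore the original order


lines = [
    "VA112719 GEAR RETAINER",
    "VA112799 GROMMET",
    "VA112719 GEAR RETAINER",
    "VA113269 COVER",
    "VA112799 GROMMET",
]
result = keep_last_occurrence(lines)
print("\n".join(result))
```

Unlike the consecutive-duplicate regexes elsewhere in this thread, this also removes duplicates that are not adjacent.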

                    • G
                      guy038
                      last edited by Feb 22, 2019, 7:45 PM

                      Hi, @scott-fredrick-smith,

                      It’s about 20h30, in France, and tomorrow, I have to be awake, around 5h30, for a ski-day in Meribel ( The “3-vallées” domain ! ). So, please, just wait until Sunday to give me time to study your loooooong reply ;-))

                      Cheers,

                      guy038

                      • S
                        Scott Fredrick Smith @guy038
                        last edited by Feb 22, 2019, 8:34 PM

                        @guy038 said:

                        Meribel

                        Looks like a fabulous place to ski. Enjoy!

                        • G
                          guy038
                          last edited by guy038 Feb 24, 2019, 11:56 AM Feb 23, 2019, 10:25 PM

                          Hello, @scott-fredrick-smith, and All,

                          From your original text, below :

                          0   AGH111900-1     GENERATOR
                          1       VA111200G1  GENERATOR ASSEMBLY - TOP LEVEL
                          2           VA111200P1 GENERATOR ASSEMBLY - HOUSING
                          2           100629-042  ADHESIVE, ANAEROBIC, LIQUID RESIN
                          2           200-000-111-112 CONNECTOR
                          2           200-000-112-004 CONNECTOR CONTACT
                          2           A50GB0013-1 TAPE, PRESSURE SENSITIVE ADHESIVE- POLYIMIDE
                          2           AS3236-06   BOLT, MACHINE - DOUBLE HEX HEAD
                          2           MIL-PRF-7808    PERF.SPEC,LUBRICATING OIL,GR3
                          2           MS16996-10  SCREW, SOCKET HEAD
                          2           VA112719P1  GEAR RETAINER
                          2           VA112719P2  GEAR RETAINER
                          2           VA112799P2  GROMMET - T2
                          2           VA112799P3  GROMMET
                          2           VA112817G1  PLATE - IDENTIFICATION
                          3               VA112817P1  PLATE - IDENTIFICATION
                          3               3-011-001   INSULATING CMPD,ELE 
                          3               G11257P6    PLATE-BLANK
                          3               K34706P1    THINNER, PAINT PRODUCTS
                          4                   S-8 THINNER, PAINT PRODUCTS
                          3               TD111234    PROCESS SPECIFICATION - SERIALIZATION
                          2           VA113269P1  COVER - DISCONNECT ASSEMBLY
                          2           VA113289P1  DISC RETAINER - GEAR
                          2           VA113448P4  BUSHING - HEATER HOUSING
                          2           VA113453P1  SHIM - 0.630 OD, 0.200 ID, 0.005 THK
                          2           VA113453P2  SHIM - 0.630 OD, 0.200 ID, 0.005 Thick
                          

                          Here is a regex S/R which :

                          • Deletes any line which does not contain a VA Drawing and is not the level 0 line

                          • Deletes all the P’s and G’s followed by digits

                          SEARCH (?-is)(?!.*VA\d+|^0|^\u)^.+\R|^\d\x20{3}|[GP]\d+

                          REPLACE Leave EMPTY

                          So, after clicking, once, on the Replace All button or several times on the Replace button, you should get :

                          AGH111900-1     GENERATOR
                              VA111200  GENERATOR ASSEMBLY - TOP LEVEL
                                  VA111200 GENERATOR ASSEMBLY - HOUSING
                                  VA112719  GEAR RETAINER
                                  VA112719  GEAR RETAINER
                                  VA112799  GROMMET - T2
                                  VA112799  GROMMET
                                  VA112817  PLATE - IDENTIFICATION
                                      VA112817  PLATE - IDENTIFICATION
                                  VA113269  COVER - DISCONNECT ASSEMBLY
                                  VA113289  DISC RETAINER - GEAR
                                  VA113448  BUSHING - HEATER HOUSING
                                  VA113453  SHIM - 0.630 OD, 0.200 ID, 0.005 THK
                                  VA113453  SHIM - 0.630 OD, 0.200 ID, 0.005 Thick
                          

                          Nice, isn’t it ?
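
For testing outside Notepad++, the first S/R can be approximated in Python. The translation rests on assumptions: `\R` is written as `\n` (LF endings), `\u` as `[A-Z]`, and `(?-is)` is dropped because Python's `re` is case-sensitive and keeps the dot within a line by default. The helper `row` is hypothetical and only builds sample lines in the same layout as the export.

```python
import re

# Translation of: (?-is)(?!.*VA\d+|^0|^\u)^.+\R|^\d\x20{3}|[GP]\d+
FILTER = re.compile(r"(?!.*VA\d+|^0|^[A-Z])^.+\n|^\d {3}|[GP]\d+", re.MULTILINE)


def row(level, rest):
    """Build one export line: level digit, 3 spaces, then 4 spaces per level."""
    return f"{level}   {' ' * (4 * level)}{rest}\n"


raw = (
    row(0, "AGH111900-1     GENERATOR")
    + row(1, "VA111200G1  GENERATOR ASSEMBLY - TOP LEVEL")
    + row(2, "VA111200P1 GENERATOR ASSEMBLY - HOUSING")
    + row(2, "100629-042  ADHESIVE, ANAEROBIC, LIQUID RESIN")
    + row(2, "VA112817G1  PLATE - IDENTIFICATION")
    + row(3, "VA112817P1  PLATE - IDENTIFICATION")
    + row(3, "G11257P6    PLATE-BLANK")
)

cleaned = FILTER.sub("", raw)  # one pass, like a single "Replace All"
print(cleaned)
```

The order of the alternatives matters: whole non-VA lines are removed first, so the `[GP]\d+` branch never touches part numbers such as `G11257P6`.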


                          Now, I built another regex, which keeps the level number at the beginning of each line and aligns all the VA Drawings lines

                          This could be important regarding further deleting of [pseudo] duplicate lines !

                          SEARCH (?-is)(?!.*VA\d+|^0)^.+\R|^\d+\x20\K\x20+|VA\d+\K[GP]\d+

                          REPLACE Leave EMPTY

                          This time, due to the \K syntax, in some locations of the regex, use the Replace All button, exclusively !

                          So, from your initial text, you should obtain, this time, the text below :

                          0 AGH111900-1     GENERATOR
                          1 VA111200  GENERATOR ASSEMBLY - TOP LEVEL
                          2 VA111200 GENERATOR ASSEMBLY - HOUSING
                          2 VA112719  GEAR RETAINER
                          2 VA112719  GEAR RETAINER
                          2 VA112799  GROMMET - T2
                          2 VA112799  GROMMET
                          2 VA112817  PLATE - IDENTIFICATION
                          3 VA112817  PLATE - IDENTIFICATION
                          2 VA113269  COVER - DISCONNECT ASSEMBLY
                          2 VA113289  DISC RETAINER - GEAR
                          2 VA113448  BUSHING - HEATER HOUSING
                          2 VA113453  SHIM - 0.630 OD, 0.200 ID, 0.005 THK
                          2 VA113453  SHIM - 0.630 OD, 0.200 ID, 0.005 Thick
                          

                          Not bad, too !

                          BTW, don’t worry about suppression of the indenting. We’ll be able to get the indenting again, at the end of the process !
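
For anyone who wants to try the \K regex above outside Notepad++, here is a minimal Python sketch. It is a port under stated assumptions, not the exact regex: Python's `re` has no `\K` and no `\R`, so the text that `\K` would keep is captured in a group and written back, and `\R` becomes `\r?\n`. The names `KEEP_LEVEL` and `keep_level_numbers` and the short sample are made up for this illustration.

```python
import re

# Port of the \K regex. Python's re has no \K, so the part that \K would keep
# is captured in a group and written back by the replacement function.
KEEP_LEVEL = re.compile(
    r"(?!.*VA\d+|^0)^.+\r?\n"  # drop lines with no VA drawing (level-0 line spared)
    r"|^(\d+\x20)\x20+"        # keep "level digit + one space", drop the extra spaces
    r"|(VA\d+)[GP]\d+",        # keep the VA number, drop the G/P suffix
    re.MULTILINE,
)


def keep_level_numbers(text: str) -> str:
    """Return text with non-VA lines removed and level numbers kept."""
    return KEEP_LEVEL.sub(lambda m: m.group(1) or m.group(2) or "", text)


# Hypothetical sample: level digit, padding, then part number and title.
sample = (
    "0" + " " * 3 + "AGH111900-1     GENERATOR\n"
    + "1" + " " * 7 + "VA111200G1  GENERATOR ASSEMBLY - TOP LEVEL\n"
    + "2" + " " * 11 + "100629-042  ADHESIVE, ANAEROBIC, LIQUID RESIN\n"
    + "3" + " " * 15 + "VA112817P1  PLATE - IDENTIFICATION\n"
)
aligned = keep_level_numbers(sample)
print(aligned)
```

Capturing and re-inserting the kept text is the usual workaround when an engine lacks `\K`. The adhesive line disappears, and each remaining line starts with its level number followed by a single space.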


                          Now, at this point of our discussion, the best would be that you tell me which lines, among the 14 lines just above, you would like to keep ;-)) With this additional information, I’ll try to find out a regex matching your needs and deleting all the other [duplicate] lines ;-))

                          See you later,

                          Best Regards

                          guy038

                          • S
                            Scott Fredrick Smith @guy038
                            last edited by Feb 25, 2019, 11:18 PM

                            @guy038 ,

                            Yes, very nice! Both regexes work great!

                            Starting with:

                            AGH111900-1     GENERATOR
                                VA111200G1  GENERATOR ASSEMBLY - TOP LEVEL
                                    VA111200P1 GENERATOR ASSEMBLY - HOUSING
                                    100629-042  ADHESIVE, ANAEROBIC, LIQUID RESIN
                                    200-000-111-112 CONNECTOR
                                    200-000-112-004 CONNECTOR CONTACT
                                    A50GB0013-1 TAPE, PRESSURE SENSITIVE ADHESIVE- POLYIMIDE
                                    AS3236-06   BOLT, MACHINE - DOUBLE HEX HEAD
                                    MIL-PRF-7808    PERF.SPEC,LUBRICATING OIL,GR3
                                    MS16996-10  SCREW, SOCKET HEAD
                                    VA112719P1  GEAR RETAINER
                                    VA112719P2  GEAR RETAINER
                                    VA112799P2  GROMMET - T2
                                    VA112799P3  GROMMET
                                    VA112817G1  PLATE - IDENTIFICATION
                                        VA112817P1  PLATE - IDENTIFICATION
                                        3-011-001   INSULATING CMPD,ELE 
                                        G11257P6    PLATE-BLANK
                                        K34706P1    THINNER, PAINT PRODUCTS
                                            S-8 THINNER, PAINT PRODUCTS
                                        TD111234    PROCESS SPECIFICATION - SERIALIZATION
                                    VA113269P1  COVER - DISCONNECT ASSEMBLY
                                    VA113289P1  DISC RETAINER - GEAR
                                    VA113448P4  BUSHING - HEATER HOUSING
                                    VA113453P1  SHIM - 0.630 OD, 0.200 ID, 0.005 THK
                                    VA113453P2  SHIM - 0.630 OD, 0.200 ID, 0.005 Thick
                            

                            And to get to the results I want to achieve, I have summarized running the regexes in the order below:

                            1. In Windchill, select the End Item to be exported. Go to the Structure tab, select Viewing, select Display, select Expand All Levels.
                              Go to Reports, select Multi-Level BOM. Export the Multi-Level BOM from Windchill: select Actions, Export List to File, Export XLSX.

                            2. Copy data from Excel (only data below the Number and Name Column headers) and paste it into Notepad++.

                            3. In Notepad++, open the Replace dialog (Ctrl+H).

                            4. Find what: (?-s)(?!.*VA1\d+|^0|^\u)^.+\R|^\d\x20{3}|[GP]\d+
                              Replace with: Leave Empty

                              Result: Removes anything that is not a “VA1” drawing, and removes the G (Assembly) & P (Part) conditions. Note: If you use “dash” numbers for parts and/or assemblies, you would look for a “-” instead.

                            5. Find what: ^(.*)(\r?\n\1)+$
                              Replace with: \1
                              *In the Replace dialog, select Regular expression, with ". matches newline" checked

                              Result: Finds not just duplicates, but also finds groups of text that are duplicated, and removes the second duplicate group.
                              Keep searching with this one until it doesn’t find any more duplicates.

                            6. Find what: (?-s)^(.+\R)\h{4}\1+
                              Replace with: \1
                              *In the Replace Dialog, Regular expression, . matches newline (checked)

                              Result: Removes the indented duplicates

                            7. Find what: (?-s)^(\s+\w+\b)(.*\R)((?:\1.*(?:\Z|\R))+)
                              Replace with: \1\2
                              *In the Replace Dialog, Regular expression, . matches newline (checked)

                              Result: Removes duplicates that have the same drawing number, but dissimilar titles.

                            8. Find what: (?-s)^(\s+\w+\b)(.*\R)\h{4}((?:\1.*(?:\Z|\R))+)
                              Replace with: \1\2
                              *In the Replace Dialog, Regular expression, . matches newline (checked)

                              Result: Removes duplicates that have same drawing number, dissimilar titles and the duplicate is indented 4 spaces.

                              AGH111900-1 GENERATOR
                              VA111200 GENERATOR ASSEMBLY - TOP LEVEL
                              VA112719 GEAR RETAINER
                              VA112799 GROMMET - T2
                              VA112817 PLATE - IDENTIFICATION
                              VA113269 COVER - DISCONNECT ASSEMBLY
                              VA113289 DISC RETAINER - GEAR
                              VA113448 BUSHING - HEATER HOUSING
                              VA113453 SHIM - 0.630 OD, 0.200 ID, 0.005 THK
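
The "keep searching until it doesn't find any more duplicates" part of step 5 can be sketched in Python as a loop that re-runs the substitution until a pass changes nothing. This is only an illustration under assumptions: `re.DOTALL` stands in for the ". matches newline" checkbox, and the names `DUP_BLOCK` and `collapse_repeats` are invented here.

```python
import re

# Step 5 pattern. re.DOTALL plays the role of the ". matches newline" checkbox,
# so group 1 can span several lines and match a whole repeated block.
DUP_BLOCK = re.compile(r"^(.*)(\r?\n\1)+$", re.MULTILINE | re.DOTALL)


def collapse_repeats(text: str) -> str:
    """Re-run the replacement until a pass no longer changes the text."""
    while True:
        reduced = DUP_BLOCK.sub(r"\1", text)
        if reduced == text:
            return text
        text = reduced


# The first pass removes the repeated three-line block, the second pass
# removes the repeated "a" line that the first pass left behind.
print(collapse_repeats("a\na\nb\na\na\nb"))
```

One pass is not always enough, because removing a repeated block can leave a shorter repeat behind, which is why the loop runs to a fixed point. With "." matching newlines the pattern backtracks heavily, so this sketch is only suitable for small inputs.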

                            • G
                              guy038
                              last edited by guy038 Mar 1, 2019, 6:43 PM Feb 26, 2019, 8:37 PM

                              Hi, @scott-fredrick-smith, and All,

                              OK ! Taking into account, again, the first regex S/R of my previous post :

                              So, from your original text, below :

                              0   AGH111900-1     GENERATOR
                              1       VA111200G1  GENERATOR ASSEMBLY - TOP LEVEL
                              2           VA111200P1 GENERATOR ASSEMBLY - HOUSING
                              2           100629-042  ADHESIVE, ANAEROBIC, LIQUID RESIN
                              2           200-000-111-112 CONNECTOR
                              2           200-000-112-004 CONNECTOR CONTACT
                              2           A50GB0013-1 TAPE, PRESSURE SENSITIVE ADHESIVE- POLYIMIDE
                              2           AS3236-06   BOLT, MACHINE - DOUBLE HEX HEAD
                              2           MIL-PRF-7808    PERF.SPEC,LUBRICATING OIL,GR3
                              2           MS16996-10  SCREW, SOCKET HEAD
                              2           VA112719P1  GEAR RETAINER
                              2           VA112719P2  GEAR RETAINER
                              2           VA112799P2  GROMMET - T2
                              2           VA112799P3  GROMMET
                              2           VA112817G1  PLATE - IDENTIFICATION
                              3               VA112817P1  PLATE - IDENTIFICATION
                              3               3-011-001   INSULATING CMPD,ELE
                              3               G11257P6    PLATE-BLANK
                              3               K34706P1    THINNER, PAINT PRODUCTS
                              4                   S-8 THINNER, PAINT PRODUCTS
                              3               TD111234    PROCESS SPECIFICATION - SERIALIZATION
                              2           VA113269P1  COVER - DISCONNECT ASSEMBLY
                              2           VA113289P1  DISC RETAINER - GEAR
                              2           VA113448P4  BUSHING - HEATER HOUSING
                              2           VA113453P1  SHIM - 0.630 OD, 0.200 ID, 0.005 THK
                              2           VA113453P2  SHIM - 0.630 OD, 0.200 ID, 0.005 Thick
                              

                              Use the following regex S/R, named A, which :

                              • Deletes any line which does not contain VA Drawings and is different from the level 0 line

                              • Deletes all the P’s and G’s, followed by digits

                              • Deletes the leading level digit and the 3 space characters after it

                              SEARCH (?-is)(?!.*VA\d+|^0|^\u)^.+\R|^\d\x20{3}|[GP]\d+

                              REPLACE Leave EMPTY

                              So, after clicking, once, on the Replace All button or several times on the Replace button, you should get :

                              AGH111900-1     GENERATOR
                                  VA111200  GENERATOR ASSEMBLY - TOP LEVEL
                                      VA111200 GENERATOR ASSEMBLY - HOUSING
                                      VA112719  GEAR RETAINER
                                      VA112719  GEAR RETAINER
                                      VA112799  GROMMET - T2
                                      VA112799  GROMMET
                                      VA112817  PLATE - IDENTIFICATION
                                          VA112817  PLATE - IDENTIFICATION
                                      VA113269  COVER - DISCONNECT ASSEMBLY
                                      VA113289  DISC RETAINER - GEAR
                                      VA113448  BUSHING - HEATER HOUSING
                                      VA113453  SHIM - 0.630 OD, 0.200 ID, 0.005 THK
                                      VA113453  SHIM - 0.630 OD, 0.200 ID, 0.005 Thick
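
Regex A can be tried outside Notepad++ with a minimal Python sketch. It is a port under assumptions, since Python's `re` lacks `\R` and `\u`: they are replaced by `\r?\n` and `[A-Z]`, and `re.MULTILINE` makes `^` match at every line start. The names `REGEX_A` and `apply_regex_a` and the four-line sample are hypothetical.

```python
import re

# Python port of regex A: \R -> \r?\n, \u -> [A-Z]; re.MULTILINE makes ^ line-aware.
REGEX_A = re.compile(
    r"(?!.*VA\d+|^0|^[A-Z])^.+\r?\n"  # whole lines with no VA drawing (level-0 line spared)
    r"|^\d\x20{3}"                    # level digit and the 3 spaces after it
    r"|[GP]\d+",                      # G / P condition followed by digits
    re.MULTILINE,
)


def apply_regex_a(text: str) -> str:
    """Delete everything regex A matches (the replacement is empty)."""
    return REGEX_A.sub("", text)


# Hypothetical sample: level digit, padding, then part number and title.
sample = (
    "0" + " " * 3 + "AGH111900-1     GENERATOR\n"
    + "1" + " " * 7 + "VA111200G1  GENERATOR ASSEMBLY - TOP LEVEL\n"
    + "2" + " " * 11 + "100629-042  ADHESIVE, ANAEROBIC, LIQUID RESIN\n"
    + "2" + " " * 11 + "VA112719P1  GEAR RETAINER\n"
)
cleaned = apply_regex_a(sample)
print(cleaned)
```

On this sample the adhesive line is removed, the G1 and P1 suffixes are stripped, and the three remaining lines keep their relative indentation of 0, 4 and 8 spaces.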
                              

                              Now, with this new second regex S/R, named B, below, you just wipe out all the duplicate lines !

                              SEARCH (?-s)^(\h+(VA\d+)\x20.+\R)(\h+\2.+\R)+

                              REPLACE \1

                              After clicking, once, on the Replace All button or several times on the Replace button, here is what you get. Practically, your final text !

                              AGH111900-1     GENERATOR
                                  VA111200  GENERATOR ASSEMBLY - TOP LEVEL
                                      VA112719  GEAR RETAINER
                                      VA112799  GROMMET - T2
                                      VA112817  PLATE - IDENTIFICATION
                                      VA113269  COVER - DISCONNECT ASSEMBLY
                                      VA113289  DISC RETAINER - GEAR
                                      VA113448  BUSHING - HEATER HOUSING
                                      VA113453  SHIM - 0.630 OD, 0.200 ID, 0.005 THK
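
Here is a rough Python equivalent of regex B, for illustration only. Python's `re` has no `\h` or `\R`, so the sketch assumes `[ \t]` and `\r?\n` are close enough for this data. The names `REGEX_B` and `apply_regex_b` and the sample are made up.

```python
import re

# Python port of regex B: \h -> [ \t], \R -> \r?\n.
REGEX_B = re.compile(
    r"^([ \t]+(VA\d+)\x20.+\r?\n)"  # group 1: first line of a run, group 2: its VA number
    r"([ \t]+\2.+\r?\n)+",          # following lines that start with the same VA number
    re.MULTILINE,
)


def apply_regex_b(text: str) -> str:
    """Keep only the first line of each run of lines sharing a VA number."""
    return REGEX_B.sub(r"\1", text)


sample = (
    "AGH111900-1     GENERATOR\n"
    "    VA111200  GENERATOR ASSEMBLY - TOP LEVEL\n"
    "        VA111200 GENERATOR ASSEMBLY - HOUSING\n"
    "        VA112799  GROMMET - T2\n"
    "        VA112799  GROMMET\n"
    "        VA113269  COVER - DISCONNECT ASSEMBLY\n"
)
deduped = apply_regex_b(sample)
print(deduped)
```

Only consecutive lines are compared, and every line in a run must end with a line break, so the last line of the text needs a trailing newline to be matched. A duplicate that shows up again later in the file, after other drawings, is left alone.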
                              

                              Finally, we just have to normalize the spaces after the VA drawings ( VA\d+ ) to 4 space characters. Thus, this last regex S/R, named C :

                              SEARCH (VA\d+)\x20+

                              REPLACE \1\x20\x20\x20\x20

                              Again, after clicking, once, on the Replace All button or several times on the Replace button, you’ll obtain your expected text :

                              AGH111900-1     GENERATOR
                                  VA111200    GENERATOR ASSEMBLY - TOP LEVEL
                                      VA112719    GEAR RETAINER
                                      VA112799    GROMMET - T2
                                      VA112817    PLATE - IDENTIFICATION
                                      VA113269    COVER - DISCONNECT ASSEMBLY
                                      VA113289    DISC RETAINER - GEAR
                                      VA113448    BUSHING - HEATER HOUSING
                                      VA113453    SHIM - 0.630 OD, 0.200 ID, 0.005 THK
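
Regex C ports to Python almost unchanged, since `\x20` is understood by `re` as well. A small sketch, with the hypothetical names `REGEX_C` and `apply_regex_c`:

```python
import re

# Regex C: a VA number followed by any run of spaces.
REGEX_C = re.compile(r"(VA\d+)\x20+")


def apply_regex_c(text: str) -> str:
    """Rewrite the gap after each VA number as exactly four spaces."""
    # \g<1> is the captured VA number.
    return REGEX_C.sub(r"\g<1>" + " " * 4, text)


print(apply_regex_c("        VA112719  GEAR RETAINER"))
print(apply_regex_c("    VA111200 GENERATOR ASSEMBLY - HOUSING"))
```

Running it a second time changes nothing, because the gap is already four spaces wide.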
                              

                              Et voilà :-))

                              Best Regards,

                              guy038

                              P. S. : Next time, I could give you some explanations on these 3 regex S/R ( A, B and C ) !
