Select Header lines only (or select non-header lines and delete them)



  • Hi, I am trying to isolate just the header rows in text files. They follow the pattern below:

    Preferred Vendor,Item,SKU,QTY #310 0082252748 10-07-2016 b-101016 {294},PULLED,5 PLTS,Comments/Status,Andy,Alex
    bmericbM,IM TbROCCO CHIbNTI CMbSSICO,48808,1,1,2,
    bmericbM,Mb Fincb Cbbernet Sbugvignon,88076,1,1,2,
    byericbM,Vinbs ChiMenbs Sbuvignon BMbnc,88428,1,1,2,
    byericbM,ZONIN DOC PROSECCO,84842,1,1,2,
    bncient Pebks,MIBERTE Cbh PbSO ROBMES,88700,1,1,2,
    bSV,RT BMOCK 67 R CbB SbUV 8M BOX,74187,2,2,7,
    Grbnd TotbM,44,48,
    Preferred Vendor,Itey,SKU,QTY #401 0082270848 10-07-2016 b-101016 {60},PULLED,2 PLTS,Adrian,Puller1,Puller2
    byericbM,bybncby ybMbec Uco V brgentinb,77447,1,1,1,
    byericbM,VIMMb BORGHETTI MUNb PINOT GRIG,48787,1,1,1,
    byericbM,ViMMb Cerrinb Chbrdonnby Pinot,48882,1,1,1,
    byericbM,VIMMb CERRINb yONTEPUMCIbNO,48888,1,1,1,
    bncient Pebks,Miberte Pinot Noir SMO,84048,1,1,1,
    Grbnd TotbM,808,800,
    Preferred Vendor,Itey,SKU,QTY #408 0082270806 10-07-2016 b-101116 {12},PULLED,4 PLTS,Mark,Adrian,Puller2
    byericbM,Mb FINCb ybMBEC bRGENTINb,88078,2,2,1,
    byericbM,y CHEVbMMIER BRUT CbVb,40646,1,1,4,
    byericbM,RT’s Vinbs ChiMenbs Rosbrio,72888,1,1,1,
    byericbM,VIMMb BORGHETTI MUNb PINOT GRIG,48787,1,1,1,
    byericbM,ViMMb Cerrinb Chbrdonnby Pinot,48882,1,1,1,
    Grbnd TotbM,186,176,
    Preferred Vendor,Itey,SKU,QTY #908 0082270872 10-07-2016 b-101016 {7},PULLED,9 PLTS,Mark,Adrian,Puller2
    zz8bMTb ybRKETING,RIO BRbVO MIGHT,71216,1,1,1,
    zz8
    Mike Biersch,BB BOHEyIbN MbGER,78176,1,1,1,
    zz8Mike Biersch,BB Ocktoberfest,17816,1,0,0,
    zz8
    yutubM WhoMesbMe,Orbngebooy Preyiuy Mbger,97689,1,1,1,
    zz8*yutubM WhoMesbMe,Peter’s Brbnd HoMMbnd Beer,18802,1,1,1,
    Grbnd TotbM,7,9,
    Preferred Vendor,Itey,SKU,QTY #909 82270887 10-07-2016 b-101116 {12},PULLED,8 PLTS,Mark,Andy,Puller2
    byericbM,Mb FINCb ybMBEC bRGENTINb,88078,1,1,2,
    bSV,RT BMOCK 67 R CbB SbUV 8M BOX,79187,1,1,2,
    bSV,RT BMOCK 67 WHITE B SbUVIGNO,76728,1,1,2,
    bSV,RT’s Orgbnic Chbrdonnby,88821,1,1,2,
    bSV,RT’S Orgbnic yerMot,86827,1,1,2,
    bSV,RT’S Orgbnic RSV SbUVIGNON BMb,89120,1,1,2,
    Grbnd TotbM,170,168,
    Preferred Vendor,Itey,SKU,QTY #907 0082270890 10-07-2016 b-101016 {178},PULLED,8 PLTS,Mark,JC,Puller2
    byericbM,Mb Fincb Cbh Sbugvignon,88076,1,1,1,
    byericbM,Mb FINCb ybMBEC bRGENTINb,88078,1,1,1,
    byericbM,y CHEVbMMIER BRUT CbVb,90696,1,1,1,
    byericbM,VIMMb BORGHETTI MUNb PINOT GRIG,98787,1,1,1,
    byericbM,ViMMb Cerrinb Chbrdonnby Pinot,98882,1,1,1,
    Grbnd TotbM,178,

    When done it will look like this:

    Preferred Vendor,Item,SKU,QTY #310 0082252748 10-07-2016 b-101016 {294},PULLED,5 PLTS,Comments/Status,Andy,Alex
    Preferred Vendor,Itey,SKU,QTY #401 0082270848 10-07-2016 b-101016 {60},PULLED,2 PLTS,Adrian,Puller1,Puller2
    Preferred Vendor,Itey,SKU,QTY #408 0082270806 10-07-2016 b-101116 {12},PULLED,4 PLTS,Mark,Adrian,Puller2
    Preferred Vendor,Itey,SKU,QTY #908 0082270872 10-07-2016 b-101016 {7},PULLED,9 PLTS,Mark,Adrian,Puller2
    Preferred Vendor,Itey,SKU,QTY #909 82270887 10-07-2016 b-101116 {12},PULLED,8 PLTS,Mark,Andy,Puller2
    Preferred Vendor,Itey,SKU,QTY #907 0082270890 10-07-2016 b-101016 {178},PULLED,8 PLTS,Mark,JC,Puller2



  • Hi Felix,

    Very easy , indeed :-)

    • Go back to the very beginning of your file

    • Open the Replace dialog ( CTRL + H )

    • Type, in the Find what : zone (?-s)^(?!Preferred Vendor).+\R? with a space before the word Vendor

    • Leave the Replace with : zone EMPTY

    • Click on the Replace All button

    Et voilà !

    NOTES :

    • First, the in-line modifier (?-s) ensures that the dot ( . ) matches standard characters only, and never a line-break character

    • The symbol ^ is a zero-length assertion which represents the location between the EOL character of the previous line and the first character of the current line

    • The form (?!Preferred Vendor) is a negative look-ahead, which verifies that a string “Preferred Vendor” does NOT occur at the beginning of the line

    • If this assertion is true, then the regex engine matches the regex .+\R?, that is to say the overall line with any optional EOL character ( \R? ). I used the ? quantifier ( meaning 0 OR 1 time ), just in case a non-wanted line, at the end of your file would not be followed by a line-break

    • So this regex matches the complete non-header lines of your file and deletes them, as the replacement regex is an empty string !
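
For readers who want to try the same idea outside Notepad++, here is a minimal Python sketch of the search above. It is an approximation, not guy038's exact regex: Python's `re` module has no `\R`, so the line break is spelled out as `\r?\n` (old Mac `\r`-only files are not covered), and `[^\r\n]` stands in for the dot under `(?-s)`. The sample text is an abbreviated, hypothetical version of the data in the question.

```python
import re

# Python's re has no \R; this covers Unix (\n) and Windows (\r\n) line breaks only.
EOL = r"\r?\n"

# Approximation of (?-s)^(?!Preferred Vendor).+\R?
# [^\r\n] plays the role of a dot that must not cross a line break.
NON_HEADER = re.compile(r"^(?!Preferred Vendor)[^\r\n]+(?:" + EOL + ")?", re.MULTILINE)


def keep_headers(text: str) -> str:
    """Delete every non-empty line that does not start with 'Preferred Vendor'."""
    return NON_HEADER.sub("", text)


# Abbreviated sample (assumption: shortened from the lines in the question).
sample = (
    "Preferred Vendor,Item,SKU,QTY #310,PULLED\n"
    "bSV,RT BMOCK 67 R CbB SbUV 8M BOX,74187,2,2,7,\n"
    "Grbnd TotbM,44,48,\n"
    "Preferred Vendor,Itey,SKU,QTY #401,PULLED\n"
    "Grbnd TotbM,808,800,"  # last line deliberately has no trailing line break
)

print(keep_headers(sample))
```

The optional line break at the end of the pattern is what lets the final `Grbnd TotbM` line be removed even though nothing follows it.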

    Best Regards,

    guy038



  • Totally brilliant! I had a feeling it would be easy for you wizards. I had come up with regex that would select only the header rows, but could not get it to work in Notepad++. Your solution is super elegant. Thanks also for your explanation.



  • I have a similar issue only mine involves a series of log files I have concatenated together.
    Now I need to remove the header, which is always 7 lines; the only thing that changes is the date/time of the log.
    I tried to adapt the sample above to remove the lines one at a time, but it hasn’t worked yet.

    A sample is below. Thanks for any help you can provide.

    **Version 2.1.1
    ComputerName = ABC
    ToolID = ABC
    ChamberType = XYZ

    Dynamic Alignment Data for: PM3 Date: Aug-28-2016 04:11:15:812
    Date; Time; Source Module; End-Effector; Y Offset; X Offset; WaferID; LotID; WaferFlow; PortName; RecipeName;**
    8-28-2016; 15071; PM3; B; -1.052; 1.023; 0043771000000000-08; 0828035206356D633FRY0HPM13_PRAET_70A_0; PM13_PRAET_70A; Port2; ;
    8-28-2016; 36358; PM3; B; -0.963; 0.736; 0061083200000000-22; 0828095300540D618TFR0PM3_PRAET_30A_0; PM3_PRAET_30A; Port2; ;
    8-28-2016; 36691; PM3; B; -1.033; 0.641; 0061083200000000-23; 0828095300540D618TFR0PM3_PRAET_30A_0; PM3_PRAET_30A; Port2; ;
    **Version 2.1.1
    ComputerName = ABC
    ToolID = ABC
    ChamberType = XYZ

    Dynamic Alignment Data for: PM3 Date: Aug-01-2016 00:12:10:375
    Date; Time; Source Module; End-Effector; Y Offset; X Offset; WaferID; LotID; WaferFlow; PortName; RecipeName;**
    8-1-2016; 726; PM3; B; -2.907; 2.357; 0041508000000000-23; 0731163717116D626FFC0CPM13_PRAET_200A_0; PM13_PRAET_200A; Port1; ;
    8-1-2016; 3813; PM3; B; -2.694; 2.502; 0040276900000000-01; 0801002542981D623FAL0PM13_PRAET_200A_0; PM13_PRAET_200A; Port2; ;
    8-1-2016; 5592; PM3; B; -2.833; 2.416; 0040276900000000-06; 0801002542981D623FAL0PM13_PRAET_200A_0; PM13_PRAET_200A; Port2; ;



  • Hello G40,

    If I understood correctly, what you would like to do concerns the lines beginning with the string Dynamic Alignment Data. Am I right ?

    However, I could not decide whether you prefer :

    • (A) To delete the lines Dynamic Alignment Data… and keep all the other lines

    • (B) To keep the lines Dynamic Alignment Data… and delete all the other lines

    Anyway, I’m going to give the regexes for both cases :-)) So :

    • For case (A), no difficulty : just search for the regex (?-s)^Dynamic Alignment Data.+\R

    • For case (B), just run the regex (?-s)^(?!Dynamic Alignment Data).*\R?

    In both cases, the replacement zone is always left EMPTY


    For the notes on the regexes , just refer to my previous post. In addition :

    • In the (B) regex, I used .* ( instead of .+ ), which allows the regex to also delete all the truly empty lines

    • As usual, the \R is a shortened syntax to match any line-break ( \r\n, in Windows files, \n, in Unix files and \r, in old Mac files )
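
As a rough illustration of the two cases, the following Python sketch applies comparable patterns to a shortened, hypothetical excerpt of the log. It is only an approximation of the Notepad++ regexes: `\R` is replaced by `\r?\n` because Python's `re` does not support `\R`, and `[^\r\n]` replaces the dot under `(?-s)`.

```python
import re

# Stand-in for \R, limited to Unix (\n) and Windows (\r\n) line breaks.
EOL = r"\r?\n"

# Case (A): approximation of (?-s)^Dynamic Alignment Data.+\R
CASE_A = re.compile(r"^Dynamic Alignment Data[^\r\n]+" + EOL, re.MULTILINE)

# Case (B): approximation of (?-s)^(?!Dynamic Alignment Data).*\R?
# The * quantifier (rather than +) lets the pattern match empty lines too.
CASE_B = re.compile(
    r"^(?!Dynamic Alignment Data)[^\r\n]*(?:" + EOL + ")?", re.MULTILINE
)

# Shortened excerpt (assumption: trimmed from the sample in the question).
log = (
    "Version 2.1.1\n"
    "ChamberType = XYZ\n"
    "\n"
    "Dynamic Alignment Data for: PM3 Date: Aug-28-2016 04:11:15:812\n"
    "8-28-2016; 15071; PM3; B; -1.052; 1.023;\n"
)

without_marker = CASE_A.sub("", log)  # case (A): marker lines removed
only_marker = CASE_B.sub("", log)     # case (B): only marker lines remain

print(without_marker)
print(only_marker)
```

In case (B) the blank line disappears along with the other non-matching lines, which is the effect of `.*` described in the notes.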

    Best Regards,

    guy038



  • Thanks Guy038.

    That will work, but it is not quite what I had in mind. I’ll have to do it in a few iterations, but that is OK. I am trying to delete the entire header as shown below and keep the remainder. That being said, I can easily use what you gave me to do what I need.

    Thanks for the help.

    Version 2.1.1
    ComputerName = ABC
    ToolID = ABC
    ChamberType = XYZ

    Dynamic Alignment Data for: PM3 Date: Aug-01-2016 00:12:10:375
    Date; Time; Source Module; End-Effector; Y Offset; X Offset; WaferID; LotID; WaferFlow; PortName; RecipeName;



  • Hi G40,

    OK, I see now what you would like to do :-))

    • Delete any part, as below :

      **Version 2.1.1
      ComputerName = ABC
      ToolID = ABC
      ChamberType = XYZ

      Dynamic Alignment Data for: PM3 Date: Aug-28-2016 04:11:15:812
      Date; Time; Source Module; End-Effector; Y Offset; X Offset; WaferID; LotID; WaferFlow; PortName; RecipeName;**

    • Keep any part, as below :

      8-28-2016; 15071; PM3; B; -1.052; 1.023; 0043771000000000-08; 0828035206356D633FRY0HPM13_PRAET_70A_0; PM13_PRAET_70A; Port2; ;
      8-28-2016; 36358; PM3; B; -0.963; 0.736; 0061083200000000-22; 0828095300540D618TFR0PM3_PRAET_30A_0; PM3_PRAET_30A; Port2; ;
      8-28-2016; 36691; PM3; B; -1.033; 0.641; 0061083200000000-23; 0828095300540D618TFR0PM3_PRAET_30A_0; PM3_PRAET_30A; Port2; ;


    I just noticed that :

    • All the lines that you want to keep begin with a digit

    • All the lines that you want to delete, including true blank lines, do NOT begin with a digit

    This fact drastically simplifies our regex : we just have to delete all the lines which do not begin with a digit ! This leads to the simple S/R, below :

    SEARCH ^\D.+\R|^\R

    REPLACE Empty


    Notes :

    • From beginning of line ( ^ ), we’re searching for a non-digit character ( \D ) ( The opposite of \d ), followed by any non-null range of characters ( .+ ) , followed, itself by EOL character(s) ( \R )

    OR

    • From beginning of line ( ^ ), we’re searching for EOL character(s) ( \R ) ( So, a true blank line ! )

    • As the replacement zone is empty, all the complete lines are, then, deleted !

    Remark : The search regex may, also, be written ^(\D.+)?\R , without any alternative
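
The single-pattern form in the remark can be sketched in Python as follows. This is an approximation under stated assumptions: `\r?\n` replaces `\R` (not available in Python's `re`), and `[^\d\r\n]` is used in place of `\D` so that the first character can never be a line break itself, which is a slightly stricter choice than the original. The log excerpt is shortened and hypothetical.

```python
import re

# Stand-in for \R, limited to Unix (\n) and Windows (\r\n) line breaks.
EOL = r"\r?\n"

# Approximation of ^(\D.+)?\R : a line that is either empty or starts with a
# non-digit character followed by at least one more character.
NOT_DATA = re.compile(r"^(?:[^\d\r\n][^\r\n]+)?" + EOL, re.MULTILINE)


def strip_headers(text: str) -> str:
    """Keep only the lines that begin with a digit (the data rows)."""
    return NOT_DATA.sub("", text)


# Shortened excerpt (assumption: trimmed from the concatenated log sample).
log = (
    "Version 2.1.1\n"
    "ComputerName = ABC\n"
    "\n"
    "Dynamic Alignment Data for: PM3 Date: Aug-28-2016 04:11:15:812\n"
    "Date; Time; Source Module;\n"
    "8-28-2016; 15071; PM3;\n"
    "8-28-2016; 36358; PM3;\n"
    "Version 2.1.1\n"
    "8-1-2016; 726; PM3;\n"
)

print(strip_headers(log))
```

Like the original pattern, this sketch leaves a one-character non-digit line untouched, because at least one character is required after the first one.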

    Cheers,

    guy038



  • That’s awesome! Thanks. This saves me SO much time…

