Select Header lines only (or select non-header lines and delete them)
-
Hi, I am trying to isolate just the header rows in text files. They follow the pattern below:
Preferred Vendor,Item,SKU,QTY #310 0082252748 10-07-2016 b-101016 {294},PULLED,5 PLTS,Comments/Status,Andy,Alex
bmericbM,IM TbROCCO CHIbNTI CMbSSICO,48808,1,1,2,
bmericbM,Mb Fincb Cbbernet Sbugvignon,88076,1,1,2,
byericbM,Vinbs ChiMenbs Sbuvignon BMbnc,88428,1,1,2,
byericbM,ZONIN DOC PROSECCO,84842,1,1,2,
bncient Pebks,MIBERTE Cbh PbSO ROBMES,88700,1,1,2,
bSV,RT BMOCK 67 R CbB SbUV 8M BOX,74187,2,2,7,
Grbnd TotbM,44,48,
Preferred Vendor,Itey,SKU,QTY #401 0082270848 10-07-2016 b-101016 {60},PULLED,2 PLTS,Adrian,Puller1,Puller2
byericbM,bybncby ybMbec Uco V brgentinb,77447,1,1,1,
byericbM,VIMMb BORGHETTI MUNb PINOT GRIG,48787,1,1,1,
byericbM,ViMMb Cerrinb Chbrdonnby Pinot,48882,1,1,1,
byericbM,VIMMb CERRINb yONTEPUMCIbNO,48888,1,1,1,
bncient Pebks,Miberte Pinot Noir SMO,84048,1,1,1,
Grbnd TotbM,808,800,
Preferred Vendor,Itey,SKU,QTY #408 0082270806 10-07-2016 b-101116 {12},PULLED,4 PLTS,Mark,Adrian,Puller2
byericbM,Mb FINCb ybMBEC bRGENTINb,88078,2,2,1,
byericbM,y CHEVbMMIER BRUT CbVb,40646,1,1,4,
byericbM,RT’s Vinbs ChiMenbs Rosbrio,72888,1,1,1,
byericbM,VIMMb BORGHETTI MUNb PINOT GRIG,48787,1,1,1,
byericbM,ViMMb Cerrinb Chbrdonnby Pinot,48882,1,1,1,
Grbnd TotbM,186,176,
Preferred Vendor,Itey,SKU,QTY #908 0082270872 10-07-2016 b-101016 {7},PULLED,9 PLTS,Mark,Adrian,Puller2
zz8bMTb ybRKETING,RIO BRbVO MIGHT,71216,1,1,1,
zz8Mike Biersch,BB BOHEyIbN MbGER,78176,1,1,1,
zz8Mike Biersch,BB Ocktoberfest,17816,1,0,0,
zz8yutubM WhoMesbMe,Orbngebooy Preyiuy Mbger,97689,1,1,1,
zz8*yutubM WhoMesbMe,Peter’s Brbnd HoMMbnd Beer,18802,1,1,1,
Grbnd TotbM,7,9,
Preferred Vendor,Itey,SKU,QTY #909 82270887 10-07-2016 b-101116 {12},PULLED,8 PLTS,Mark,Andy,Puller2
byericbM,Mb FINCb ybMBEC bRGENTINb,88078,1,1,2,
bSV,RT BMOCK 67 R CbB SbUV 8M BOX,79187,1,1,2,
bSV,RT BMOCK 67 WHITE B SbUVIGNO,76728,1,1,2,
bSV,RT’s Orgbnic Chbrdonnby,88821,1,1,2,
bSV,RT’S Orgbnic yerMot,86827,1,1,2,
bSV,RT’S Orgbnic RSV SbUVIGNON BMb,89120,1,1,2,
Grbnd TotbM,170,168,
Preferred Vendor,Itey,SKU,QTY #907 0082270890 10-07-2016 b-101016 {178},PULLED,8 PLTS,Mark,JC,Puller2
byericbM,Mb Fincb Cbh Sbugvignon,88076,1,1,1,
byericbM,Mb FINCb ybMBEC bRGENTINb,88078,1,1,1,
byericbM,y CHEVbMMIER BRUT CbVb,90696,1,1,1,
byericbM,VIMMb BORGHETTI MUNb PINOT GRIG,98787,1,1,1,
byericbM,ViMMb Cerrinb Chbrdonnby Pinot,98882,1,1,1,
Grbnd TotbM,178,
When done it will look like this:
Preferred Vendor,Item,SKU,QTY #310 0082252748 10-07-2016 b-101016 {294},PULLED,5 PLTS,Comments/Status,Andy,Alex
Preferred Vendor,Itey,SKU,QTY #401 0082270848 10-07-2016 b-101016 {60},PULLED,2 PLTS,Adrian,Puller1,Puller2
Preferred Vendor,Itey,SKU,QTY #408 0082270806 10-07-2016 b-101116 {12},PULLED,4 PLTS,Mark,Adrian,Puller2
Preferred Vendor,Itey,SKU,QTY #908 0082270872 10-07-2016 b-101016 {7},PULLED,9 PLTS,Mark,Adrian,Puller2
Preferred Vendor,Itey,SKU,QTY #909 82270887 10-07-2016 b-101116 {12},PULLED,8 PLTS,Mark,Andy,Puller2
Preferred Vendor,Itey,SKU,QTY #907 0082270890 10-07-2016 b-101016 {178},PULLED,8 PLTS,Mark,JC,Puller2
-
Hi Felix,
Very easy, indeed :-)
- Go back to the very beginning of your file
- Open the Replace dialog (CTRL + H)
- Select the Regular expression search mode
- Type, in the Find what: zone, the following regex (with a space before the word Vendor):
(?-s)^(?!Preferred Vendor).+\R?
- Leave the Replace with: zone EMPTY
- Click on the Replace All button
Et voilà !
NOTES :
- First, the in-line modifier (?-s) ensures that the dot (.) matches a single standard character only, and never an EOL character
- The symbol ^ is a zero-length assertion which represents the location between the EOL character of the previous line and the first character of the current line
- The form (?!Preferred Vendor) is a negative look-ahead, which verifies that the string “Preferred Vendor” does NOT occur at the beginning of the line
- If this assertion is true, then the regex engine matches the regex .+\R? , that is to say the whole line with its optional EOL character(s) (\R?). I used the ? quantifier (meaning 0 OR 1 time), just in case an unwanted line, at the end of your file, is not followed by a line-break
- So this regex matches the complete non-header lines of your file and deletes them, as the replacement regex is an empty string !
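For readers who would rather script this than use the Replace dialog, here is a minimal Python sketch of the same idea. It is only an approximation of the regex above, under a few assumptions: Python's re module has no \R, so an optional \r?\n stands in for it (Windows or Unix line endings only), and the name keep_header_lines and the shortened sample rows are made up for illustration.

```python
import re

# Python's re module has no \R, so an optional "\r?\n" stands in for it
# (assumption: the file uses Windows or Unix line endings).
# re.MULTILINE makes ^ match at the start of every line; the dot already
# stops at "\n" by default, which plays the role of (?-s).
NON_HEADER_LINE = re.compile(r"^(?!Preferred Vendor).+(?:\r?\n)?", re.MULTILINE)


def keep_header_lines(text: str) -> str:
    """Delete every non-empty line that does not start with 'Preferred Vendor'."""
    return NON_HEADER_LINE.sub("", text)


# Shortened, illustrative rows; the last line deliberately has no line-break.
sample = (
    "Preferred Vendor,Item,SKU,QTY #310\n"
    "bmericbM,ZONIN DOC PROSECCO,84842,1,1,2,\n"
    "Grbnd TotbM,44,48,\n"
    "Preferred Vendor,Itey,SKU,QTY #401\n"
    "Grbnd TotbM,808,800,"
)
print(keep_header_lines(sample))
```

As in the Notepad++ regex, the optional line-break at the end of the pattern is what lets the final line be removed even though nothing follows it.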
Best Regards,
guy038
-
-
Totally brilliant! I had a feeling it would be easy for you wizards. I had come up with regex that would select only the header rows, but could not get it to work in Notepad++. Your solution is super elegant. Thanks also for your explanation.
-
I have a similar issue, only mine involves a series of log files I have concatenated together.
Now I need to remove the header, which is always 7 lines; the only thing that changes is the date/time of the log.
I tried to adapt the sample above to remove the lines one at a time, but it hasn’t worked yet. A sample is below. Thanks for any help you can provide.
Version 2.1.1
ComputerName = ABC
ToolID = ABC
ChamberType = XYZ
Dynamic Alignment Data for: PM3 Date: Aug-28-2016 04:11:15:812
Date; Time; Source Module; End-Effector; Y Offset; X Offset; WaferID; LotID; WaferFlow; PortName; RecipeName;
8-28-2016; 15071; PM3; B; -1.052; 1.023; 0043771000000000-08; 0828035206356D633FRY0HPM13_PRAET_70A_0; PM13_PRAET_70A; Port2; ;
8-28-2016; 36358; PM3; B; -0.963; 0.736; 0061083200000000-22; 0828095300540D618TFR0PM3_PRAET_30A_0; PM3_PRAET_30A; Port2; ;
8-28-2016; 36691; PM3; B; -1.033; 0.641; 0061083200000000-23; 0828095300540D618TFR0PM3_PRAET_30A_0; PM3_PRAET_30A; Port2; ;
Version 2.1.1
ComputerName = ABC
ToolID = ABC
ChamberType = XYZ
Dynamic Alignment Data for: PM3 Date: Aug-01-2016 00:12:10:375
Date; Time; Source Module; End-Effector; Y Offset; X Offset; WaferID; LotID; WaferFlow; PortName; RecipeName;
8-1-2016; 726; PM3; B; -2.907; 2.357; 0041508000000000-23; 0731163717116D626FFC0CPM13_PRAET_200A_0; PM13_PRAET_200A; Port1; ;
8-1-2016; 3813; PM3; B; -2.694; 2.502; 0040276900000000-01; 0801002542981D623FAL0PM13_PRAET_200A_0; PM13_PRAET_200A; Port2; ;
8-1-2016; 5592; PM3; B; -2.833; 2.416; 0040276900000000-06; 0801002542981D623FAL0PM13_PRAET_200A_0; PM13_PRAET_200A; Port2; ;
-
Hello G40,
If I understood correctly, your request concerns the lines beginning with the string Dynamic Alignment Data. Am I right ?
However, I could not decide whether you prefer :
- (A) To delete the lines Dynamic Alignment Data… and keep all the other lines
- (B) To keep the lines Dynamic Alignment Data… and delete all the other lines
Anyway, I’m going to give the regexes for both cases :-)) So :
- For case (A), no difficulty : just search for the regex
(?-s)^Dynamic Alignment Data.+\R
- For case (B), just run the regex
(?-s)^(?!Dynamic Alignment Data).*\R?
In both cases, the replacement part is always left EMPTY
For the notes on the regexes, just refer to my previous post. In addition :
- In the (B) regex, I used .* (instead of .+), which allows the regex to delete all the true empty lines, too
- As usual, \R is a shortened syntax to match any line-break (\r\n in Windows files, \n in Unix files and \r in old Mac files)
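The two cases can also be tried outside Notepad++. The following is a rough Python sketch, not the exact regexes above: Python's re module has no \R, so \r?\n is used in its place (assuming Windows or Unix line endings), and the names CASE_A, CASE_B and the small log sample are invented for the example. The blank line in the sample is only there to exercise case (B).

```python
import re

EOL = r"(?:\r?\n)"  # stand-in for \R, assuming Windows or Unix line endings

# Case (A): delete the lines that begin with "Dynamic Alignment Data"
CASE_A = re.compile(rf"^Dynamic Alignment Data.+{EOL}", re.MULTILINE)
# Case (B): delete every other line, true empty lines included (hence .*)
CASE_B = re.compile(rf"^(?!Dynamic Alignment Data).*{EOL}?", re.MULTILINE)

# Illustrative sample, shortened from the log posted above.
log = (
    "Version 2.1.1\n"
    "ChamberType = XYZ\n"
    "\n"
    "Dynamic Alignment Data for: PM3 Date: Aug-28-2016 04:11:15:812\n"
    "8-28-2016; 15071; PM3; B;\n"
)

without_marker = CASE_A.sub("", log)  # case (A)
only_marker = CASE_B.sub("", log)     # case (B)
print(without_marker)
print(only_marker)
```

In case (B), replacing .* with .+ would leave the empty line in place, which is the difference the note above describes.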
Best Regards,
guy038
-
-
Thanks Guy038.
That will work, but it is not quite what I had in mind. I’ll have to do it in a few iterations, but that is OK. I am trying to delete the entire header as shown below and keep the remainder. That being said, I can easily use what you gave me to do what I need.
Thanks for the help.
Version 2.1.1
ComputerName = ABC
ToolID = ABC
ChamberType = XYZ
Dynamic Alignment Data for: PM3 Date: Aug-01-2016 00:12:10:375
Date; Time; Source Module; End-Effector; Y Offset; X Offset; WaferID; LotID; WaferFlow; PortName; RecipeName;
-
Hi G40,
OK, now I see what you would like to do :-))
- Delete any part, as below :
Version 2.1.1
ComputerName = ABC
ToolID = ABC
ChamberType = XYZ
Dynamic Alignment Data for: PM3 Date: Aug-28-2016 04:11:15:812
Date; Time; Source Module; End-Effector; Y Offset; X Offset; WaferID; LotID; WaferFlow; PortName; RecipeName;
- Keep any part, as below :
8-28-2016; 15071; PM3; B; -1.052; 1.023; 0043771000000000-08; 0828035206356D633FRY0HPM13_PRAET_70A_0; PM13_PRAET_70A; Port2; ;
8-28-2016; 36358; PM3; B; -0.963; 0.736; 0061083200000000-22; 0828095300540D618TFR0PM3_PRAET_30A_0; PM3_PRAET_30A; Port2; ;
8-28-2016; 36691; PM3; B; -1.033; 0.641; 0061083200000000-23; 0828095300540D618TFR0PM3_PRAET_30A_0; PM3_PRAET_30A; Port2; ;
I just noticed that :
- All the lines that you want to keep begin with a digit
- All the lines that you want to delete, including true blank lines, do NOT begin with a digit
This fact drastically simplifies our regex : we just have to delete all the lines which do not begin with a digit ! This leads to the simple S/R, below :
SEARCH
^\D.+\R|^\R
REPLACE
Empty
Notes :
- From beginning of line (^), we’re searching for a non-digit character (\D, the opposite of \d), followed by any non-null range of characters (.+), itself followed by EOL character(s) (\R)
OR
- From beginning of line (^), we’re searching for EOL character(s) (\R), so a true blank line !
- As the replacement zone is empty, all these complete lines are, then, deleted !
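The rule itself ("keep a line only if it begins with a digit") can also be sketched in a few lines of Python. Note that this is not the regex above, just the same rule written as a plain line filter; the function name keep_data_lines and the shortened sample are assumptions made for the example.

```python
def keep_data_lines(text: str) -> str:
    """Keep only the lines whose first character is a digit."""
    # keepends=True preserves each line's own line-break, so kept lines
    # are written back exactly as they were read.
    lines = text.splitlines(keepends=True)
    return "".join(line for line in lines if line[:1].isdigit())


# Illustrative sample, shortened from the log posted above.
log = (
    "Version 2.1.1\n"
    "ComputerName = ABC\n"
    "\n"
    "Date; Time; Source Module;\n"
    "8-28-2016; 15071; PM3; B;\n"
    "8-28-2016; 36358; PM3; B;\n"
)
print(keep_data_lines(log))
```

A line filter was chosen here because, in Python's re module, \D also matches a newline character, so a literal translation of the regex would need extra care around blank lines.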
Remark : The search regex may also be written ^(\D.+)?\R , without any alternation
Cheers,
guy038
-
-
That’s awesome! Thanks. This saves me SO much time…