EXTRACT COLUMNS WITH NOTEPAD

sakijhon

Hello guys, I have a file, where I need to separate the sixth column and the ninth column, do friends help me?

frankie:student
charoty:notstudent
tiagonne:teacher

thank you

file:

3.00000:0.0009900000:S:{0}:frankgotti@me.com:frankie:Tucson:456789:student:marketing:approved
3.50000:0.3010299957:C:{2}:chary@hotmail.com:charoty:North_Oracle:567234:notstudent:mathematics:disapproved
3.75000:0.6020500913:S:{2}:tytarty@yagoo.com:tiagonne:Mission_District:126789:teacher:course_algebra:teacher

Terry R

@sakijhon

Assumptions:

It appears that the : (colon) is your delimiter between fields.
No field has the colon as part of the data (which would mean the field would need to have quotes around it).
If either assumption is invalid you need to expand on the example and provide more information.

My regex needs to have the search mode as “regular expression” and wraparound ticked.
Find What:([^:]+:){5}([^:]+:)([^:]+:){2}([^:]+).+$
Replace With:\2\4\r\n

Give it a go and let me know how you got on. If it does not work it would be great to see data that was adversely affected before it was changed. I strongly suggest you read the FAQ questions, in particular information on how to represent data in these posts so that it is not affected by the interpreter running the posts.

Terry

guy038

Hello, @sakijhon, @Terry-R and All,

An other formulation could be :

SEARCH (?-s)^(.+?:){6}(.+?:){2}(.+?)\x20*:.+

REPLACE \1\3

Notes :

Because the final value of group1 is, just, the contents of the 6th field, anyway !
I added the part \x20*, located outside the group3, to remove unnecessary blank chars, at end of last kept field \3
I also used lazy quantifiers +? instead of negative class character [^:]

Best Regards,

guy038

Terry R

@guy038
Thanks for that revelation. I didn’t know what the result of group 1 would be when using it as (+?:){6}, never considered it.

So, actually using that concept you could also apply it to the 9th field. I understand you wanting to remove “possible” blanks but as none were present in the example I doubt highly they would exist.
@sakijhon
Even so, leaving that option in (to remove possible blanks) we have.

Find What:(?-s)^(.+?:){6}((.+?)(\x20*:)){3}.+
Replace With:\1\3

So the 2nd capture group ((.+?)(\x20*:)){3} can do the work but as seperate “sub-captures”, allowing us to just take the first portion and return it to the resulting line.

Cheers
Terry

guy038

Hi, @sakijhon, @Terry-R and All,

I understood why I wrongly supposed that it could be some spaces, at end of fields. Just because, analyzing the initial OP’s lines, with the TextFX option Line up multiple lines by (Clipboard Character), the plugin added some space chars to get all : separators aligned. And somehow, I probably used pieces of text containing these blank chars :-((

So, Terry, you’re right and, probably, blank characters do not exist before or after semicolons, in the OP’s text. Thus, the search regex can be simplified as below. Note, also, that the replacement remain unchanged :-))

SEARCH (?-s)^(.+?:){6}((.+?):){3}.+
REPLACE \1\3

Wow ! We just have learned a useful rule, regarding delimiter-separated format files :-))

Let’s imagine, for instance, a file, containing, exactly, 12 fields, with the ! exclamation mark as a delimiter char.

Now, let’s suppose that you would extract the 3rd, 5th, 6th and 11th fields, only, from that file

So, we start with this sample text :

3.00000!0.0009900000!S!{0}!frankgotti@me.com!frankie!Tucson!456789!student!marketing!approved!last field, line 1
3.50000!0.3010299957!C!{2}!chary@hotmail.com!charoty!!orth_Oracle!567234!notstudent!mathematics!disapproved!last field, line 2
3.75000!0.6020500913!S!{2}!tytarty@yagoo.com!tiagonne!Mission_District!126789!teacher!course_algebra!teacher!last field, line 3

We, first do simple subtractions in order to determine all the coefficients, used as quantifiers, in the regex :

 Fld   |   Qtf

  3 - 0  =  3
    \ 
  5 - 3  =  2
    \
  6 - 5  =  1
    \ 
 11 - 6  =  5

This leads to these correct regexes :

SEARCH (?-s)^(.+?!){3}(.+?!){2}(.+?!)((.+?)!){5}.+
REPLACE \1\2\3\5

OR

SEARCH (?-s)^(.+?!){3}(.+?!){2}(.+?!)(?:(.+?)!){5}.+
REPLACE \1\2\3\4

And… we get the expected results :

S!frankgotti@me.com!frankie!approved
C!chary@hotmail.com!charoty!disapproved
S!tytarty@yagoo.com!tiagonne!teacher

Et voilà !

Cheers,

guy038

P.S. :

In case of unwanted space characters, right before or after any separator, it is more sensible to do the preliminary S/R, below :

SEARCH \x20+Separator\x20+

REPLACE Separator