EXTRACT COLUMNS WITH NOTEPAD



  • Hello guys, I have a file, where I need to separate the sixth column and the ninth column, do friends help me?

    frankie:student
    charoty:notstudent
    tiagonne:teacher

    thank you

    file:

    3.00000:0.0009900000:S:{0}:frankgotti@me.com:frankie:Tucson:456789:student:marketing:approved
    3.50000:0.3010299957:C:{2}:chary@hotmail.com:charoty:North_Oracle:567234:notstudent:mathematics:disapproved
    3.75000:0.6020500913:S:{2}:tytarty@yagoo.com:tiagonne:Mission_District:126789:teacher:course_algebra:teacher



  • @sakijhon

    Assumptions:

    1. It appears that the : (colon) is your delimiter between fields.
    2. No field has the colon as part of the data (which would mean the field would need to have quotes around it).
      If either assumption is invalid you need to expand on the example and provide more information.

    My regex needs to have the search mode as “regular expression” and wraparound ticked.
    Find What:([^:]+:){5}([^:]+:)([^:]+:){2}([^:]+).+$
    Replace With:\2\4\r\n

    Give it a go and let me know how you got on. If it does not work it would be great to see data that was adversely affected before it was changed. I strongly suggest you read the FAQ questions, in particular information on how to represent data in these posts so that it is not affected by the interpreter running the posts.

    Terry



  • Hello, @sakijhon, @Terry-R and All,

    An other formulation could be :

    SEARCH (?-s)^(.+?:){6}(.+?:){2}(.+?)\x20*:.+

    REPLACE \1\3

    Notes :

    • Because the final value of group1 is, just, the contents of the 6th field, anyway !

    • I added the part \x20*, located outside the group3, to remove unnecessary blank chars, at end of last kept field \3

    • I also used lazy quantifiers +? instead of negative class character [^:]

    Best Regards,

    guy038



  • @guy038
    Thanks for that revelation. I didn’t know what the result of group 1 would be when using it as (+?:){6}, never considered it.

    So, actually using that concept you could also apply it to the 9th field. I understand you wanting to remove “possible” blanks but as none were present in the example I doubt highly they would exist.
    @sakijhon
    Even so, leaving that option in (to remove possible blanks) we have.

    Find What:(?-s)^(.+?:){6}((.+?)(\x20*:)){3}.+
    Replace With:\1\3

    So the 2nd capture group ((.+?)(\x20*:)){3} can do the work but as seperate “sub-captures”, allowing us to just take the first portion and return it to the resulting line.

    Cheers
    Terry



  • Hi, @sakijhon, @Terry-R and All,

    I understood why I wrongly supposed that it could be some spaces, at end of fields. Just because, analyzing the initial OP’s lines, with the TextFX option Line up multiple lines by (Clipboard Character), the plugin added some space chars to get all : separators aligned. And somehow, I probably used pieces of text containing these blank chars :-((

    So, Terry, you’re right and, probably, blank characters do not exist before or after semicolons, in the OP’s text. Thus, the search regex can be simplified as below. Note, also, that the replacement remain unchanged :-))

    • SEARCH (?-s)^(.+?:){6}((.+?):){3}.+

    • REPLACE \1\3


    Wow ! We just have learned a useful rule, regarding delimiter-separated format files :-))

    Let’s imagine, for instance, a file, containing, exactly, 12 fields, with the ! exclamation mark as a delimiter char.

    Now, let’s suppose that you would extract the 3rd, 5th, 6th and 11th fields, only, from that file

    So, we start with this sample text :

    3.00000!0.0009900000!S!{0}!frankgotti@me.com!frankie!Tucson!456789!student!marketing!approved!last field, line 1
    3.50000!0.3010299957!C!{2}!chary@hotmail.com!charoty!!orth_Oracle!567234!notstudent!mathematics!disapproved!last field, line 2
    3.75000!0.6020500913!S!{2}!tytarty@yagoo.com!tiagonne!Mission_District!126789!teacher!course_algebra!teacher!last field, line 3
    

    We, first do simple subtractions in order to determine all the coefficients, used as quantifiers, in the regex :

    Fld     Qtf
    -----------
     3 - 0 = 3
       \ 
     5 - 3 = 2
       \
     6 - 5 = 1
       \ 
    11 - 6 = 5
    

    This leads to the correct regex :

    SEARCH (?-s)^(.+?!){3}(.+?!){2}(.+?!)(?:(.+?)!){5}.+

    REPLACE \1\2\3\4

    And… we get the expected results :

    S!frankgotti@me.com!frankie!approved
    C!chary@hotmail.com!charoty!disapproved
    S!tytarty@yagoo.com!tiagonne!teacher
    

    Et voilà !

    Cheers,

    guy038

    P.S. :

    In case of unwanted space characters, right before or after any separator, it is more sensible to do the preliminary S/R, below :

    SEARCH \x20+Separator\x20+

    REPLACE Separator


Log in to reply