column separation

  • My file has more than 200,000 lines, and the number of columns varies from line to line. Here is an example. I always need only the second and fifth columns:

    **miel:combis @ gmail. com:yyyy:ippps:flowers:text:123.34.78:126577
    1:chico @ yop .com:wwww:aveni234:tellme
    text:james @ mail .com:1265768566:carrss:ssisso:sabc:aaa
    4:bates @ me .com:13265772833:iiii:gloria:ip:125.45.67:–rI3:alex

    I want it like this:

    combis @ gmail .com:flowers
    chico @ yop .com:tellme
    james @ mail .com:ssisso
    bates @ me .com:gloria

    (In the real file, the email addresses are written without spaces.)

  • Hello, @toti-chalo and All,

    As your file is colon-delimited, here is a possible solution:

    • Open the Replace dialog ( Ctrl + H )

    • SEARCH (?-s)^.+?:(.+?:)(.+?:){2}(.+?)(:.*)?(?=\R|\z)

    • REPLACE \1\3

    • Tick the Wrap around option, if necessary

    • Select the Regular expression search mode

    • Click once on the Replace All button or several times on the Replace button

    Notes :

    • The first part (?-s) ensures that the regex dot . matches a single standard character only ( never an EOL character )

    • Then the next part ^.+?: searches the shortest range of standard chars, from beginning of line till a colon character ( the 1st column )

    • Now, the part (.+?:) represents again the shortest range of standard chars till the next colon character ( the 2nd column), which is stored as group 1, due to the parentheses

    • Next comes the part (.+?:){2}, which matches the next two columns, the 3rd and 4th, with their ending colon chars ( this is group 2, which keeps only the last repetition, i.e. the 4th column )

    • Finally, the part (.+?)(:.*)? matches the shortest range of chars ( the 5th column ), stored as group 3, optionally followed by a colon char and any further columns :.* ( group 4 ), up to a line-break or the very end of the file. This end condition comes from the look-ahead (?=\R|\z), which is not part of the overall match
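    To see which group captures what, here is a minimal Python sketch that runs an adapted version of the pattern on single lines. Python's re module has no \R or \z, so this adaptation ends with $ instead of the look-ahead, and the leading (?-s) is dropped because the Python dot excludes newlines by default. The name show_groups is only illustrative.

```python
import re

# Python adaptation of the Notepad++ pattern (an assumption, not the exact regex):
# `$` replaces the (?=\R|\z) look-ahead, and (?-s) is dropped because
# the dot does not match newlines unless re.DOTALL is set.
PATTERN = re.compile(r"^.+?:(.+?:)(.+?:){2}(.+?)(:.*)?$")


def show_groups(line):
    """Return the four captured groups, or None when the line has fewer than five columns."""
    m = PATTERN.match(line)
    return m.groups() if m else None


# Group 1 = 2nd column with its colon, group 2 = 4th column with its colon,
# group 3 = 5th column, group 4 = remaining columns (None when there are none).
print(show_groups("text:james @ mail .com:1265768566:carrss:ssisso:sabc:aaa"))
print(show_groups("1:chico @ yop .com:wwww:aveni234:tellme"))
```

    A line with fewer than five columns does not match at all, so the replacement leaves it unchanged.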

    In the replacement, we simply rewrite groups 1 and 3, which hold the 2nd and 5th columns of your table.
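    If you want to cross-check the result outside Notepad++, the same replacement can be sketched in Python. This is an adaptation under assumptions, not the exact Notepad++ regex: it uses $ with re.MULTILINE in place of (?=\R|\z), assumes lines end with \n, and the function name keep_second_and_fifth is made up for this example.

```python
import re

# Adapted pattern: with re.MULTILINE, ^ and $ anchor at every line,
# which stands in for the Notepad++ look-ahead (?=\R|\z).
PATTERN = re.compile(r"^.+?:(.+?:)(.+?:){2}(.+?)(:.*)?$", re.MULTILINE)


def keep_second_and_fifth(text):
    """Rewrite every line that has at least five columns as '2nd:5th'."""
    # \1 already ends with a colon, so \1\3 gives "second:fifth".
    return PATTERN.sub(r"\1\3", text)


sample = "\n".join([
    "**miel:combis @ gmail. com:yyyy:ippps:flowers:text:123.34.78:126577",
    "1:chico @ yop .com:wwww:aveni234:tellme",
    "text:james @ mail .com:1265768566:carrss:ssisso:sabc:aaa",
])

print(keep_second_and_fifth(sample))
```

    The spaces inside the sample addresses are kept exactly as posted; the regex does not touch them.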

    Best Regards,

