Help me please, how can I extract the mail and the next column?



  • Help me please, how can I extract the mail and the next column?

    678:ina_caeter@yahoo.com:Lina:S:{0}:2.0791812460:C:{5,0,2}:codepostal_23456:student:Kenty:level57:Elite:Knight
    889:Dogietreats:Terry:1.5000000000000000:0.3010299957:C:{2}:1:codepostal_13567:doctor:Tygger:level34:Elder:Druid
    990:Charz4you:367589:1.7500000000000000:0.6020599913:S:{0,2}:34:codepostal_45217:architect:Pog:level122:Elite:Knight
    .
    .
    .
    .
    .
    Friend, this is the output I need:
    ina_caeter@yahoo.com:Lina
    Dogietreats:Terry
    Charz4you:367589

    My file has 7286246 lines.



  • @oscar-remiccc

    What have you tried already?



  • Hi,@oscar-remiccc,

    Let's try to be logical !

    • The different fields of your text are delimited with a colon character

    • This search can be considered a single-line search, as the different fields are not split over several lines

    • As you want to keep the 2nd and 3rd fields, only, any search will have to refer to an anchor ( the beginning of line location ^ seems obvious ! )

    • To match a complete field between two : delimiters, we should search for any non-empty range of consecutive characters that are neither a colon nor an EOL char. Hence the negated character class [^:\n\r]

    From above, one solution could be, then :

    SEARCH ^[^:\n\r]+:([^:\n\r]+:[^:\n\r]+):.+

    REPLACE \1

    Notes :

    • From beginning of line ^, this regex matches the whole line contents ( the first three fields, followed by the remainder of the line :.+ )

    • The block [^:\n\r]+:[^:\n\r]+ ( 2nd + 3rd fields ), surrounded with parentheses, defines the group 1

    • So, in replacement, any line contents is replaced with these 2nd and 3rd fields, separated with a : character
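
    For readers who want to check the S/R outside the editor, here is a minimal sketch in Python. It is not part of the Notepad++ workflow described above; the function name keep_fields_2_and_3 is made up for illustration, and re.MULTILINE is used so that ^ anchors at the start of every line, as it does in the editor. The sample lines are the three from the question.

```python
import re

# The three sample lines from the question.
SAMPLE = (
    "678:ina_caeter@yahoo.com:Lina:S:{0}:2.0791812460:C:{5,0,2}:codepostal_23456:student:Kenty:level57:Elite:Knight\n"
    "889:Dogietreats:Terry:1.5000000000000000:0.3010299957:C:{2}:1:codepostal_13567:doctor:Tygger:level34:Elder:Druid\n"
    "990:Charz4you:367589:1.7500000000000000:0.6020599913:S:{0,2}:34:codepostal_45217:architect:Pog:level122:Elite:Knight\n"
)

# Same search regex as above: field 1, then fields 2 and 3 captured as
# group 1, then the rest of the line.
PATTERN = re.compile(r"^[^:\n\r]+:([^:\n\r]+:[^:\n\r]+):.+", re.MULTILINE)


def keep_fields_2_and_3(text: str) -> str:
    """Replace every matching line with its 2nd and 3rd fields (hypothetical helper)."""
    return PATTERN.sub(r"\1", text)


print(keep_fields_2_and_3(SAMPLE), end="")
# ina_caeter@yahoo.com:Lina
# Dogietreats:Terry
# Charz4you:367589
```

    For a file of several million lines, the same substitution could presumably be applied line by line while reading the file, rather than on one big string.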


    Using the lazy quantifier +?, this regex S/R is a bit shorter and becomes :

    SEARCH (?-s)^.+?:(.+?:.+?):.+

    REPLACE \1

    Note that the first part (?-s)^.+?: searches, from beginning of line ^, the shortest non-null range of standard characters, which is followed with a colon char. So, this range does not contain any : character ;-))
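
    The lazy variant can be compared with the negated-class variant in a short Python sketch. Two assumptions: the (?-s) prefix is left out, because in Python the dot already excludes newlines unless re.DOTALL is set, and the line with an empty field ( a::b:c:d ) is an invented example, not data from the question. On well-formed lines both regexes give the same result; with an empty field, the lazy ranges are allowed to grow across a colon, so the results differ.

```python
import re

# Lazy variant (without the (?-s) prefix, see the note above) and the
# negated-class variant from the earlier S/R.
LAZY = re.compile(r"^.+?:(.+?:.+?):.+", re.MULTILINE)
NEGATED = re.compile(r"^[^:\n\r]+:([^:\n\r]+:[^:\n\r]+):.+", re.MULTILINE)

line = "889:Dogietreats:Terry:1.5000000000000000:0.3010299957:C:{2}:1"
lazy_result = LAZY.sub(r"\1", line)
negated_result = NEGATED.sub(r"\1", line)
print(lazy_result)     # Dogietreats:Terry
print(negated_result)  # Dogietreats:Terry

# Invented edge case: the 2nd field is empty.
odd = "a::b:c:d"
odd_lazy = LAZY.sub(r"\1", odd)        # lazy range stretches over a colon
odd_negated = NEGATED.sub(r"\1", odd)  # no match, line left unchanged
print(odd_lazy)     # :b:c
print(odd_negated)  # a::b:c:d
```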

    Best Regards,

    guy038



  • @guy038
    Thank you very much, guy038. One more question: I tried the code below on this simple example, but it did not work; it eliminates the last character.

    990:Charz4you:367589:1.7500000000000000:0.6020599913

    SEARCH: ^([^ ]+?):([^ ]+?):([^ ]+?):([^ ]+?):([^ ]+?).$

    REPLACE $2:$5

    Charz4you:0.602059991

    It eliminates the final 3. What am I doing wrong, please?



  • Hi,@oscar-remiccc,

    So, to get the 2nd and 5th fields only, just delete, in your regex, the last ., before the $, as below ! That should do the trick !

    SEARCH ^([^ ]+?):([^ ]+?):([^ ]+?):([^ ]+?):([^ ]+?)$

    REPLACE $2:$5

    You'll get the expected text :

    Charz4you:0.6020599913
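
    Why the extra . matters can be reproduced with a small Python sketch ( an illustration only; note that Python writes the replacement as \2:\5 where Notepad++ also accepts $2:$5 ). The trailing . must consume one character before $, so the lazy group 5 stops one character early and the final 3 is matched by the dot instead of being captured.

```python
import re

line = "990:Charz4you:367589:1.7500000000000000:0.6020599913"

# Original attempt: the '.' before '$' consumes the last character of the line.
with_dot = r"^([^ ]+?):([^ ]+?):([^ ]+?):([^ ]+?):([^ ]+?).$"
# Corrected regex: group 5 now extends to the end of the line.
without_dot = r"^([^ ]+?):([^ ]+?):([^ ]+?):([^ ]+?):([^ ]+?)$"

truncated = re.sub(with_dot, r"\2:\5", line)
complete = re.sub(without_dot, r"\2:\5", line)
print(truncated)  # Charz4you:0.602059991
print(complete)   # Charz4you:0.6020599913
```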
    

    I think that, using the syntax of my previous post, we can simplify the search regex, as below :

    SEARCH ^([^:\n\r]+):([^:\n\r]+):([^:\n\r]+):([^:\n\r]+):([^:\n\r]+)

    REPLACE $2:$5

    But you do not need to store all the fields between parentheses ! Just store fields 2 and 5 and, if you include the : that follows field 2 in its group, we get the regex S/R :

    SEARCH ^[^:\n\r]+:([^:\n\r]+:)[^:\n\r]+:[^:\n\r]+:([^:\n\r]+)

    REPLACE $1$2

    Finally, you do not need to spell out fields 3 and 4 either ! So, the part [^:\n\r]+:[^:\n\r]+ ( fields 3 and 4 ) can simply be changed into .+, giving the final S/R, which assumes lines of exactly five fields, because the greedy .+ runs up to the last colon of the line :

    SEARCH ^[^:\n\r]+:([^:\n\r]+:).+:([^:\n\r]+)

    REPLACE $1$2
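
    A quick Python check of the last two S/R, as a sketch under stated assumptions: the seven-field line is built by appending two fields from the original data to the five-field example, purely for illustration. On the five-field line both regexes give the same text. On a longer line they diverge: the explicit version stops after field 5 and leaves the rest of the line in place, while the greedy .+ version keeps the last field of the line instead of the 5th.

```python
import re

# Version with fields 3 and 4 spelled out, and the shortened version using '.+'.
EXPLICIT = re.compile(r"^[^:\n\r]+:([^:\n\r]+:)[^:\n\r]+:[^:\n\r]+:([^:\n\r]+)", re.MULTILINE)
SHORT = re.compile(r"^[^:\n\r]+:([^:\n\r]+:).+:([^:\n\r]+)", re.MULTILINE)

five_fields = "990:Charz4you:367589:1.7500000000000000:0.6020599913"
seven_fields = five_fields + ":S:{0,2}"  # illustrative longer line

explicit_five = EXPLICIT.sub(r"\1\2", five_fields)
short_five = SHORT.sub(r"\1\2", five_fields)
explicit_seven = EXPLICIT.sub(r"\1\2", seven_fields)
short_seven = SHORT.sub(r"\1\2", seven_fields)

print(explicit_five)   # Charz4you:0.6020599913
print(short_five)      # Charz4you:0.6020599913
print(explicit_seven)  # Charz4you:0.6020599913:S:{0,2}
print(short_seven)     # Charz4you:{0,2}
```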

    Best Regards,

    guy038



  • @guy038
    You are a great teacher, thank you very much.

