Help me please, how can I extract the mail and the next column?

oscar remiccc

678:ina_caeter@yahoo.com:Lina:S:{0}:2.0791812460:C:{5,0,2}:codepostal_23456:student:Kenty:level57:Elite:Knight
889:Dogietreats:Terry:1.5000000000000000:0.3010299957:C:{2}:1:codepostal_13567:doctor:Tygger:level34:Elder:Druid
990:Charz4you:367589:1.7500000000000000:0.6020599913:S:{0,2}:34:codepostal_45217:architect:Pog:level122:Elite:Knight
.
.
.
.
.
Friend, this is the output I need:
ina_caeter@yahoo.com:Lina
Dogietreats:Terry
Charz4you:367589

My file has 7286246 lines.

Alan Kilborn

@oscar-remiccc

What have you tried already?

guy038

Hi, @oscar-remiccc,

Let’s try to be logical!

The different fields of your text are delimited with a colon character.
This can be treated as a single-line search, as the fields are not split across several lines.
As you want to keep the 2nd and 3rd fields only, the search has to refer to an anchor (the beginning-of-line location ^ seems the obvious one!).
To match one complete field between two : delimiters, we should search for any non-empty range of consecutive characters that are neither a colon nor an EOL character. Hence the negated character class [^:\n\r], repeated with +.

From the above, one solution could be:

SEARCH ^[^:\n\r]+:([^:\n\r]+:[^:\n\r]+):.+

REPLACE \1

Notes :

From the beginning of line ^, this regex matches the whole line contents (the first three fields, followed by the remainder of the line :.+)
The block [^:\n\r]+:[^:\n\r]+ (2nd + 3rd fields), surrounded by parentheses, defines group 1
So, in the replacement, the whole line is replaced with these 2nd and 3rd fields, separated by a : character
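
For readers who would rather script this than use the Notepad++ Replace dialog, the S/R above can be sketched in Python. This is only an illustration, assuming Python's re module; the helper name keep_second_and_third is hypothetical. It works one line at a time, which may suit a file of several million lines.

```python
import re

# Same pattern as the SEARCH field above; group 1 holds fields 2 and 3.
PATTERN = re.compile(r"^[^:\n\r]+:([^:\n\r]+:[^:\n\r]+):.+")

def keep_second_and_third(lines):
    """Yield 'field2:field3' for each line (hypothetical helper name).

    Lines that do not match the pattern are yielded unchanged.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        yield PATTERN.sub(r"\1", line)

sample = [
    "678:ina_caeter@yahoo.com:Lina:S:{0}:2.0791812460:C:{5,0,2}:codepostal_23456:student:Kenty:level57:Elite:Knight",
    "889:Dogietreats:Terry:1.5000000000000000:0.3010299957:C:{2}:1:codepostal_13567:doctor:Tygger:level34:Elder:Druid",
    "990:Charz4you:367589:1.7500000000000000:0.6020599913:S:{0,2}:34:codepostal_45217:architect:Pog:level122:Elite:Knight",
]

result = list(keep_second_and_third(sample))
for row in result:
    print(row)
```

Because the function accepts any iterable of lines, an open file object could be passed in place of the sample list.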

Using the lazy quantifier +?, this regex S/R is a bit shorter and becomes :

SEARCH (?-s)^.+?:(.+?:.+?):.+

REPLACE \1

Note that the first part (?-s)^.+?: searches, from beginning of line ^, the shortest non-null range of standard characters, which is followed with a colon char. So, this range does not contain any : character ;-))
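
As a quick check that the lazy version selects the same text as the explicit one, here is a small Python sketch run on the three sample lines. It assumes Python's re module, where the dot does not match a newline unless re.DOTALL is set, so the (?-s) prefix is left out of the pattern.

```python
import re

# Explicit version: each field is "anything except a colon or an EOL char".
explicit = re.compile(r"^[^:\n\r]+:([^:\n\r]+:[^:\n\r]+):.+", re.MULTILINE)

# Lazy version: "." already excludes "\n" by default in Python,
# so the (?-s) prefix used in Notepad++ is omitted here.
lazy = re.compile(r"^.+?:(.+?:.+?):.+", re.MULTILINE)

text = "\n".join([
    "678:ina_caeter@yahoo.com:Lina:S:{0}:2.0791812460:C:{5,0,2}:codepostal_23456:student:Kenty:level57:Elite:Knight",
    "889:Dogietreats:Terry:1.5000000000000000:0.3010299957:C:{2}:1:codepostal_13567:doctor:Tygger:level34:Elder:Druid",
    "990:Charz4you:367589:1.7500000000000000:0.6020599913:S:{0,2}:34:codepostal_45217:architect:Pog:level122:Elite:Knight",
])

explicit_out = explicit.sub(r"\1", text)
lazy_out = lazy.sub(r"\1", text)
print(lazy_out)
```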

Best Regards,

guy038

oscar remiccc

@guy038
Thank you very much, guy038. One question: I was trying the code below on this simple example, but it did not work; it eliminates the last character.

990:Charz4you:367589:1.7500000000000000:0.6020599913

SEARCH: ^([^ ]+?):([^ ]+?):([^ ]+?):([^ ]+?):([^ ]+?).$

REPLACE $2:$5

Charz4you:0.602059991

It eliminates the final 3. What am I doing wrong, please?

guy038

Hi, @oscar-remiccc,

So, to get the 2nd and 5th fields only, just delete, in your regex, the last . before the $, as below! That . matches one extra character, so the lazy group 5 stops one character early. Removing it should do the trick!

SEARCH ^([^ ]+?):([^ ]+?):([^ ]+?):([^ ]+?):([^ ]+?)$

REPLACE $2:$5

You’ll get the expected text:

Charz4you:0.6020599913
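
The effect of that trailing dot can be reproduced outside Notepad++ with a small Python sketch. It assumes Python's re module, where the replacement is written \2:\5 instead of $2:$5; the variable names are only illustrative.

```python
import re

line = "990:Charz4you:367589:1.7500000000000000:0.6020599913"

# Original attempt: the "." before "$" consumes the last character,
# so the lazy group 5 ends one character too early.
with_dot = r"^([^ ]+?):([^ ]+?):([^ ]+?):([^ ]+?):([^ ]+?).$"

# Corrected search: group 5 now has to reach the end of the line.
without_dot = r"^([^ ]+?):([^ ]+?):([^ ]+?):([^ ]+?):([^ ]+?)$"

truncated = re.sub(with_dot, r"\2:\5", line)
complete = re.sub(without_dot, r"\2:\5", line)

print(truncated)
print(complete)
```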

I think that, using the syntax of my previous post, we can simplify the search regex, as below :

SEARCH ^([^:\n\r]+):([^:\n\r]+):([^:\n\r]+):([^:\n\r]+):([^:\n\r]+)

REPLACE $2:$5

But you do not need to capture all the fields in parentheses! Just capture fields 2 and 5, and if you include the : in the first group, we get the regex S/R:

SEARCH ^[^:\n\r]+:([^:\n\r]+:)[^:\n\r]+:[^:\n\r]+:([^:\n\r]+)

REPLACE $1$2

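Finally, you do not need to spell out fields 3 and 4 either! So the part [^:\n\r]+:[^:\n\r]+ (fields 3 and 4) can simply be changed into .+, giving the final S/R. Note that .+ is greedy, so this shortcut relies on the line having exactly five fields, as in your example; on a longer line, the second group would capture the last field instead of the 5th: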

SEARCH ^[^:\n\r]+:([^:\n\r]+:).+:([^:\n\r]+)

REPLACE $1$2
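
The following Python sketch tries this final S/R on the five-field example and on one of the longer lines from the first post. It assumes Python's re module. Since .+ is greedy, the second group takes the last field of the line, so the longer line gives a different result. The split-based helper pick_fields is a hypothetical alternative for picking fields by position, not part of the regex approach.

```python
import re

# Final SEARCH pattern; the replacement $1$2 is written \1\2 in Python.
final = re.compile(r"^[^:\n\r]+:([^:\n\r]+:).+:([^:\n\r]+)")

five_fields = "990:Charz4you:367589:1.7500000000000000:0.6020599913"
many_fields = (
    "990:Charz4you:367589:1.7500000000000000:0.6020599913:S:{0,2}:34:"
    "codepostal_45217:architect:Pog:level122:Elite:Knight"
)

short_out = final.sub(r"\1\2", five_fields)
long_out = final.sub(r"\1\2", many_fields)

def pick_fields(line, positions=(2, 5)):
    """Return the chosen 1-based fields of a colon-delimited line (hypothetical helper)."""
    parts = line.split(":")
    return ":".join(parts[p - 1] for p in positions)

split_out = pick_fields(many_fields)

print(short_out)
print(long_out)
print(split_out)
```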

Best Regards,

guy038

oscar remiccc

@guy038
You are a great teacher, thank you very much.