    Help me please, how can I extract the mail and the next column?

      oscar remiccc last edited by

      Help me please, how can I extract the mail and the next column?

      678:ina_caeter@yahoo.com:Lina:S:{0}:2.0791812460:C:{5,0,2}:codepostal_23456:student:Kenty:level57:Elite:Knight
      889:Dogietreats:Terry:1.5000000000000000:0.3010299957:C:{2}:1:codepostal_13567:doctor:Tygger:level34:Elder:Druid
      990:Charz4you:367589:1.7500000000000000:0.6020599913:S:{0,2}:34:codepostal_45217:architect:Pog:level122:Elite:Knight
      ...

      This is the output I need:
      ina_caeter@yahoo.com:Lina
      Dogietreats:Terry
      Charz4you:367589

      My file has 7286246 lines.

        Alan Kilborn @oscar remiccc last edited by

        @oscar-remiccc

        What have you tried already?

          guy038 last edited by guy038

          Hi, @oscar-remiccc,

          Let’s try to be logical!

          • The fields of your text are delimited by a colon character.

          • This can be treated as a single-line search, because the fields of a record are never split across several lines.

          • As you want to keep only the 2nd and 3rd fields, the search has to be tied to an anchor (the beginning-of-line assertion ^ is the obvious choice).

          • To match the complete contents of one field, between two : delimiters, we search for a non-empty run of consecutive characters that are neither a colon nor an EOL character, hence the negated character class [^:\n\r] followed by +.

          From the above, one solution could be:

          SEARCH ^[^:\n\r]+:([^:\n\r]+:[^:\n\r]+):.+

          REPLACE \1

          Notes:

          • Starting at the beginning of line ^, this regex matches the whole line contents (the first three fields, followed by the remainder of the line, :.+).

          • The block [^:\n\r]+:[^:\n\r]+ (the 2nd and 3rd fields), surrounded by parentheses, defines group 1.

          • So, in the replacement, the whole line is replaced with the 2nd and 3rd fields, separated by a : character.
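To try this outside Notepad++, the same search and replacement can be run with Python's `re` module, which accepts this particular pattern unchanged. This is only a sketch: the helper name `keep_fields_2_and_3` is made up for the example, and the sample lines are the three from the question.

```python
import re

# Same pattern as the SEARCH above: field 1, then fields 2 and 3 captured
# together as group 1, then the remainder of the line.
PATTERN = re.compile(r"^[^:\n\r]+:([^:\n\r]+:[^:\n\r]+):.+")


def keep_fields_2_and_3(line: str) -> str:
    """Hypothetical helper: replace a matching line with its 2nd and 3rd fields."""
    return PATTERN.sub(r"\1", line)


samples = [
    "678:ina_caeter@yahoo.com:Lina:S:{0}:2.0791812460:C:{5,0,2}:codepostal_23456:student:Kenty:level57:Elite:Knight",
    "889:Dogietreats:Terry:1.5000000000000000:0.3010299957:C:{2}:1:codepostal_13567:doctor:Tygger:level34:Elder:Druid",
    "990:Charz4you:367589:1.7500000000000000:0.6020599913:S:{0,2}:34:codepostal_45217:architect:Pog:level122:Elite:Knight",
]

results = [keep_fields_2_and_3(s) for s in samples]
for r in results:
    print(r)
```

A line with fewer than four fields does not match the pattern and is returned unchanged.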


          With the lazy quantifier +?, this regex S/R becomes a bit shorter:

          SEARCH (?-s)^.+?:(.+?:.+?):.+

          REPLACE \1

          Note that (?-s) means that . does not match EOL characters, and that the first part (?-s)^.+?: matches, from the beginning of line ^, the shortest non-empty run of standard characters that is followed by a colon. So, this run does not contain any : character ;-))
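For a file of several million lines, the lazy variant can also be applied one line at a time in a script. The sketch below assumes Python's `re` module; the `(?-s)` prefix is left out because in Python `.` already excludes newlines unless `re.DOTALL` is set. The function name `extract` and the in-memory `demo` file are placeholders for illustration only.

```python
import io
import re

# Lazy variant of the search: shortest first field, then fields 2 and 3
# captured as group 1, then the remainder of the line.
LAZY = re.compile(r"^.+?:(.+?:.+?):.+")


def extract(lines):
    """Yield each line reduced to its 2nd and 3rd fields (hypothetical helper)."""
    for line in lines:
        # Strip the line ending first so only the record itself is matched.
        yield LAZY.sub(r"\1", line.rstrip("\r\n"))


# Stand-in for a real file object; iterating a file also yields one line at a time.
demo = io.StringIO(
    "678:ina_caeter@yahoo.com:Lina:S:{0}:2.0791812460:C:{5,0,2}:codepostal_23456\n"
    "889:Dogietreats:Terry:1.5000000000000000:0.3010299957:C:{2}:1\n"
)

extracted = list(extract(demo))
for row in extracted:
    print(row)
```

Because the generator reads and yields one line at a time, the whole file never has to be held in memory.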

          Best Regards,

          guy038

            oscar remiccc @guy038 last edited by

            @guy038
            Thank you very much, guy038. One question: I tried the code below on this simple example, but it did not work; it removes the last character.

            990:Charz4you:367589:1.7500000000000000:0.6020599913

            SEARCH: ^([^ ]+?):([^ ]+?):([^ ]+?):([^ ]+?):([^ ]+?).$

            REPLACE $2:$5

            Charz4you:0.602059991

            It removes the final digit 3. What am I doing wrong, please?

              guy038 last edited by guy038

              Hi, @oscar-remiccc,

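              So, to get only the 2nd and 5th fields, just delete the last . before the $ in your regex, as below! That . matches one more character (the final 3) outside group 5, so it is lost in the replacement. Removing it should do the trick!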

              SEARCH ^([^ ]+?):([^ ]+?):([^ ]+?):([^ ]+?):([^ ]+?)$

              REPLACE $2:$5

              You’ll get the expected text:

              Charz4you:0.6020599913
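The effect of the trailing `.` can be reproduced with a short script. This is a sketch using Python's `re` module, where the replacement `$2:$5` is written `\2:\5`; the variable names are arbitrary.

```python
import re

line = "990:Charz4you:367589:1.7500000000000000:0.6020599913"

# Original search: the "." before "$" consumes one character outside group 5.
with_dot = r"^([^ ]+?):([^ ]+?):([^ ]+?):([^ ]+?):([^ ]+?).$"
# Corrected search: group 5 now runs up to the end of the line.
without_dot = r"^([^ ]+?):([^ ]+?):([^ ]+?):([^ ]+?):([^ ]+?)$"

truncated = re.sub(with_dot, r"\2:\5", line)
complete = re.sub(without_dot, r"\2:\5", line)

print(truncated)  # Charz4you:0.602059991
print(complete)   # Charz4you:0.6020599913
```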
              

              I think that, using the syntax of my previous post, we can simplify the search regex, as below :

              SEARCH ^([^:\n\r]+):([^:\n\r]+):([^:\n\r]+):([^:\n\r]+):([^:\n\r]+)

              REPLACE $2:$5

              But you do not need to store all the fields between parentheses! Just store fields 2 and 5. If you also include the : after field 2 in the first group, you get the regex S/R:

              SEARCH ^[^:\n\r]+:([^:\n\r]+:)[^:\n\r]+:[^:\n\r]+:([^:\n\r]+)

              REPLACE $1$2

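              Finally, you do not need to spell out fields 3 and 4 either! So, the part [^:\n\r]+:[^:\n\r]+ (fields 3 and 4) can simply be changed into .+, giving the final S/R below. Note that .+ is greedy, so the second group then captures the last field of the line; this is the 5th field only when the line has exactly five fields, as in your example.

              SEARCH ^[^:\n\r]+:([^:\n\r]+:).+:([^:\n\r]+)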

              REPLACE $1$2
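The difference between spelling out fields 3 and 4 and using `.+` shows up only on lines with more than five fields. The following sketch, again assuming Python's `re` module and a made-up helper `fields`, compares what the two groups capture with each search.

```python
import re

# Fields 3 and 4 spelled out: group 2 is always the 5th field.
EXPLICIT = re.compile(r"^[^:\n\r]+:([^:\n\r]+:)[^:\n\r]+:[^:\n\r]+:([^:\n\r]+)")
# Fields 3 and 4 replaced by a greedy .+ : group 2 is the LAST field of the line.
GREEDY = re.compile(r"^[^:\n\r]+:([^:\n\r]+:).+:([^:\n\r]+)")


def fields(pattern, line):
    """Hypothetical helper: return group 1 + group 2, or None if there is no match."""
    m = pattern.match(line)
    return m.group(1) + m.group(2) if m else None


five = "990:Charz4you:367589:1.7500000000000000:0.6020599913"
long_line = five + ":S:{0,2}:34:codepostal_45217:architect:Pog:level122:Elite:Knight"

print(fields(EXPLICIT, five))       # Charz4you:0.6020599913
print(fields(GREEDY, five))         # Charz4you:0.6020599913
print(fields(EXPLICIT, long_line))  # Charz4you:0.6020599913
print(fields(GREEDY, long_line))    # Charz4you:Knight
```

On the five-field example both searches agree. On the longer line, the greedy `.+` runs up to the last colon, so the second group picks up the final field instead of the 5th.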

              Best Regards,

              guy038

                oscar remiccc @guy038 last edited by

                @guy038
                You are a great teacher, thank you very much!
