Community
    • Login

    EXTRACT COLUMNS WITH NOTEPAD

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    5 Posts 3 Posters 3.0k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • sakijhonS
      sakijhon
      last edited by

      Hello guys, I have a file, where I need to separate the sixth column and the ninth column, do friends help me?

      frankie:student
      charoty:notstudent
      tiagonne:teacher

      thank you

      file:

      3.00000:0.0009900000:S:{0}:frankgotti@me.com:frankie:Tucson:456789:student:marketing:approved
      3.50000:0.3010299957:C:{2}:chary@hotmail.com:charoty:North_Oracle:567234:notstudent:mathematics:disapproved
      3.75000:0.6020500913:S:{2}:tytarty@yagoo.com:tiagonne:Mission_District:126789:teacher:course_algebra:teacher

      1 Reply Last reply Reply Quote 0
      • Terry RT
        Terry R
        last edited by Terry R

        @sakijhon

        Assumptions:

        1. It appears that the : (colon) is your delimiter between fields.
        2. No field has the colon as part of the data (which would mean the field would need to have quotes around it).
          If either assumption is invalid you need to expand on the example and provide more information.

        My regex needs to have the search mode as “regular expression” and wraparound ticked.
        Find What:([^:]+:){5}([^:]+:)([^:]+:){2}([^:]+).+$
        Replace With:\2\4\r\n

        Give it a go and let me know how you got on. If it does not work it would be great to see data that was adversely affected before it was changed. I strongly suggest you read the FAQ questions, in particular information on how to represent data in these posts so that it is not affected by the interpreter running the posts.

        Terry

        1 Reply Last reply Reply Quote 1
        • guy038G
          guy038
          last edited by guy038

          Hello, @sakijhon, @Terry-R and All,

          An other formulation could be :

          SEARCH (?-s)^(.+?:){6}(.+?:){2}(.+?)\x20*:.+

          REPLACE \1\3

          Notes :

          • Because the final value of group1 is, just, the contents of the 6th field, anyway !

          • I added the part \x20*, located outside the group3, to remove unnecessary blank chars, at end of last kept field \3

          • I also used lazy quantifiers +? instead of negative class character [^:]

          Best Regards,

          guy038

          1 Reply Last reply Reply Quote 2
          • Terry RT
            Terry R
            last edited by

            @guy038
            Thanks for that revelation. I didn’t know what the result of group 1 would be when using it as (+?:){6}, never considered it.

            So, actually using that concept you could also apply it to the 9th field. I understand you wanting to remove “possible” blanks but as none were present in the example I doubt highly they would exist.
            @sakijhon
            Even so, leaving that option in (to remove possible blanks) we have.

            Find What:(?-s)^(.+?:){6}((.+?)(\x20*:)){3}.+
            Replace With:\1\3

            So the 2nd capture group ((.+?)(\x20*:)){3} can do the work but as seperate “sub-captures”, allowing us to just take the first portion and return it to the resulting line.

            Cheers
            Terry

            1 Reply Last reply Reply Quote 1
            • guy038G
              guy038
              last edited by guy038

              Hi, @sakijhon, @Terry-R and All,

              I understood why I wrongly supposed that it could be some spaces, at end of fields. Just because, analyzing the initial OP’s lines, with the TextFX option Line up multiple lines by (Clipboard Character), the plugin added some space chars to get all : separators aligned. And somehow, I probably used pieces of text containing these blank chars :-((

              So, Terry, you’re right and, probably, blank characters do not exist before or after semicolons, in the OP’s text. Thus, the search regex can be simplified as below. Note, also, that the replacement remain unchanged :-))

              • SEARCH (?-s)^(.+?:){6}((.+?):){3}.+

              • REPLACE \1\3


              Wow ! We just have learned a useful rule, regarding delimiter-separated format files :-))

              Let’s imagine, for instance, a file, containing, exactly, 12 fields, with the ! exclamation mark as a delimiter char.

              Now, let’s suppose that you would extract the 3rd, 5th, 6th and 11th fields, only, from that file

              So, we start with this sample text :

              3.00000!0.0009900000!S!{0}!frankgotti@me.com!frankie!Tucson!456789!student!marketing!approved!last field, line 1
              3.50000!0.3010299957!C!{2}!chary@hotmail.com!charoty!!orth_Oracle!567234!notstudent!mathematics!disapproved!last field, line 2
              3.75000!0.6020500913!S!{2}!tytarty@yagoo.com!tiagonne!Mission_District!126789!teacher!course_algebra!teacher!last field, line 3
              

              We, first do simple subtractions in order to determine all the coefficients, used as quantifiers, in the regex :

               Fld   |   Qtf
              
                3 - 0  =  3
                  \ 
                5 - 3  =  2
                  \
                6 - 5  =  1
                  \ 
               11 - 6  =  5
              

              This leads to these correct regexes :

              • SEARCH (?-s)^(.+?!){3}(.+?!){2}(.+?!)((.+?)!){5}.+

              • REPLACE \1\2\3\5

              OR

              • SEARCH (?-s)^(.+?!){3}(.+?!){2}(.+?!)(?:(.+?)!){5}.+

              • REPLACE \1\2\3\4

              And… we get the expected results :

              S!frankgotti@me.com!frankie!approved
              C!chary@hotmail.com!charoty!disapproved
              S!tytarty@yagoo.com!tiagonne!teacher
              

              Et voilà !

              Cheers,

              guy038

              P.S. :

              In case of unwanted space characters, right before or after any separator, it is more sensible to do the preliminary S/R, below :

              SEARCH \x20+Separator\x20+

              REPLACE Separator

              1 Reply Last reply Reply Quote 2
              • First post
                Last post
              The Community of users of the Notepad++ text editor.
              Powered by NodeBB | Contributors