• Login
Community
  • Login

EXTRACT COLUMNS WITH NOTEPAD

Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
5 Posts 3 Posters 3.0k Views
Loading More Posts
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • S
    sakijhon
    last edited by Dec 3, 2018, 3:23 AM

    Hello guys, I have a file, where I need to separate the sixth column and the ninth column, do friends help me?

    frankie:student
    charoty:notstudent
    tiagonne:teacher

    thank you

    file:

    3.00000:0.0009900000:S:{0}:frankgotti@me.com:frankie:Tucson:456789:student:marketing:approved
    3.50000:0.3010299957:C:{2}:chary@hotmail.com:charoty:North_Oracle:567234:notstudent:mathematics:disapproved
    3.75000:0.6020500913:S:{2}:tytarty@yagoo.com:tiagonne:Mission_District:126789:teacher:course_algebra:teacher

    1 Reply Last reply Reply Quote 0
    • T
      Terry R
      last edited by Terry R Dec 3, 2018, 6:08 AM Dec 3, 2018, 6:05 AM

      @sakijhon

      Assumptions:

      1. It appears that the : (colon) is your delimiter between fields.
      2. No field has the colon as part of the data (which would mean the field would need to have quotes around it).
        If either assumption is invalid you need to expand on the example and provide more information.

      My regex needs to have the search mode as “regular expression” and wraparound ticked.
      Find What:([^:]+:){5}([^:]+:)([^:]+:){2}([^:]+).+$
      Replace With:\2\4\r\n

      Give it a go and let me know how you got on. If it does not work it would be great to see data that was adversely affected before it was changed. I strongly suggest you read the FAQ questions, in particular information on how to represent data in these posts so that it is not affected by the interpreter running the posts.

      Terry

      1 Reply Last reply Reply Quote 1
      • G
        guy038
        last edited by guy038 Dec 3, 2018, 4:54 PM Dec 3, 2018, 1:49 PM

        Hello, @sakijhon, @Terry-R and All,

        An other formulation could be :

        SEARCH (?-s)^(.+?:){6}(.+?:){2}(.+?)\x20*:.+

        REPLACE \1\3

        Notes :

        • Because the final value of group1 is, just, the contents of the 6th field, anyway !

        • I added the part \x20*, located outside the group3, to remove unnecessary blank chars, at end of last kept field \3

        • I also used lazy quantifiers +? instead of negative class character [^:]

        Best Regards,

        guy038

        1 Reply Last reply Reply Quote 2
        • T
          Terry R
          last edited by Dec 3, 2018, 6:47 PM

          @guy038
          Thanks for that revelation. I didn’t know what the result of group 1 would be when using it as (+?:){6}, never considered it.

          So, actually using that concept you could also apply it to the 9th field. I understand you wanting to remove “possible” blanks but as none were present in the example I doubt highly they would exist.
          @sakijhon
          Even so, leaving that option in (to remove possible blanks) we have.

          Find What:(?-s)^(.+?:){6}((.+?)(\x20*:)){3}.+
          Replace With:\1\3

          So the 2nd capture group ((.+?)(\x20*:)){3} can do the work but as seperate “sub-captures”, allowing us to just take the first portion and return it to the resulting line.

          Cheers
          Terry

          1 Reply Last reply Reply Quote 1
          • G
            guy038
            last edited by guy038 Nov 20, 2022, 1:31 AM Dec 3, 2018, 9:36 PM

            Hi, @sakijhon, @Terry-R and All,

            I understood why I wrongly supposed that it could be some spaces, at end of fields. Just because, analyzing the initial OP’s lines, with the TextFX option Line up multiple lines by (Clipboard Character), the plugin added some space chars to get all : separators aligned. And somehow, I probably used pieces of text containing these blank chars :-((

            So, Terry, you’re right and, probably, blank characters do not exist before or after semicolons, in the OP’s text. Thus, the search regex can be simplified as below. Note, also, that the replacement remain unchanged :-))

            • SEARCH (?-s)^(.+?:){6}((.+?):){3}.+

            • REPLACE \1\3


            Wow ! We just have learned a useful rule, regarding delimiter-separated format files :-))

            Let’s imagine, for instance, a file, containing, exactly, 12 fields, with the ! exclamation mark as a delimiter char.

            Now, let’s suppose that you would extract the 3rd, 5th, 6th and 11th fields, only, from that file

            So, we start with this sample text :

            3.00000!0.0009900000!S!{0}!frankgotti@me.com!frankie!Tucson!456789!student!marketing!approved!last field, line 1
            3.50000!0.3010299957!C!{2}!chary@hotmail.com!charoty!!orth_Oracle!567234!notstudent!mathematics!disapproved!last field, line 2
            3.75000!0.6020500913!S!{2}!tytarty@yagoo.com!tiagonne!Mission_District!126789!teacher!course_algebra!teacher!last field, line 3
            

            We, first do simple subtractions in order to determine all the coefficients, used as quantifiers, in the regex :

             Fld   |   Qtf
            
              3 - 0  =  3
                \ 
              5 - 3  =  2
                \
              6 - 5  =  1
                \ 
             11 - 6  =  5
            

            This leads to these correct regexes :

            • SEARCH (?-s)^(.+?!){3}(.+?!){2}(.+?!)((.+?)!){5}.+

            • REPLACE \1\2\3\5

            OR

            • SEARCH (?-s)^(.+?!){3}(.+?!){2}(.+?!)(?:(.+?)!){5}.+

            • REPLACE \1\2\3\4

            And… we get the expected results :

            S!frankgotti@me.com!frankie!approved
            C!chary@hotmail.com!charoty!disapproved
            S!tytarty@yagoo.com!tiagonne!teacher
            

            Et voilà !

            Cheers,

            guy038

            P.S. :

            In case of unwanted space characters, right before or after any separator, it is more sensible to do the preliminary S/R, below :

            SEARCH \x20+Separator\x20+

            REPLACE Separator

            1 Reply Last reply Reply Quote 2
            2 out of 5
            • First post
              2/5
              Last post
            The Community of users of the Notepad++ text editor.
            Powered by NodeBB | Contributors