Community
    • Login

    help please, extract columns

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    11 Posts 4 Posters 2.2k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • neagalN
      neagal
      last edited by neagal

      Hi all, I have a big text file and the data in it are in 7 columns, but how can I do to have two in two columns.
      ( 1 2 3 4 5 6 7 columns)
      “name”:"1 80.97.120.40 180 ms [n/a] 21 “type”:“A” @post
      “asn”:6876 90.97.117.177 134 ms [n/a] 20 “type”:“B” @post
      “asn”:23650 70.97.120.10 75 ms [n/a] 21 “type”:“A” @contact
      “ttl”:60 70.97.122.105 87 ms geodnsd.global.sonicwall.com 20 “type”:“C” @contact
      “asn”:15399 70.97.122.106 89 ms geodnsd.global.sonicwall.com 19 “type”:“A” @contact
      “ttl”:7896 70.97.122.107 95 ms www.psmythe.net 21 “type”:“B” @post
      “asn”:456 70.97.122.104 98 ms wasabi.prostructure.com 22 “type”:“C” @contact
      “name”:"376 70.97.122.108 89 ms server_ip 21 “type”:“B” @contact
      “asn”:15343 70.97.122.109 87 ms xcder.com 21 “type”:“A” @post

      I need results like this example
      1 2
      1 3
      1 4
      1 5
      1 6
      1 7

      2 1
      2 3
      2 4
      2 5
      2 6
      2 7

      and so on with the others

      the columns are separated, How could I do with regular expression mode?

      thanks

      Scott SumnerS 1 Reply Last reply Reply Quote 0
      • Scott SumnerS
        Scott Sumner @neagal
        last edited by

        @neagal

        If the columns are space-separated, I see EIGHT columns, not SEVEN.

        neagalN 1 Reply Last reply Reply Quote 0
        • neagalN
          neagal @Scott Sumner
          last edited by

          @Scott-Sumner

          “name”:"1 80.97.120.40 180ms [n/a] 21 “type”:“A” @post

          hello friend, are 7 columns

          1 Reply Last reply Reply Quote 0
          • Terry RT
            Terry R
            last edited by Terry R

            @neagal
            @Scott-Sumner said it’s a bit confusing, even more so now as it would appear you have provided different data to the original. So is the 180ms field definitely “180ms” or is it as you originally typed “180 ms”, where there is a space in the middle.

            I’m also a bit unsure of what you intend to happen as the 1 2 to 1 7 differs from the 2 1 to 2 7.
            If we consider each line separately I think what you want is a line with the 1st and 2nd field separated by a space, followed by a line with 1st and 3rd columns, through to the 1st and 7th column. This would be repeated for every original line of data, is this correct?

            So using your original data I produced this result:

            “name”:"1  80.97.120.40 
            “name”:"1  180 ms 
            “name”:"1  [n/a] 
            “name”:"1  21 
            “name”:"1  “type”:“A” 
            “name”:"1  @post
            “asn”:6876  90.97.117.177 
            “asn”:6876  134 ms 
            “asn”:6876  [n/a] 
            “asn”:6876  20 
            “asn”:6876  “type”:“B” 
            “asn”:6876  @post
            “asn”:23650  70.97.120.10 
            “asn”:23650  75 ms 
            “asn”:23650  [n/a] 
            “asn”:23650  21 
            “asn”:23650  “type”:“A” 
            “asn”:23650  @contact
            “ttl”:60  70.97.122.105 
            “ttl”:60  87 ms 
            “ttl”:60  geodnsd.global.sonicwall.com 
            “ttl”:60  20 
            “ttl”:60  “type”:“C” 
            “ttl”:60  @contact
            “asn”:15399  70.97.122.106 
            “asn”:15399  89 ms 
            “asn”:15399  geodnsd.global.sonicwall.com 
            “asn”:15399  19 
            “asn”:15399  “type”:“A” 
            “asn”:15399  @contact
            “ttl”:7896  70.97.122.107 
            “ttl”:7896  95 ms 
            “ttl”:7896  www.psmythe.net 
            “ttl”:7896  21 
            “ttl”:7896  “type”:“B” 
            “ttl”:7896  @post
            “asn”:456  70.97.122.104 
            “asn”:456  98 ms 
            “asn”:456  wasabi.prostructure.com 
            “asn”:456  22 
            “asn”:456  “type”:“C” 
            “asn”:456  @contact
            “name”:"376  70.97.122.108 
            “name”:"376  89 ms 
            “name”:"376  server_ip 
            “name”:"376  21 
            “name”:"376  “type”:“B” 
            “name”:"376  @contact
            “asn”:15343  70.97.122.109 
            “asn”:15343  87 ms 
            “asn”:15343  xcder.com 
            “asn”:15343  21 
            “asn”:15343  “type”:“A” 
            “asn”:15343  @post
            

            Terry

            neagalN 1 Reply Last reply Reply Quote 2
            • neagalN
              neagal @Terry R
              last edited by

              @Terry-R said:

              If we consider each line separately I think what you want is a line with the 1st and 2nd field separated by a space, followed by a line with 1st and 3rd columns, through to the 1st and 7th column. This would be repeated for every original line of data, is this correct?

              it is right

              1 Reply Last reply Reply Quote 0
              • Terry RT
                Terry R
                last edited by

                @neagal
                There’s still 1 unanswered question, is the 180ms field as you originally typed it or as you typed it in response to @Scott-Sumner question?
                So is it:

                180 ms
                or
                180ms
                

                Terry

                neagalN 1 Reply Last reply Reply Quote 0
                • neagalN
                  neagal @Terry R
                  last edited by

                  @Terry-R said:

                  180ms

                  180ms

                  1 Reply Last reply Reply Quote 0
                  • Terry RT
                    Terry R
                    last edited by Terry R

                    @neagal
                    Before we start on this we need to add a final blank line to the original data, as my regex (regular expression) expects one at the end of every line.

                    So first we use the Replace function with
                    Find What:\z
                    Replace With:\r\n

                    For all the regex’s provided we need the search mode set to “regular expression” and “wrap around” ticked. You can hit the replace button, or replace all button, it will only find 1 occurrence and add a blank line at the end of the data. You may want to confirm that by checking the bottom of Notepad++ where it shows a count of lines, see if it increases by 1 after running the first regex.

                    So the Replace function to alter your data is.
                    Find What:^([^ ]+?) ([^ ]+?) ([^ ]+ms) ([^ ]+?) ([^ ]+?) ([^ ]+?) ([^ ]+?)(\R)

                    For the Replace field I also have a question, do you want a blank line between each set of fields. So all 6 lines of the original line 1 followed by a blank line, then 6 lines of original line 2, repeated?
                    If blank line between sets then
                    Replace With:\1 \2\8\1 \3\8\1 \4\8\1 \5\8\1 \6\8\1 \7\8\8
                    If no blank line between then
                    Replace With:\1 \2\8\1 \3\8\1 \4\8\1 \5\8\1 \6\8\1 \7\8
                    As before, set search mode to “regular expression” and “wrap around” ticked. Click once on “replace all”, then check the resulting data. You can see the important stuff is outlined in red text with a shadow background. This text can be copied directly using your keyboard or mouse, so no need to type each character individually.

                    Let us know how you got on. If any further issues then you may need to elaborate on the data used, or at least provide some of the examples where the regex did NOT work, so we can amend it to suit.

                    Finally remove that last blank line.

                    Terry

                    1 Reply Last reply Reply Quote 0
                    • Terry RT
                      Terry R
                      last edited by

                      @neagal

                      The following regex, whilst not wrong has “ms” included which isn’t necessary. I had that in originally as I believed your data was of the type “180 ms”. When you confirmed the space in the middle was not actually there I did not remove it efficiently.

                      ^([^ ]+?) ([^ ]+?) ([^ ]+ms) ([^ ]+?) ([^ ]+?) ([^ ]+?) ([^ ]+?)(\R)

                      You will notice all other instances are ([^ ]+?), the one with “ms” could actually be the same. I just did not notice it when posting. I thought it would be best to advertise that fact now so you don’t get the wrong impression of how the regex works and be confused over why I had the “ms” in that instance, yet no other characters in other instances.

                      Terry

                      1 Reply Last reply Reply Quote 1
                      • Scott SumnerS
                        Scott Sumner
                        last edited by

                        Too many questions…too much obscurity…not only do you have to solve the OP’s problem…you have to straighten the OP out to get a coherent problem presented…good that you have a lot of endurance, @Terry-R !

                        1 Reply Last reply Reply Quote 0
                        • guy038G
                          guy038
                          last edited by

                          Hi, @neagal, @Terry-R, @scott-sumner and All,

                          • About a possible space char, before the ms string, you can delete it, to get a coherent field, with the regex S/R :

                            • SEARCH (?-i)(?<=\d)\x20(?=ms)

                            • REPLACE Leave EMPTY

                          • Now, if we use lazy quantifiers, the Terry’s regex S/R, with a line-break between each block, turns to :

                            • SEARCH (?-s)^(.+?\x20)(.+?)\x20(.+?)\x20(.+?)\x20(.+?)\x20(.+?)\x20(.+)(\R)

                            • REPLACE \1\2\8\1\3\8\1\4\8\1\5\8\1\6\8\1\7\8\8

                          • Eventually, if necessary, you can move back to the #### ms syntax, with a space between, with the regex S/R :

                            • SEARCH ms$

                            • REPLACE \x20$0

                          Best Regards,

                          guy038

                          1 Reply Last reply Reply Quote 2
                          • First post
                            Last post
                          The Community of users of the Notepad++ text editor.
                          Powered by NodeBB | Contributors