help please, extract columns



  • Hi all, I have a big text file and the data in it are in 7 columns, but how can I do to have two in two columns.
    ( 1 2 3 4 5 6 7 columns)
    “name”:"1 80.97.120.40 180 ms [n/a] 21 “type”:“A” @post
    “asn”:6876 90.97.117.177 134 ms [n/a] 20 “type”:“B” @post
    “asn”:23650 70.97.120.10 75 ms [n/a] 21 “type”:“A” @contact
    “ttl”:60 70.97.122.105 87 ms geodnsd.global.sonicwall.com 20 “type”:“C” @contact
    “asn”:15399 70.97.122.106 89 ms geodnsd.global.sonicwall.com 19 “type”:“A” @contact
    “ttl”:7896 70.97.122.107 95 ms www.psmythe.net 21 “type”:“B” @post
    “asn”:456 70.97.122.104 98 ms wasabi.prostructure.com 22 “type”:“C” @contact
    “name”:"376 70.97.122.108 89 ms server_ip 21 “type”:“B” @contact
    “asn”:15343 70.97.122.109 87 ms xcder.com 21 “type”:“A” @post

    I need results like this example
    1 2
    1 3
    1 4
    1 5
    1 6
    1 7

    2 1
    2 3
    2 4
    2 5
    2 6
    2 7

    and so on with the others

    the columns are separated, How could I do with regular expression mode?

    thanks



  • @neagal

    If the columns are space-separated, I see EIGHT columns, not SEVEN.



  • @Scott-Sumner

    “name”:"1 80.97.120.40 180ms [n/a] 21 “type”:“A” @post

    hello friend, are 7 columns



  • @neagal
    @Scott-Sumner said it’s a bit confusing, even more so now as it would appear you have provided different data to the original. So is the 180ms field definitely “180ms” or is it as you originally typed “180 ms”, where there is a space in the middle.

    I’m also a bit unsure of what you intend to happen as the 1 2 to 1 7 differs from the 2 1 to 2 7.
    If we consider each line separately I think what you want is a line with the 1st and 2nd field separated by a space, followed by a line with 1st and 3rd columns, through to the 1st and 7th column. This would be repeated for every original line of data, is this correct?

    So using your original data I produced this result:

    “name”:"1  80.97.120.40 
    “name”:"1  180 ms 
    “name”:"1  [n/a] 
    “name”:"1  21 
    “name”:"1  “type”:“A” 
    “name”:"1  @post
    “asn”:6876  90.97.117.177 
    “asn”:6876  134 ms 
    “asn”:6876  [n/a] 
    “asn”:6876  20 
    “asn”:6876  “type”:“B” 
    “asn”:6876  @post
    “asn”:23650  70.97.120.10 
    “asn”:23650  75 ms 
    “asn”:23650  [n/a] 
    “asn”:23650  21 
    “asn”:23650  “type”:“A” 
    “asn”:23650  @contact
    “ttl”:60  70.97.122.105 
    “ttl”:60  87 ms 
    “ttl”:60  geodnsd.global.sonicwall.com 
    “ttl”:60  20 
    “ttl”:60  “type”:“C” 
    “ttl”:60  @contact
    “asn”:15399  70.97.122.106 
    “asn”:15399  89 ms 
    “asn”:15399  geodnsd.global.sonicwall.com 
    “asn”:15399  19 
    “asn”:15399  “type”:“A” 
    “asn”:15399  @contact
    “ttl”:7896  70.97.122.107 
    “ttl”:7896  95 ms 
    “ttl”:7896  www.psmythe.net 
    “ttl”:7896  21 
    “ttl”:7896  “type”:“B” 
    “ttl”:7896  @post
    “asn”:456  70.97.122.104 
    “asn”:456  98 ms 
    “asn”:456  wasabi.prostructure.com 
    “asn”:456  22 
    “asn”:456  “type”:“C” 
    “asn”:456  @contact
    “name”:"376  70.97.122.108 
    “name”:"376  89 ms 
    “name”:"376  server_ip 
    “name”:"376  21 
    “name”:"376  “type”:“B” 
    “name”:"376  @contact
    “asn”:15343  70.97.122.109 
    “asn”:15343  87 ms 
    “asn”:15343  xcder.com 
    “asn”:15343  21 
    “asn”:15343  “type”:“A” 
    “asn”:15343  @post
    

    Terry



  • @Terry-R said:

    If we consider each line separately I think what you want is a line with the 1st and 2nd field separated by a space, followed by a line with 1st and 3rd columns, through to the 1st and 7th column. This would be repeated for every original line of data, is this correct?

    it is right



  • @neagal
    There’s still 1 unanswered question, is the 180ms field as you originally typed it or as you typed it in response to @Scott-Sumner question?
    So is it:

    180 ms
    or
    180ms
    

    Terry



  • @Terry-R said:

    180ms

    180ms



  • @neagal
    Before we start on this we need to add a final blank line to the original data, as my regex (regular expression) expects one at the end of every line.

    So first we use the Replace function with
    Find What:\z
    Replace With:\r\n

    For all the regex’s provided we need the search mode set to “regular expression” and “wrap around” ticked. You can hit the replace button, or replace all button, it will only find 1 occurrence and add a blank line at the end of the data. You may want to confirm that by checking the bottom of Notepad++ where it shows a count of lines, see if it increases by 1 after running the first regex.

    So the Replace function to alter your data is.
    Find What:^([^ ]+?) ([^ ]+?) ([^ ]+ms) ([^ ]+?) ([^ ]+?) ([^ ]+?) ([^ ]+?)(\R)

    For the Replace field I also have a question, do you want a blank line between each set of fields. So all 6 lines of the original line 1 followed by a blank line, then 6 lines of original line 2, repeated?
    If blank line between sets then
    Replace With:\1 \2\8\1 \3\8\1 \4\8\1 \5\8\1 \6\8\1 \7\8\8
    If no blank line between then
    Replace With:\1 \2\8\1 \3\8\1 \4\8\1 \5\8\1 \6\8\1 \7\8
    As before, set search mode to “regular expression” and “wrap around” ticked. Click once on “replace all”, then check the resulting data. You can see the important stuff is outlined in red text with a shadow background. This text can be copied directly using your keyboard or mouse, so no need to type each character individually.

    Let us know how you got on. If any further issues then you may need to elaborate on the data used, or at least provide some of the examples where the regex did NOT work, so we can amend it to suit.

    Finally remove that last blank line.

    Terry



  • @neagal

    The following regex, whilst not wrong has “ms” included which isn’t necessary. I had that in originally as I believed your data was of the type “180 ms”. When you confirmed the space in the middle was not actually there I did not remove it efficiently.

    ^([^ ]+?) ([^ ]+?) ([^ ]+ms) ([^ ]+?) ([^ ]+?) ([^ ]+?) ([^ ]+?)(\R)

    You will notice all other instances are ([^ ]+?), the one with “ms” could actually be the same. I just did not notice it when posting. I thought it would be best to advertise that fact now so you don’t get the wrong impression of how the regex works and be confused over why I had the “ms” in that instance, yet no other characters in other instances.

    Terry



  • Too many questions…too much obscurity…not only do you have to solve the OP’s problem…you have to straighten the OP out to get a coherent problem presented…good that you have a lot of endurance, @Terry-R !



  • Hi, @neagal, @Terry-R, @scott-sumner and All,

    • About a possible space char, before the ms string, you can delete it, to get a coherent field, with the regex S/R :

      • SEARCH (?-i)(?<=\d)\x20(?=ms)

      • REPLACE Leave EMPTY

    • Now, if we use lazy quantifiers, the Terry’s regex S/R, with a line-break between each block, turns to :

      • SEARCH (?-s)^(.+?\x20)(.+?)\x20(.+?)\x20(.+?)\x20(.+?)\x20(.+?)\x20(.+)(\R)

      • REPLACE \1\2\8\1\3\8\1\4\8\1\5\8\1\6\8\1\7\8\8

    • Eventually, if necessary, you can move back to the #### ms syntax, with a space between, with the regex S/R :

      • SEARCH ms$

      • REPLACE \x20$0

    Best Regards,

    guy038


Log in to reply