help please, extract columns
-
Hi all, I have a big text file and the data in it are in 7 columns, but how can I do to have two in two columns.
( 1 2 3 4 5 6 7 columns)
“name”:"1 80.97.120.40 180 ms [n/a] 21 “type”:“A” @post
“asn”:6876 90.97.117.177 134 ms [n/a] 20 “type”:“B” @post
“asn”:23650 70.97.120.10 75 ms [n/a] 21 “type”:“A” @contact
“ttl”:60 70.97.122.105 87 ms geodnsd.global.sonicwall.com 20 “type”:“C” @contact
“asn”:15399 70.97.122.106 89 ms geodnsd.global.sonicwall.com 19 “type”:“A” @contact
“ttl”:7896 70.97.122.107 95 ms www.psmythe.net 21 “type”:“B” @post
“asn”:456 70.97.122.104 98 ms wasabi.prostructure.com 22 “type”:“C” @contact
“name”:"376 70.97.122.108 89 ms server_ip 21 “type”:“B” @contact
“asn”:15343 70.97.122.109 87 ms xcder.com 21 “type”:“A” @postI need results like this example
1 2
1 3
1 4
1 5
1 6
1 72 1
2 3
2 4
2 5
2 6
2 7and so on with the others
the columns are separated, How could I do with regular expression mode?
thanks
-
If the columns are space-separated, I see EIGHT columns, not SEVEN.
-
-
@neagal
@Scott-Sumner said it’s a bit confusing, even more so now as it would appear you have provided different data to the original. So is the 180ms field definitely “180ms” or is it as you originally typed “180 ms”, where there is a space in the middle.I’m also a bit unsure of what you intend to happen as the 1 2 to 1 7 differs from the 2 1 to 2 7.
If we consider each line separately I think what you want is a line with the 1st and 2nd field separated by a space, followed by a line with 1st and 3rd columns, through to the 1st and 7th column. This would be repeated for every original line of data, is this correct?So using your original data I produced this result:
“name”:"1 80.97.120.40 “name”:"1 180 ms “name”:"1 [n/a] “name”:"1 21 “name”:"1 “type”:“A” “name”:"1 @post “asn”:6876 90.97.117.177 “asn”:6876 134 ms “asn”:6876 [n/a] “asn”:6876 20 “asn”:6876 “type”:“B” “asn”:6876 @post “asn”:23650 70.97.120.10 “asn”:23650 75 ms “asn”:23650 [n/a] “asn”:23650 21 “asn”:23650 “type”:“A” “asn”:23650 @contact “ttl”:60 70.97.122.105 “ttl”:60 87 ms “ttl”:60 geodnsd.global.sonicwall.com “ttl”:60 20 “ttl”:60 “type”:“C” “ttl”:60 @contact “asn”:15399 70.97.122.106 “asn”:15399 89 ms “asn”:15399 geodnsd.global.sonicwall.com “asn”:15399 19 “asn”:15399 “type”:“A” “asn”:15399 @contact “ttl”:7896 70.97.122.107 “ttl”:7896 95 ms “ttl”:7896 www.psmythe.net “ttl”:7896 21 “ttl”:7896 “type”:“B” “ttl”:7896 @post “asn”:456 70.97.122.104 “asn”:456 98 ms “asn”:456 wasabi.prostructure.com “asn”:456 22 “asn”:456 “type”:“C” “asn”:456 @contact “name”:"376 70.97.122.108 “name”:"376 89 ms “name”:"376 server_ip “name”:"376 21 “name”:"376 “type”:“B” “name”:"376 @contact “asn”:15343 70.97.122.109 “asn”:15343 87 ms “asn”:15343 xcder.com “asn”:15343 21 “asn”:15343 “type”:“A” “asn”:15343 @post
Terry
-
@Terry-R said:
If we consider each line separately I think what you want is a line with the 1st and 2nd field separated by a space, followed by a line with 1st and 3rd columns, through to the 1st and 7th column. This would be repeated for every original line of data, is this correct?
it is right
-
@neagal
There’s still 1 unanswered question, is the 180ms field as you originally typed it or as you typed it in response to @Scott-Sumner question?
So is it:180 ms or 180ms
Terry
-
-
@neagal
Before we start on this we need to add a final blank line to the original data, as my regex (regular expression) expects one at the end of every line.So first we use the Replace function with
Find What:\z
Replace With:\r\n
For all the regex’s provided we need the search mode set to “regular expression” and “wrap around” ticked. You can hit the replace button, or replace all button, it will only find 1 occurrence and add a blank line at the end of the data. You may want to confirm that by checking the bottom of Notepad++ where it shows a count of lines, see if it increases by 1 after running the first regex.
So the Replace function to alter your data is.
Find What:^([^ ]+?) ([^ ]+?) ([^ ]+ms) ([^ ]+?) ([^ ]+?) ([^ ]+?) ([^ ]+?)(\R)
For the Replace field I also have a question, do you want a blank line between each set of fields. So all 6 lines of the original line 1 followed by a blank line, then 6 lines of original line 2, repeated?
If blank line between sets then
Replace With:\1 \2\8\1 \3\8\1 \4\8\1 \5\8\1 \6\8\1 \7\8\8
If no blank line between then
Replace With:\1 \2\8\1 \3\8\1 \4\8\1 \5\8\1 \6\8\1 \7\8
As before, set search mode to “regular expression” and “wrap around” ticked. Click once on “replace all”, then check the resulting data. You can see the important stuff is outlined in red text with a shadow background. This text can be copied directly using your keyboard or mouse, so no need to type each character individually.Let us know how you got on. If any further issues then you may need to elaborate on the data used, or at least provide some of the examples where the regex did NOT work, so we can amend it to suit.
Finally remove that last blank line.
Terry
-
The following regex, whilst not wrong has “ms” included which isn’t necessary. I had that in originally as I believed your data was of the type “180 ms”. When you confirmed the space in the middle was not actually there I did not remove it efficiently.
^([^ ]+?) ([^ ]+?) ([^ ]+ms) ([^ ]+?) ([^ ]+?) ([^ ]+?) ([^ ]+?)(\R)
You will notice all other instances are
([^ ]+?)
, the one with “ms” could actually be the same. I just did not notice it when posting. I thought it would be best to advertise that fact now so you don’t get the wrong impression of how the regex works and be confused over why I had the “ms” in that instance, yet no other characters in other instances.Terry
-
Too many questions…too much obscurity…not only do you have to solve the OP’s problem…you have to straighten the OP out to get a coherent problem presented…good that you have a lot of endurance, @Terry-R !
-
Hi, @neagal, @Terry-R, @scott-sumner and All,
-
About a possible space char, before the
ms
string, you can delete it, to get a coherent field, with the regex S/R :-
SEARCH
(?-i)(?<=\d)\x20(?=ms)
-
REPLACE
Leave EMPTY
-
-
Now, if we use lazy quantifiers, the Terry’s regex S/R, with a line-break between each block, turns to :
-
SEARCH
(?-s)^(.+?\x20)(.+?)\x20(.+?)\x20(.+?)\x20(.+?)\x20(.+?)\x20(.+)(\R)
-
REPLACE
\1\2\8\1\3\8\1\4\8\1\5\8\1\6\8\1\7\8\8
-
-
Eventually, if necessary, you can move back to the
#### ms
syntax, with a space between, with the regex S/R :-
SEARCH
ms$
-
REPLACE
\x20$0
-
Best Regards,
guy038
-