Duplicate rows according to whats in a column?



  • Hi all,

    I have a list of CIDR networks. On each line, it includes a list of the subnets that are included. This is in a CSV. for example:

    network, datacenter,environment,1,2,3,4,5,6,7,8
    10.16.64.0/24,ABC,Prod,10.16.64.0,10.16.65.0,10.16.66.0,10.16.67.0,

    There are about 800 lines like this.

    What i need to end up with, is one line for each of the subnets. Those are in the last 8 columns. Like this
    10.16.64.0/24,ABC,Prod,10.16.64.0
    10.16.64.0/24,ABC,Prod,10.16.65.0
    10.16.64.0/24,ABC,Prod,10.16.66.0
    10.16.64.0/24,ABC,Prod,10.16.67.0

    Or if easier, just
    ABC,Prod,10.16.64.0
    ABC,Prod,10.16.65.0
    ABC,Prod,10.16.66.0
    ABC,Prod,10.16.67.0

    This is going to be used as a lookup table to match any IP in the network with the first 3 octets in this list and return the data center and environment.

    Since this is going into a vendor tool, I don’t have the luxury of using a function /script/program to return the values programmatically. I need a lookup list.

    Any tips on how I might accomplish this with NPP? It’s odd I know :)

    Thanks!



  • Hello Greg,

    Of course, a Python, Lua , NppExec or Gawk script would be the best to achieve that task. However, with two S/R, we could get it right !

    From your example, with the first header line, below :

    network, datacenter,environment,1,2,3,4,5,6,7,8

    it seems that each line, of your file, may have, after the Environment field, between 1 and 8 addresses IPV4 !

    So, I consider the example text below :

    10.16.64.0/24,ABC,Prod,10.16.64.0,
    10.16.64.0/24,ABC,Prod,10.16.64.0,10.16.65.0,
    10.16.64.0/24,ABC,Prod,10.16.64.0,10.16.65.0,10.16.66.0,
    10.16.64.0/24,ABC,Prod,10.16.64.0,10.16.65.0,10.16.66.0,10.16.67.0,
    10.16.64.0/24,ABC,Prod,10.16.64.0,10.16.65.0,10.16.66.0,10.16.67.0,10.16.68.0,
    10.16.64.0/24,ABC,Prod,10.16.64.0,10.16.65.0,10.16.66.0,10.16.67.0,10.16.68.0,10.16.69.0,
    10.16.64.0/24,ABC,Prod,10.16.64.0,10.16.65.0,10.16.66.0,10.16.67.0,10.16.68.0,10.16.69.0,10.16.70.0,
    10.16.64.0/24,ABC,Prod,10.16.64.0,10.16.65.0,10.16.66.0,10.16.67.0,10.16.68.0,10.16.69.0,10.16.70.0,10.16.71.0,
    

    If I suppose that the fields 2 and 3 ( datacenter and environment ) begin, always, with a letter, then the following S/R, performed ONCE, only, in Regular expression mode, by clicking on the Replace All button

    SEARCH = (?<=\d),(?=\d)|(,\R)

    REPLACE = \r\n(?1\r\n)

    would change the text, as below :

    10.16.64.0/24,ABC,Prod,10.16.64.0
    
    10.16.64.0/24,ABC,Prod,10.16.64.0
    10.16.65.0
    
    10.16.64.0/24,ABC,Prod,10.16.64.0
    10.16.65.0
    10.16.66.0
    
    10.16.64.0/24,ABC,Prod,10.16.64.0
    10.16.65.0
    10.16.66.0
    10.16.67.0
    
    10.16.64.0/24,ABC,Prod,10.16.64.0
    10.16.65.0
    10.16.66.0
    10.16.67.0
    10.16.68.0
    
    10.16.64.0/24,ABC,Prod,10.16.64.0
    10.16.65.0
    10.16.66.0
    10.16.67.0
    10.16.68.0
    10.16.69.0
    
    10.16.64.0/24,ABC,Prod,10.16.64.0
    10.16.65.0
    10.16.66.0
    10.16.67.0
    10.16.68.0
    10.16.69.0
    10.16.70.0
    
    10.16.64.0/24,ABC,Prod,10.16.64.0
    10.16.65.0
    10.16.66.0
    10.16.67.0
    10.16.68.0
    10.16.69.0
    10.16.70.0
    10.16.71.0
    

    Then, with this second S/R, below, you should get your text as you like to !

    SEARCH = (?-s)(.+?,.+?,.+?,).+\R(?=[^,\r\n]+\R)

    REPLACE = $0\1

    IMPORTANT :

    You’ll have to perform this second S/R, MANY times, till no more occurrence can be found ! That is to say that you’ll just have to click on the Replace All button, repeatedly, at least eight times, as you may have up to 8 addresses, in your file !

    So, after eight consecutive replacements, we obtain the final text below :

    10.16.64.0/24,ABC,Prod,10.16.64.0
    
    10.16.64.0/24,ABC,Prod,10.16.64.0
    10.16.64.0/24,ABC,Prod,10.16.65.0
    
    10.16.64.0/24,ABC,Prod,10.16.64.0
    10.16.64.0/24,ABC,Prod,10.16.65.0
    10.16.64.0/24,ABC,Prod,10.16.66.0
    
    10.16.64.0/24,ABC,Prod,10.16.64.0
    10.16.64.0/24,ABC,Prod,10.16.65.0
    10.16.64.0/24,ABC,Prod,10.16.66.0
    10.16.64.0/24,ABC,Prod,10.16.67.0
    
    10.16.64.0/24,ABC,Prod,10.16.64.0
    10.16.64.0/24,ABC,Prod,10.16.65.0
    10.16.64.0/24,ABC,Prod,10.16.66.0
    10.16.64.0/24,ABC,Prod,10.16.67.0
    10.16.64.0/24,ABC,Prod,10.16.68.0
    
    10.16.64.0/24,ABC,Prod,10.16.64.0
    10.16.64.0/24,ABC,Prod,10.16.65.0
    10.16.64.0/24,ABC,Prod,10.16.66.0
    10.16.64.0/24,ABC,Prod,10.16.67.0
    10.16.64.0/24,ABC,Prod,10.16.68.0
    10.16.64.0/24,ABC,Prod,10.16.69.0
    
    10.16.64.0/24,ABC,Prod,10.16.64.0
    10.16.64.0/24,ABC,Prod,10.16.65.0
    10.16.64.0/24,ABC,Prod,10.16.66.0
    10.16.64.0/24,ABC,Prod,10.16.67.0
    10.16.64.0/24,ABC,Prod,10.16.68.0
    10.16.64.0/24,ABC,Prod,10.16.69.0
    10.16.64.0/24,ABC,Prod,10.16.70.0
    
    10.16.64.0/24,ABC,Prod,10.16.64.0
    10.16.64.0/24,ABC,Prod,10.16.65.0
    10.16.64.0/24,ABC,Prod,10.16.66.0
    10.16.64.0/24,ABC,Prod,10.16.67.0
    10.16.64.0/24,ABC,Prod,10.16.68.0
    10.16.64.0/24,ABC,Prod,10.16.69.0
    10.16.64.0/24,ABC,Prod,10.16.70.0
    10.16.64.0/24,ABC,Prod,10.16.71.0
    

    Best Regards,

    But, presently, I must go to bed ! So, just be patient, till tomorrow, for some explanations, about these regexes !

    Best Regards,

    guy038



  • What about using a find and replace with a regex? (I’m not a regex expert, but this appeared to work for me. )
    Simple example. (In the replace box the “Search Mode” is “Regular expression”)
    Set the “Find what:” to:
    ^(.?,.?,.?),(.?),(.?),(.?),(.?),(.?),(.?),(.?),(.*?)$
    Set the “Replace with :” to:
    \1,\2\r\n\1,\3\r\n\1,\4\r\n\1,\5\r\n\1,\6\r\n\1,\7\r\n\1,\8\r\n\1,\9

    Then press the “Replace” or “Replace All” button.

    Hope this helps.



  • Hi Greg and Joe,

    Joe, concerning the SEARCH regex, I suppose that you meant, ( with some more star meta-characters ! ) :

    SEARCH = ^(.*?,.*?,.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),$

    Then, with :

    REPLACE = \1,\2\r\n\1,\3\r\n\1,\4\r\n\1,\5\r\n\1,\6\r\n\1,\7\r\n\1,\8\r\n\1,\9

    it would, indeed, match the line, with 8 addresses, below

    10.16.64.0/24,ABC,Prod,10.16.64.0,10.16.65.0,10.16.66.0,10.16.67.0,10.16.68.0,10.16.69.0,10.16.70.0,10.16.71.0,
    

    and changed it into the block of 8 lines, below :

    10.16.64.0/24,ABC,Prod,10.16.64.0
    10.16.64.0/24,ABC,Prod,10.16.65.0
    10.16.64.0/24,ABC,Prod,10.16.66.0
    10.16.64.0/24,ABC,Prod,10.16.67.0
    10.16.64.0/24,ABC,Prod,10.16.68.0
    10.16.64.0/24,ABC,Prod,10.16.69.0
    10.16.64.0/24,ABC,Prod,10.16.70.0
    10.16.64.0/24,ABC,Prod,10.16.71.0
    

    Unfortunately, if a line, of Greg’s text, has less than eight addresses, at the end, your regex would not work :-((

    But, thanks to you, you put me in the good direction and gave me an idea for getting all the work done, with ONE regex ONLY !!

    So, I start with the example text, below, which contains all possible cases, from no address to 8 addresses, after the environment field, of each line :

    10.16.64.0/24,ABC,Prod,
    10.16.64.0/24,ABC,Prod,10.16.64.0,
    10.16.64.0/24,ABC,Prod,10.16.64.0,10.16.65.0,
    10.16.64.0/24,ABC,Prod,10.16.64.0,10.16.65.0,10.16.66.0,
    10.16.64.0/24,ABC,Prod,10.16.64.0,10.16.65.0,10.16.66.0,10.16.67.0,
    10.16.64.0/24,ABC,Prod,10.16.64.0,10.16.65.0,10.16.66.0,10.16.67.0,10.16.68.0,
    10.16.64.0/24,ABC,Prod,10.16.64.0,10.16.65.0,10.16.66.0,10.16.67.0,10.16.68.0,10.16.69.0,
    10.16.64.0/24,ABC,Prod,10.16.64.0,10.16.65.0,10.16.66.0,10.16.67.0,10.16.68.0,10.16.69.0,10.16.70.0,
    10.16.64.0/24,ABC,Prod,10.16.64.0,10.16.65.0,10.16.66.0,10.16.67.0,10.16.68.0,10.16.69.0,10.16.70.0,10.16.71.0,
    

    The following S/R, with search mode = Regular expression, will realize everything, after clicking, ONCE only, on the Replace All button !!

    SEARCH = (.*?,.*?,.*?,)(?:(.*?),)?(?:(.*?),)?(?:(.*?),)?(?:(.*?),)?(?:(.*?),)?(?:(.*?),)?(?:(.*?),)?(?:(.*?),)?

    REPLACE = \1(?2\2)\r\n(?3\1\3\r\n)(?4\1\4\r\n)(?5\1\5\r\n)(?6\1\6\r\n)(?7\1\7\r\n)(?8\1\8\r\n)(?9\1\9\r\n)

    So, in one go, the resulting text would be :

    10.16.64.0/24,ABC,Prod,
    
    10.16.64.0/24,ABC,Prod,10.16.64.0
    
    10.16.64.0/24,ABC,Prod,10.16.64.0
    10.16.64.0/24,ABC,Prod,10.16.65.0
    
    10.16.64.0/24,ABC,Prod,10.16.64.0
    10.16.64.0/24,ABC,Prod,10.16.65.0
    10.16.64.0/24,ABC,Prod,10.16.66.0
    
    10.16.64.0/24,ABC,Prod,10.16.64.0
    10.16.64.0/24,ABC,Prod,10.16.65.0
    10.16.64.0/24,ABC,Prod,10.16.66.0
    10.16.64.0/24,ABC,Prod,10.16.67.0
    
    10.16.64.0/24,ABC,Prod,10.16.64.0
    10.16.64.0/24,ABC,Prod,10.16.65.0
    10.16.64.0/24,ABC,Prod,10.16.66.0
    10.16.64.0/24,ABC,Prod,10.16.67.0
    10.16.64.0/24,ABC,Prod,10.16.68.0
    
    10.16.64.0/24,ABC,Prod,10.16.64.0
    10.16.64.0/24,ABC,Prod,10.16.65.0
    10.16.64.0/24,ABC,Prod,10.16.66.0
    10.16.64.0/24,ABC,Prod,10.16.67.0
    10.16.64.0/24,ABC,Prod,10.16.68.0
    10.16.64.0/24,ABC,Prod,10.16.69.0
    
    10.16.64.0/24,ABC,Prod,10.16.64.0
    10.16.64.0/24,ABC,Prod,10.16.65.0
    10.16.64.0/24,ABC,Prod,10.16.66.0
    10.16.64.0/24,ABC,Prod,10.16.67.0
    10.16.64.0/24,ABC,Prod,10.16.68.0
    10.16.64.0/24,ABC,Prod,10.16.69.0
    10.16.64.0/24,ABC,Prod,10.16.70.0
    
    10.16.64.0/24,ABC,Prod,10.16.64.0
    10.16.64.0/24,ABC,Prod,10.16.65.0
    10.16.64.0/24,ABC,Prod,10.16.66.0
    10.16.64.0/24,ABC,Prod,10.16.67.0
    10.16.64.0/24,ABC,Prod,10.16.68.0
    10.16.64.0/24,ABC,Prod,10.16.69.0
    10.16.64.0/24,ABC,Prod,10.16.70.0
    10.16.64.0/24,ABC,Prod,10.16.71.0
    

    Et voilà ! Well, although I practised regexes, since many years, I, always, surprised by their incredible power !!


    Notes :

    • If the original line is only 10.16.64.0/24,ABC,Prod,, without any address, after, the S/R just rewrites that line

    • In the search regex, the 8 parts (.*?) are groups from 2 to 9, each containing a possible address, which must be rewritten, in replacement

    • These 8 groups, followed by a comma, are embedded in an optional non-capturing group (?:(.*?),)? ( the group may be present [1] or absent [0] )

    • At the beginning, the part (.*?,.*?,.*?,) looks for the first three fields : network, datacenter and environment, followed by a comma

    • In replacement, the syntax (?#\1\#\r\n) is called a conditional replacement. It means that IF the group # exists, then it rewrites the group 1, followed by the group #, followed by the Windows EOL characters \r\n

    • At the beginning, the part \1(?2\2)\r\n writes, first, the group 1. Then it writes the group 2, ONLY IF it exists ( the first IPV4 address ), and, finally rewrites the Windows EOL characters \r\n

    Remarks :

    • Contrary to my previous post, with this new S/R, there is no condition, any more, about the contents of fields 2 and 3 ! They could be, for instance : 10.16.64.0/24,99999,00000000,… :-)

    • Any possible leading indentation, of a line, is just kept, by this S/R !

    • You can click, either, on the Replace All button AND/OR on the Replace button, for a step by step replacement :-)

    Cheers,

    guy038


Log in to reply