remove all text except hex characters



  • Hi,
    How do I remove all text except hex characters and then list all hex characters in one row?

    For example, below is my original file and what I would like to convert the file into:

    <Start original file>
    ;-----------------------------------------------------------------------------
    ; IP in IP Packets for OMX3200
    ;-----------------------------------------------------------------------------
    ; Rev 1.00: May 10, 2018
    : - Initial creation
    ;
    ; MAC Addresses are NetQuest OUI plus 16-bit PDU number plus 0x44 “D” for
    ; destination or 0x52 “S” for source
    ;-----------------------------------------------------------------------------
    ; PDU:01
    ; Eth.ipv4
    ;-----------------------------------------------------------------------------
    00 20 1E 00 01 44 – MAC destination address
    00 20 1E 00 01 52 – MAC source address
    08 00 – Ethertype:IPv4
    – IPv4 Header [20 bytes]
    45 – Version/Internet Header Length (IHL)
    00 – Type Of Service (TOS)
    00 42 – Total length (bytes)
    55 44 – Identifier
    00 – Flags
    00 – Fragment Offset
    80 – Time To Live (TTL)
    11 – IP Protocol: (UDP)
    10 85 – Header Checksum
    c8 c8 c8 eb – Source IP Address
    41 6a 01 c4 – Destination IP Address
    – UDP Header [8 bytes]
    04 9f – Source port
    00 35 – Destination port
    00 2e – Length
    05 d5 – Checksum
    – Payload [38 bytes]
    37 01 01 00 00 01 00 00
    00 00 00 00 03 77 77 77
    0c 6e 65 74 71 75 65 73
    74 63 6f 72 70 03 63 6f
    6d 00 00 01 00 01
    <End original file>

    Here are the hex characters I want to extract into a single line:

    <Start converted file>
    45
    00
    00
    42
    55
    44
    00
    00
    80
    11
    10
    85
    c8
    c8
    c8
    eb
    41
    6a
    01
    c4
    04
    9f
    00
    35
    00
    2e
    37
    01
    01
    00
    00
    01
    00
    00
    00
    00
    00
    00
    03
    77
    77
    77
    0c
    6e
    65
    74
    71
    75
    65
    73
    74
    63
    6f
    72
    70
    03
    63
    6f
    6d
    00
    00
    01
    00
    01
    <End converted file>



  • So, your description doesn’t match your data: your converted file didn’t include the hex from the MAC addresses, the Ethertype, or the UDP checksum.

    Also, the “20 bytes” and “38 bytes” will be indistinguishable from hex values, unless the – is indicative of something we can ignore. I am guessing that the semicolon ; indicates a full-line comment (oh, there’s one with a colon : at the start of the line). Does the dash – indicate an end-of-line comment, or is it something meaningful that you just want to strip out?

    Also, because you didn’t quote your text in a way that will prevent Markdown from formatting it (see here for some Markdown examples), I cannot know whether the – is really that Unicode character, or whether the forum just auto-converted your double-hyphen -- into a single dash –.

    So, we cannot know which you really want. Though I can make some guesses.

    There’s probably magic that would do it all in one, but I’d personally prefer a multi-step process, so you (and I) can see what’s going on.

    All of these require selecting Regular Expression in the Find/Replace dialog:

    1. Eliminate ; and : lines:
      • Find What: (?-s)^\s*[;:].*$ – any line starting with any amount of space (including no space), then a semicolon or colon
      • Replace With: (empty)
    2. Eliminate from the dash or double-hyphen -- to the end of the line
      • Find What: (?-s)(–|--).*$ – anything from the dash (or double-hyphen) to the end of the line
      • Replace With: (empty)

    At this point, all the ignore-these matches (either in apparent comments or apparent decimal after dashes) should be gone. It should just be pairs of hex nibbles at this point:

    1. Move each pair to its own line:
      • Find: (?is)([0-9a-f]{2})\s+ – any group of 2 case-insensitive hexadecimal digits, followed by one or more space or newline characters
      • Replace: $1\r\n – replace with the group, plus a standard Windows EOL (CRLF) sequence
    2. If you also want to get rid of extra newlines:
      • Find: \R+ – find any sequence of one or more newlines (whether CRLF, LF, or CR)
      • Replace: \r\n – and replace it with a single standard Windows EOL sequence
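
If you would rather script it than click through Find/Replace, the same passes can be sketched in Python. This is only a rough equivalent under the same guesses as above: ; or : starts a full-line comment, and an en dash or double hyphen starts an end-of-line comment. The function name `extract_hex_pairs` and the `SAMPLE` text are made up for illustration, and Python's `re` module is not the regex engine Notepad++ uses, so the patterns are adapted rather than copied.

```python
import re


def extract_hex_pairs(text: str) -> list[str]:
    """Return the two-digit hex pairs left after stripping comments."""
    # Drop full-line comments: optional indentation, then ';' or ':'.
    text = re.sub(r"(?m)^[ \t]*[;:].*$", "", text)
    # Drop everything from an en dash or a double hyphen to end of line.
    text = re.sub(r"(?m)(?:–|--).*$", "", text)
    # Whatever is left should be whitespace-separated hex pairs.
    return re.findall(r"\b[0-9A-Fa-f]{2}\b", text)


# Hypothetical excerpt in the same layout as the original file.
SAMPLE = """\
; PDU:01
: - Initial creation
00 20 1E 00 01 44 – MAC destination address
08 00 – Ethertype:IPv4
– IPv4 Header [20 bytes]
45 – Version/Internet Header Length (IHL)
6d 00 00 01 00 01
"""

if __name__ == "__main__":
    pairs = extract_hex_pairs(SAMPLE)
    print(" ".join(pairs))    # all pairs in one row
    print("\n".join(pairs))   # one pair per line
```

Joining with a space gives the single row asked for in the question, and joining with a newline gives the one-pair-per-line layout shown in the converted file. Like the regex steps, this keeps the MAC and Ethertype bytes, so those would still need to be dropped separately if they really are unwanted.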

    This is based on my best guess of your intention.

    PS: this forum is made of volunteers helping fellow NPP users of their own volition, and usually not as part of their paid job. This is not a code writing service that does your job or your homework for you for free, while you get the paycheck or course credit. It would have been better if you’d shown what you tried, why you thought it would do it, and where it fell short, and then asked us for help in fixing it. I answered out of the “if reasonably asked, I’ll give a freebie” mindset. But if this doesn’t work for you, you’ll have to show some effort. For example, you can follow the links here to find more regex documentation, which will come in handy for understanding or customizing what I’ve written.



  • @PeterJones

    I looked at it…but with the problems with the definition (that you correctly pointed out), I decided to pass on it. :-)

    I’ve stopped doing too much “guessing”.



  • Yes, but I’ve got to keep answering these guesswork ones, otherwise @Terry-R will keep answering and getting the reputation instead of me… and after only 2 months on the forum, Terry is already in the top-10 reputation users. I’ve got to keep my #4 ranking, after all. :-)

    Okay, actually, I’m happy when more users give many high-quality answers.

    I try to answer guesswork questions when I think what I share will be a useful starting point, even if it doesn’t solve the question the OP intended to ask (and I hope that the OP is able to clarify with just one additional post which shows effort and clarity, which will bring about an improved solution in one iteration; hope springs eternal, I know).



  • @PeterJones @Terry-R … two guys with more reputation points than postings…keep up the high-quality work!



  • Sorry @PeterJones and others, I didn’t mean to steal your thunder. I hadn’t been looking at the stats, but I’m a bit worried about Scott Sumner, as he went click-happy on my recent postings. My little computer had difficulty notifying me with dings of all the upvotes. Together with his recent post, I’m wondering if he needs a holiday?

    Seriously though, I do get ‘warm fuzzies’ helping out, and of course also pushing my own boundaries. Part of that is thanks to you lot, guiding me where I’ve been a bit too literal, pulling me back in line when I’ve done something stupid. I take nothing personally, as I hope you don’t if I happen to say something slightly askew.

    Cheers everybody
    Terry

    PS What does OP mean?! you lot are using it, I’ve now started using it but don’t know what the chars stand for, it bugs me!



  • @Terry-R

    OP = Original Poster



  • AAARRRRGGGHHHH! I thought that might have been it but it seemed a bit too obvious.

    Terry



  • It can alternately mean Original Post, as well. You’ve got to take it on context of whether it’s talking about the content or the content creator. :-)

