Replace everything outside and including quotation marks coupled with a string



  • It’s a bit hard to explain, so I’ll give an example:

    {“gilded”:0,“retrieved_on”:1473821517,“distinguished”:null,“author_flair_text”:null,“author”:“dayuii”,“parent_id”:“t3_22510”,“edited”:false,“id”:“c2727”,“subreddit”:“reddit.com”,“author_flair_css_class”:null,“created_utc”:1136090694,“score”:1,“ups”:1,“controversiality”:0,“body”:“ok”,“link_id”:“t3_22510”,“stickied”:false,“subreddit_id”:“t5_6”}

    I want to retrieve the string ok, which is found in “body”:“ok”. How would I delete everything else in order to get this?



  • First option: use a JSON parser, or an appropriate module or library, in your favorite programming language.
    Second option: use a regular expression. Note: this solution will be highly specific to the data.
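For the first option, a minimal Python sketch using the standard `json` module could look like the following. It assumes one JSON object per line with straight ASCII quotes; the sample record is a trimmed copy of the example above, and `body_from_line` is a hypothetical helper name.

```python
import json


def body_from_line(line: str) -> str:
    """Parse one JSON object (one record per line) and return its "body" value."""
    record = json.loads(line)
    return record["body"]


# Trimmed copy of the example record, with straight quotes.
line = '{"gilded":0,"author":"dayuii","id":"c2727","score":1,"body":"ok","stickied":false}'
print(body_from_line(line))  # prints: ok
```

Because the parser understands the whole record, the order of the fields and any escaping inside the values do not matter.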

    You gave only one example, and no counterexamples, so I have no way of testing whether this matches all your circumstances correctly, or whether it will correctly leave other lines unedited. But assuming you want the value of the “body” element, that all those quotes are straight ASCII quotes " (and thus valid JSON, not the curly quotes “ ” that the forum may have kindly changed your straight quotes into), and that there are no embedded newlines in the value, I successfully tried this:

    • find = (?-s).*"body":"([^"]+)".*
    • replace = $1
    • search mode = regular expression
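The same find/replace can be sketched with Python's `re` module, which is a different regex engine from the editor's, so treat this as an approximation. The `(?-s)` prefix is left out because Python's `.` already excludes newlines unless `re.DOTALL` is given, and the backreference is written `\1` instead of `$1`. The sample text and the helper name `keep_only_body` are hypothetical.

```python
import re

# "." does not match a newline by default, so each match stays within one line,
# which is what the (?-s) prefix ensures in the editor.
BODY_LINE = re.compile(r'.*"body":"([^"]+)".*')


def keep_only_body(text: str) -> str:
    # Replace each whole matching line with the captured body value.
    return BODY_LINE.sub(r'\1', text)


sample = '{"id":"c2727","body":"ok","score":1}\n{"id":"c2728","score":2}\n'
print(keep_only_body(sample))
```

Running it on the two-line sample turns the first line into `ok` and leaves the second line, which has no "body" element, unchanged.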

    There are other expressions that would also match your description, but this is the first I came up with that did what you said you wanted.

    Note: this will not work if the value of the body element contains embedded straight quotes. If it does, you will need a true JSON parser. You can craft specific regexes if you know the maximum level of embedded quotes and/or the escaping mechanism, but if things get more complicated than that, it’s really best to use a dedicated JSON-parsing library.
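To illustrate the embedded-quote problem, here is a Python sketch on a hypothetical record whose body contains backslash-escaped quotes (JSON's escaping mechanism). It compares the simple regex, an escape-aware regex of the kind you could craft once the escaping rule is known, and the `json` parser.

```python
import json
import re

# Hypothetical record whose body contains escaped straight quotes.
line = r'{"body":"say \"hi\"","id":"c1"}'

# The simple regex stops at the first quote, even an escaped one.
naive = re.sub(r'.*"body":"([^"]+)".*', r'\1', line)

# Escape-aware regex: a run of characters that are neither a quote nor a
# backslash, or a backslash followed by any character. The value stays escaped.
match = re.search(r'"body":"((?:[^"\\]|\\.)*)"', line)
escaped = match.group(1) if match else None

# A JSON parser both finds the value and undoes the escaping.
parsed = json.loads(line)["body"]

print(naive)    # prints: say \ (cut off at the escaped quote)
print(escaped)  # prints: say \"hi\"
print(parsed)   # prints: say "hi"
```

Even the escape-aware regex still hands back the escaped text, which is one more reason to prefer a parser once escaping is involved.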

    FYI: if you have further regex needs, study this FAQ and the documentation it points to. Before asking a new regex question, understand that many of us will expect you to show what data you have (exactly), what data you want (exactly), what regex you already tried (to show that you’re making an effort), why you thought that regex would work (to prove it wasn’t just something randomly typed), and what data you’re getting, with an explanation of why that result is wrong. When you show that effort, you’ll see us bend over backward to get things working for you. If you need help formatting the data so that the forum doesn’t mangle it (so that it shows “exactly”, as I said earlier), see this help-with-markdown post, where @Scott-Sumner gives a great summary of how to use Markdown for this forum’s needs.

