Regular expression ( remove everything but leave certain code/word )



  • Hi everyone,
    I exported bulk messages from a chat as JSON,
    and I want to filter out the entries that have:
    “type”: “message”,

    {
       "id": 184160,
       "type": "message",
       "date": "2021-08-23T21:51:20",
       "from": "fifi mark",
       "from_id": "user1917774101",
       "text": "hello where are you from"
      },
      {
       "id": 184162,
       "type": "Quote",
       "date": "2021-08-23T21:51:24",
       "from": "Tommy Montana",
       "from_id": "user1911184795",
       "reply_to_message_id": 184151,
       "text": "In order to write about life first you must live it."
      },
    

    Does anyone know how I can remove the code from “}” to “{”
    if the “type” is not “Quote”
    but is instead
    “type”: “message”?

    Is this possible using a regular expression?
    Thanks



  • To be clearer: remove from “}” to “{” if “type”: “message”, and leave it alone if it is “type”: “Quote”.



  • @Handa-Flocka ,

    Filtering JSON (or other such data-description languages) would be much easier in a purpose-built tool whose job is to process and filter that kind of data. Regex might be able to handle it, but it would likely depend greatly on the exact content of the data, and if you tried to use that same regex on similar data, there is no guarantee that it would work the next time.
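
As a rough illustration of the purpose-built approach, here is a minimal Python sketch using the standard `json` module. It assumes the export is a top-level JSON array of message objects, which the thread does not confirm; the helper name `keep_quotes` and the trimmed `sample` data are hypothetical.

```python
import json


def keep_quotes(entries):
    """Return only the entries whose "type" field is exactly "Quote"."""
    return [entry for entry in entries if entry.get("type") == "Quote"]


# Trimmed-down sample modelled on the data in the question (assumed layout:
# a top-level JSON array of message objects).
sample = '''[
  {"id": 184160, "type": "message", "text": "hello where are you from"},
  {"id": 184162, "type": "Quote", "reply_to_message_id": 184151,
   "text": "In order to write about life first you must live it."}
]'''

filtered = keep_quotes(json.loads(sample))
print(json.dumps(filtered, indent=1))
```

Because the data is parsed rather than pattern-matched, nested braces or a `}` inside a message text should not affect the result. A real export may wrap the array in an outer object, in which case the list would have to be pulled out of that object first.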

    I am sure one of the regex gurus here could probably come up with something. However, your problem statement still lacks clarity. It’s often a good idea to present your data in the following form:

    what I have:

    blah blah blah
    

    what I want it to be:

    blah blahdy blah blech
    

    … in addition to your description of what you want.

    Because when I read "remove from } to {", I get confused because } is the closing brace and { is the next opening brace, by my reading… and the only thing between those is the comma; do you really want to just delete },[CRLF]{ (where the [CRLF] is a newline sequence)? Or something else?

    Even better would be if your data gave examples of JSON entries that get edited (as above) and JSON entries that don’t get edited (to help us understand what circumstances you want it to change and what circumstances you want it to stay the same).

    Maybe someone else already understands what you want. But I wouldn’t be able to try to solve it without the additional information requested.

    Also see the generic advice below; you already followed some of it, but you’ll get better answers if your search/replace questions follow all the advice.

    Good luck

    ----

    Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as literal text using the </> toolbar button or manual Markdown syntax. To make regexes show in red (and so they keep their special characters like *), use backticks, like `^.*?blah.*?\z`. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure your examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get. Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.



  • @PeterJones Thanks for taking the time to answer.

    What I meant exactly is: if “type” is “message” and not “Quote”, then remove the whole block from its beginning to its end.
    And yes, I got it wrong at first; it is “{” to “}”.

    If “type” is “Quote”, the block is left untouched.
    My example before:

    {
       "id": 184160,
       "type": "message",
       "date": "2021-08-23T21:51:20",
       "from": "fifi mark",
       "from_id": "user1917774101",
       "text": "hello where are you from"
      },
      {
       "id": 184162,
       "type": "Quote",
       "date": "2021-08-23T21:51:24",
       "from": "Tommy Montana",
       "from_id": "user1911184795",
       "reply_to_message_id": 184151,
       "text": "In order to write about life first you must live it."
      },
    

    Result will be:

    {
     "id": 184162,
     "type": "Quote",
     "date": "2021-08-23T21:51:24",
     "from": "Tommy Montana",
     "from_id": "user1911184795",
     "reply_to_message_id": 184151,
     "text": "In order to write about life first you must live it."
    },
    

    The content here is a whole chat, so there will be many such blocks,
    not just a single occurrence.
    Can a regular expression also handle it in bulk?



  • @Handa-Flocka ,

    As long as your blocks never contain nested {} (so no { "id": ####, "nested": { ... }, ... }), then the following will likely work for you:

    • FIND = (?s){(?:(?!"type"\s*:\s*"Quote")[^}])+}\s*,?
    • REPLACE = empty
    • SEARCH MODE = regular expression
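
To see what the FIND expression above does outside the editor, here is a small sketch that applies the same pattern with Python's `re` module. This is only a stand-in: Notepad++ uses its own (Boost-based) regex engine, and the braces are escaped here as a precaution for Python. The `sample` text is a shortened, hypothetical version of the data from the question.

```python
import re

# Same idea as the FIND expression: match a {...} block character by
# character, refusing to step onto a position where "type": "Quote" begins.
# Blocks containing that pair can therefore never be matched.
PATTERN = re.compile(r'(?s)\{(?:(?!"type"\s*:\s*"Quote")[^}])+\}\s*,?')

# Shortened sample modelled on the question's data.
sample = '''{
 "id": 184160,
 "type": "message",
 "text": "hello where are you from"
},
{
 "id": 184162,
 "type": "Quote",
 "reply_to_message_id": 184151,
 "text": "In order to write about life first you must live it."
},
'''

# Replace every match with nothing, like REPLACE = empty in the editor.
result = PATTERN.sub("", sample)
print(result)
```

The substitution runs over every match in the text, so it handles a whole export in one pass. As noted above, it relies on blocks not containing nested braces; a `}` inside a message text would likely cut a match short as well.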


  • Thanks @PeterJones
    The instructions are very clear and they work.
    Much appreciated.


