a newbie question about search

guy038

Hello, @roni-segoly, @mark-olson, @alan-kilborn and All,

@roni-segoly, you did not provide enough text for us to guess the right way to help you!

Do you mean that, from this INPUT text :

},
"display_text_range" : [
"0",
"24"
],
"favorite_count" : "1",
"in_reply_to_status_id_str" : "1846199292232339627",
"id_str" : "1846219638109086002",
"in_reply_to_user_id" : "4774540948",
"truncated" : false,
"retweet_count" : "0",
"id" : "1846219638109086002",
"in_reply_to_status_id" : "1846199292232339627",
"created_at" : "Tue Oct 15 16:00:37 +0000 2024",
"favorited" : false,
"full_text" : "@zamir_shatz יש גם רמב"פ",
"lang" : "iw",
"in_reply_to_screen_name" : "zamir_shatz",
"in_reply_to_user_id_str" : "4774540948"

Are you expecting this OUTPUT text, with the two lines beginning with created_at or full_text moved after the other ones?

},
"display_text_range" : [
"0",
"24"
],
"favorite_count" : "1",
"in_reply_to_status_id_str" : "1846199292232339627",
"id_str" : "1846219638109086002",
"in_reply_to_user_id" : "4774540948",
"truncated" : false,
"retweet_count" : "0",
"id" : "1846219638109086002",
"in_reply_to_status_id" : "1846199292232339627",
"favorited" : false,
"lang" : "iw",
"in_reply_to_screen_name" : "zamir_shatz",
"in_reply_to_user_id_str" : "4774540948"
"created_at" : "Tue Oct 15 16:00:37 +0000 2024",
"full_text" : "@zamir_shatz יש גם רמב"פ",

Best Regards,

guy038

Roni Segoly

@Alan-Kilborn Managed, cheers

Roni Segoly

@Alan-Kilborn I posted a screenshot of a section of the file, as not everyone has Hebrew characters.
If possible, I need the date and the text without their labels, on one line, separated by a comma.
Like
“Tue Oct 15 16:00:37 +0000 2024”, “@zamir_shatz יש גם רמב"פ”

I cannot post the link to the file yet, need two reputations

Mark Olson

@Roni-Segoly
In the future, you should refer to JSON as JSON or json, not js. Calling it js is confusing to programmers like me, because js is generally used to refer to JavaScript, not JSON.

JsonTools makes it easy to extract a few fields (like full_text and created_at) from each object in an array of objects, which is what your tweets appear to be.

Open the JsonTools tree view for your file.
In the text box in the upper left-hand corner of the tree view, enter the query @[:][created_at, full_text]. This RemesPath query will iterate through the array of objects and extract the created_at and full_text fields from each object.
Click the Submit query button.
You can now look at the tree view and notice that the tree displays only the full_text and created_at fields in each object.
Click the Save query result button.
The fields you wanted will now be in a new buffer, which you can save to a new file if desired.
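
For readers without the plugin, the same selection can be sketched outside Notepad++. This is a minimal Python illustration of what the query above is described as doing, not JsonTools code: the sample values and the helper name pick_fields are made up, and it assumes the file is a JSON array of objects.

```python
import json

# Hypothetical sample: a JSON array of tweet-like objects. The field names
# come from the thread; the values are invented for illustration.
raw = '''
[
  {"id_str": "1", "created_at": "Tue Oct 15 16:00:37 +0000 2024",
   "full_text": "first tweet", "lang": "en"},
  {"id_str": "2", "created_at": "Wed Oct 16 09:12:00 +0000 2024",
   "full_text": "second tweet", "lang": "en"}
]
'''

def pick_fields(tweets, keys=("created_at", "full_text")):
    """Return a new list that keeps only the requested keys of each object."""
    return [{k: t[k] for k in keys if k in t} for t in tweets]

tweets = json.loads(raw)
selected = pick_fields(tweets)

# One line per tweet: date and text, without labels, separated by a comma.
lines = ['"{}", "{}"'.format(t["created_at"], t["full_text"]) for t in selected]

print(json.dumps(selected, indent=2, ensure_ascii=False))
print("\n".join(lines))
```

Because the data is parsed as JSON first, only the keys of the top-level objects are read; a created_at nested inside a child object is not picked up by this sketch.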

JsonTools has a lot of other features, like a sort form that can sort JSON arrays in a variety of different ways. I recommend reading the documentation; I put a lot of work into making it readable and thorough.

EDIT: Don’t post a link to the file. If it’s really large, it will waste the resources of this forum. I know what tweet JSON looks like; I have a bunch of it on my own computer that I use as examples to test JsonTools.

EDIT2: If you don’t know what I mean by “array” and “object”, you should read this introduction to JSON. It is a bad idea to work with JSON without understanding it.

Alan Kilborn

@Roni-Segoly:

not everyone has Hebrew characters

They don’t?

guy038

Hi, @roni-segoly, @mark-olson, @alan-kilborn and All,

Very easy with regexes !

So :

Move to your file tab, first
Open the Replace dialog ( Ctrl + H )
Untick all box options
SEARCH (?-is)^"created_at"\x20:\x20|\R"full_text"\x20:(.+),$
REPLACE ?1\1
Check the Wrap around option
Select the Regular expression search mode
Click, once only, on the Replace All button

Voila !
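
To see what this replacement does, here is a rough Python sketch of the same idea. It is a translation under assumptions, not the Notepad++ engine: Python's re module has no \R and no ?1\1 conditional replacement, so \r?\n and a small function stand in for them, and the sample assumes each created_at line is immediately followed by its full_text line.

```python
import re

# Hypothetical sample: each created_at line is directly followed by its
# full_text line (an assumption; the values are invented).
sample = "\n".join([
    '"created_at" : "Tue Oct 15 16:00:37 +0000 2024",',
    '"full_text" : "hello world",',
    '"created_at" : "Wed Oct 16 09:12:00 +0000 2024",',
    '"full_text" : "second tweet",',
])

# First alternative: the created_at label at the start of a line (deleted).
# Second alternative: the line break plus the full_text label; group 1 keeps
# the value (with its leading space) and the trailing comma is dropped.
pattern = re.compile(r'^"created_at" : |\r?\n"full_text" :(.+),$', re.MULTILINE)

def replacement(match):
    # Mimics the conditional replacement: emit group 1 only if it took part.
    return match.group(1) or ""

result = pattern.sub(replacement, sample)
print(result)
```

With this sample, each pair of lines collapses into a single line holding the date and the text separated by a comma.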

BR

guy038

Mark Olson

@guy038 said in a newbie question about search:

Very easy with regexes !

Not in general.

Alan Kilborn

@Mark-Olson said in a newbie question about search:

Not in general.

Well, I guess it depends.
In full generality, I agree with you.

If the data is simple and well formed, it’s doable.
That was the presumption I was proceeding upon with my first answer to OP.

But, everyone encouraged OP to say more…to little avail.
So, if the data was NOT simple and well formed, OP likely did not get what he wanted, at least from my method.

But OP seemed satisfied, so, let’s move on…

Mark Olson

@Alan-Kilborn said in a newbie question about search:

If the data is simple and well formed, it’s doable.
That was the presumption I was proceeding upon with my first answer to OP.

Tweet JSON is extremely complex and deeply nested, with some fields appearing at different nesting depths. The created_at field, for example, appears in the root object, the retweeted_status child of the root object, and the user child of the root object. If the JSON file is printed out with no depth-based indentation, your regex has no way of differentiating between these created_at fields. The full_text field could also appear at different nesting depths.

To expand, tweet JSON can have a structure that looks a little bit like this (only much, much worse):

[
  {
    "Root1": {
      "bar": false,
      "quz": 1
    },
    "rOOt2": {
      "quz": 2,
      "bar": false
    },
    "ROOT3": [
      {
        "id": 1,
        "id_str": 2
      }
    ],
    "ROot4": {
      "id": -37,
      "id_str": 75
    },
    "roOT5": "blah"
  }
]

If you write a regex that searches for the bar field, you most likely won’t be able to tell whether its parent is Root1 or rOOt2. A similar issue happens with the id_str and id keys.
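
The ambiguity can be made concrete with a short Python sketch that walks a parsed document and records where each key lives. This is only an illustration: the helper key_paths is hypothetical, and the sample is a compact copy of the structure above with ROot4 written as an object so that it parses as JSON (an assumption about what was intended).

```python
import json
from collections import defaultdict

doc = json.loads('''
[
  {
    "Root1": {"bar": false, "quz": 1},
    "rOOt2": {"quz": 2, "bar": false},
    "ROOT3": [{"id": 1, "id_str": 2}],
    "ROot4": {"id": -37, "id_str": 75},
    "roOT5": "blah"
  }
]
''')

def key_paths(node, path=()):
    """Yield (key, path_to_parent) for every key at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield key, path
            yield from key_paths(value, path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from key_paths(item, path + (index,))

# Group the parent paths by key name.
parents = defaultdict(list)
for key, path in key_paths(doc):
    parents[key].append(path)

for key in ("bar", "id_str"):
    print(key, parents[key])
```

A parser sees two distinct parents for bar and two for id_str, whereas a line-based regex sees only identical-looking lines.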

Alan Kilborn

@Mark-Olson said:

Tweet JSON is extremely complex and deeply nested, with some fields appearing at different nesting depths. The created_at field, for example, appears in the root object, the retweeted_status child of the root object, and the user child of the root object. If the JSON file is printed out with no depth-based indentation, your regex has no way of differentiating between these created_at fields. The full_text field could also appear at different nesting depths.

and blah blah blah…

I hope that’s for the benefit of the OP and not me, because I don’t care an iota about Tweet JSON or WTF the data is. I sold my solution as simple-minded, it’s up to OP to decide if it works for him, or to keep pursuing some other solution.

Know thy data…understand how you’re manipulating it – this is OP’s responsibility. As is asking a full and complete question, with representative data fully shown.

Alen Mark

@Roni-Segoly
Yes, it’s definitely possible to extract specific lines from your large JS file in Notepad++, and there are a couple of ways you can approach this:

Using Regular Expressions (Regex): Notepad++ has a powerful “Find” feature that supports regular expressions, which can help you search for patterns in your file. If you know the structure of the lines containing the specific strings you want to extract, you can use a regex search to locate them together.

Here’s how to use Regex in Notepad++:

Press Ctrl + F to open the Find dialog.
Go to the Find tab and select Regular expression in the search mode.
Use a regex pattern to find the string you need along with its corresponding line. For example:
(YourFirstString.*\n.*YourSecondString)
This will match the first line with YourFirstString and the line immediately after it with YourSecondString.

Using a Script Plugin: If you’re dealing with more complex extractions or specific logic, you might want to install the PythonScript or NppExec plugin, which allows you to write and execute small scripts directly in Notepad++. You can write a script that reads the file line by line, checks for the matching strings, and extracts the corresponding lines as needed.

These methods should help you extract the corresponding lines from your large JS file. If you have more details on the structure, I could help refine the search process further!

PeterJones

@Alen-Mark ,

Please note: this is the second time you’ve come to the forum and posted ultra-generic content that sounds vaguely on-topic: it is highly reminiscent of AI-generated phraseology.

Please understand that posting AI-Generated content is expressly forbidden in this forum. And if your posts continue to appear as if they are – whether or not they are – you are likely to get banned. If you wish to avoid looking like (and getting banned as) AI, then I suggest you tailor your replies to the individual posts, rather than providing overly-generic responses that don’t take into account the context of the entire conversation.