a newbie question about search

Roni Segoly

I have never used this tool.
I have a huge js file which contains all my tweets from Twitter.
Notepad++ opened it easily as a text file.
I need to extract two corresponding lines that contain specific strings.
I can search for each one separately, but I need them together, as corresponding pairs.
Is it possible?

Mark Olson

@Roni-Segoly
You say you have a “huge js file”. If by js you mean JSON, the JsonTools plugin can help. If by js you mean JavaScript, I may not be able to help; JsonTools can handle some JavaScript objects even if they don’t comply with the original JSON specification, but it can’t handle all the complexities of JavaScript syntax.

If you provide a small example of what you are trying to do, I may be able to suggest how to solve your problem with JsonTools or some other plugin like PythonScript.

Roni Segoly

@Mark-Olson The fact that it’s JS is not really relevant; I can save it and treat it as a .txt file.
See the example below.
I need the lines starting with “created_at” and “full_text” one after the other, or maybe with their line numbers, so that I can then sort by line number.

I can send the whole file if needed

},
“display_text_range” : [
“0”,
“24”
],
“favorite_count” : “1”,
“in_reply_to_status_id_str” : “1846199292232339627”,
“id_str” : “1846219638109086002”,
“in_reply_to_user_id” : “4774540948”,
“truncated” : false,
“retweet_count” : “0”,
“id” : “1846219638109086002”,
“in_reply_to_status_id” : “1846199292232339627”,
“created_at” : “Tue Oct 15 16:00:37 +0000 2024”,
“favorited” : false,
“full_text” : “@zamir_shatz יש גם רמב"פ”,
“lang” : “iw”,
“in_reply_to_screen_name” : “zamir_shatz”,
“in_reply_to_user_id_str” : “4774540948”

Alan Kilborn

@Roni-Segoly

Overall, your posting is vague. There’s a FAQ about properly posting such questions.

Likely your data is actually:

},
"display_text_range" : [
"0",
"24"
],
"favorite_count" : "1",
"in_reply_to_status_id_str" : "1846199292232339627",
"id_str" : "1846219638109086002",
"in_reply_to_user_id" : "4774540948",
"truncated" : false,
"retweet_count" : "0",
"id" : "1846219638109086002",
"in_reply_to_status_id" : "1846199292232339627",
"created_at" : "Tue Oct 15 16:00:37 +0000 2024",
"favorited" : false,
"full_text" : "@zamir_shatz יש גם רמב"פ",
"lang" : "iw",
"in_reply_to_screen_name" : "zamir_shatz",
"in_reply_to_user_id_str" : "4774540948"

If I were doing your task, I might start this way:

Invoke Mark with Ctrl+m
In Find what put "created_at"|"full_text"
Checkmark: Bookmark line, Match case, Wrap around and Regular expression
Press Mark all
On the Search menu, choose Bookmark, then select Copy Bookmarked Lines
Create a new document with File > New (or simply press Ctrl+n)
Do Ctrl+v (paste)

See what that gets you for a start.
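Outside Notepad++, the Mark / Copy Bookmarked Lines steps above amount to a simple line filter. A minimal Python sketch, assuming the data is already available as a string; the sample is a shortened, hypothetical stand-in for the posted tweet data (the tweet text is replaced by an ASCII placeholder):

```python
import re

# Same pattern as the Find what box: case-sensitive, either label with its quotes.
MARK_PATTERN = re.compile(r'"created_at"|"full_text"')


def bookmarked_lines(text):
    """Return the lines that Mark All would bookmark, in their original order."""
    return [line for line in text.splitlines() if MARK_PATTERN.search(line)]


# Shortened, hypothetical sample (tweet text replaced by an ASCII placeholder).
sample = '''"id" : "1846219638109086002",
"created_at" : "Tue Oct 15 16:00:37 +0000 2024",
"favorited" : false,
"full_text" : "@zamir_shatz hello",
"lang" : "iw",'''

for line in bookmarked_lines(sample):
    print(line)
```

Joining the result with "\n" and writing it to a new file corresponds to pasting into the new tab.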

guy038

Hello, @roni-segoly, @mark-olson, @alan-kilborn and All,

@roni-segoly, you did not provide enough text for us to guess the right way to help you!

Do you mean that, from this INPUT text:

},
"display_text_range" : [
"0",
"24"
],
"favorite_count" : "1",
"in_reply_to_status_id_str" : "1846199292232339627",
"id_str" : "1846219638109086002",
"in_reply_to_user_id" : "4774540948",
"truncated" : false,
"retweet_count" : "0",
"id" : "1846219638109086002",
"in_reply_to_status_id" : "1846199292232339627",
"created_at" : "Tue Oct 15 16:00:37 +0000 2024",
"favorited" : false,
"full_text" : "@zamir_shatz יש גם רמב"פ",
"lang" : "iw",
"in_reply_to_screen_name" : "zamir_shatz",
"in_reply_to_user_id_str" : "4774540948"

Are you expecting this OUTPUT text, with the two lines beginning with created_at or full_text moved after the other ones?

},
"display_text_range" : [
"0",
"24"
],
"favorite_count" : "1",
"in_reply_to_status_id_str" : "1846199292232339627",
"id_str" : "1846219638109086002",
"in_reply_to_user_id" : "4774540948",
"truncated" : false,
"retweet_count" : "0",
"id" : "1846219638109086002",
"in_reply_to_status_id" : "1846199292232339627",
"favorited" : false,
"lang" : "iw",
"in_reply_to_screen_name" : "zamir_shatz",
"in_reply_to_user_id_str" : "4774540948"
"created_at" : "Tue Oct 15 16:00:37 +0000 2024",
"full_text" : "@zamir_shatz יש גם רמב"פ",

Best Regards,

guy038

Roni Segoly

@Alan-Kilborn Managed, cheers

Roni Segoly

@Alan-Kilborn I did a print screen of a section of the file, as not everyone has Hebrew characters.
If possible, I need the result without the labels, with the date and the text on one line, separated by a comma,
like:
“Tue Oct 15 16:00:37 +0000 2024”, “@zamir_shatz יש גם רמב"פ”

I cannot post a link to the file yet; I need two reputation points.

Mark Olson

@Roni-Segoly
In the future, you should refer to JSON as JSON or json, not js. Calling it js is confusing to programmers like me, because js is generally used to refer to JavaScript, not JSON.

JsonTools makes it easy to extract a few fields (like full_text and created_at) from each object in an array of objects, which is what your tweets appear to be.

Open the JsonTools tree view for your file.
In the text box in the upper left-hand corner of the tree view, enter the query @[:][created_at, full_text]. This RemesPath query will iterate through the array of objects and extract the created_at and full_text fields from each object.
Click the Submit query button.
You can now look at the tree view and notice that the tree displays only the full_text and created_at fields in each object.
Click the Save query result button.
The fields you wanted will now be in a new buffer, which you can save to a new file if desired.
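Outside Notepad++, the effect of that query can be approximated with Python's standard json module. This is a sketch, not JsonTools itself: it assumes the file is plain JSON whose root is an array of tweet objects (if the file begins with a JavaScript assignment rather than "[", that prefix would have to be removed first), and, unlike the query, it simply skips a field that is missing from an object. The sample tweets are hypothetical.

```python
import json


def extract_fields(json_text, keys=("created_at", "full_text")):
    """Roughly what @[:][created_at, full_text] does: for each object in the
    root array, keep only the requested keys (missing keys are skipped)."""
    tweets = json.loads(json_text)
    return [{key: tweet[key] for key in keys if key in tweet} for tweet in tweets]


# Hypothetical two-tweet sample; real tweet objects have many more fields.
sample = """[
  {"id_str": "1", "created_at": "Tue Oct 15 16:00:37 +0000 2024",
   "favorited": false, "full_text": "first tweet", "lang": "en"},
  {"id_str": "2", "created_at": "Wed Oct 16 09:12:00 +0000 2024",
   "full_text": "second tweet"}
]"""

result = extract_fields(sample)
# ensure_ascii=False keeps non-ASCII (e.g. Hebrew) text readable in the output.
print(json.dumps(result, indent=2, ensure_ascii=False))
```

Parsing the JSON, instead of matching lines, is what lets this approach tell root-level fields apart from nested ones.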

JsonTools has a lot of other features, like a sort form that can sort JSON arrays in a variety of different ways. I recommend reading the documentation; I put a lot of work into making it readable and thorough.

EDIT: Don’t post a link to the file. If it’s really large, it will waste the resources of this forum. I know what tweet JSON looks like; I have a bunch of it on my own computer that I use as examples to test JsonTools.

EDIT2: If you don’t know what I mean by “array” and “object”, you should read this introduction to JSON. It is a bad idea to work with JSON without understanding it.

Alan Kilborn

@Roni-Segoly:

not everyone has Hebrew characters

They don’t?

guy038

Hi, @roni-segoly, @mark-olson, @alan-kilborn and All,

Very easy with regexes !

So :

Move to your file tab, first
Open the Replace dialog ( Ctrl + H )
Untick all box options
SEARCH (?-is)^"created_at"\x20:\x20|\R"full_text"\x20:(.+),$
REPLACE ?1\1
Check the Wrap around option
Select the Regular expression search mode
Click, once only, on the Replace All button

Voila !
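To check what that replacement does, it can be sketched with Python's re module. Assumptions: it is applied to the text produced by the Mark / Copy Bookmarked Lines steps (each created_at line directly followed by its full_text line, with no leading indentation, as in the posted sample); Python's re has no \R, so the text is assumed to be read in text mode (which turns Windows line endings into \n) and \n stands in for \R; the ?1\1 conditional replacement becomes a small function.

```python
import re

# Translation of: (?-is)^"created_at"\x20:\x20|\R"full_text"\x20:(.+),$
# (?-is) is already Python's default (case-sensitive, '.' stops at line ends);
# MULTILINE lets ^ and $ match at every line.
PATTERN = re.compile(r'^"created_at"\x20:\x20|\n"full_text"\x20:(.+),$', re.MULTILINE)


def join_date_and_text(text):
    # Branch 1 deletes the created_at label (group 1 unset -> "").
    # Branch 2 deletes the line break and the full_text label, keeping
    # group 1, so the tweet text lands on the date's line.
    return PATTERN.sub(lambda m: m.group(1) or "", text)


# Hypothetical output of the Mark / Copy Bookmarked Lines step (text shortened).
sample = (
    '"created_at" : "Tue Oct 15 16:00:37 +0000 2024",\n'
    '"full_text" : "@zamir_shatz hello",\n'
)
print(join_date_and_text(sample), end="")
```

Each date/text pair ends up on one line, separated by a comma, which is the format requested earlier in the thread.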

BR

guy038

Mark Olson

@guy038 said in a newbie question about search:

Very easy with regexes !

Not in general.

Alan Kilborn

@Mark-Olson said in a newbie question about search:

Not in general.

Well, I guess it depends.
If you mean totally generally, then I agree with you.

If the data is simple and well formed, it’s doable.
That was the presumption I was proceeding upon with my first answer to OP.

But everyone encouraged OP to say more…to little avail.
So, if the data was NOT simple and well formed, OP likely did not get what he wanted, at least from my method.

But OP seemed satisfied, so, let’s move on…

Mark Olson

@Alan-Kilborn said in a newbie question about search:

If the data is simple and well formed, it’s doable.
That was the presumption I was proceeding upon with my first answer to OP.

Tweet JSON is extremely complex and deeply nested, with some fields appearing at different nesting depths. The created_at field, for example, appears in the root object, the retweeted_status child of the root object, and the user child of the root object. If the JSON file is printed out with no depth-based indentation, your regex has no way of differentiating between these created_at fields. The full_text field could also appear at different nesting depths.

To expand, tweet JSON can have a structure that looks a little bit like this (only much, much worse):

[
  {
    "Root1": {
      "bar": false,
      "quz": 1
    },
    "rOOt2": {
      "quz": 2,
      "bar": false
    },
    "ROOT3": [
      {
        "id": 1,
        "id_str": 2
      }
    ],
    "ROot4": {
      "id": -37,
      "id_str": 75
    },
    "roOT5": "blah"
  }
]

If you write a regex that searches for the bar field, you most likely won’t be able to tell whether its parent is Root1 or rOOt2. A similar issue happens with the id_str and id keys.
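A small Python demonstration of that point, using a hypothetical, heavily simplified tweet in which created_at appears both in the root object and in a nested user object (the user date is invented for the example): a text regex finds both values, while a JSON parser can ask for the root-level one only.

```python
import json
import re

# Hypothetical, heavily simplified tweet: "created_at" appears in the root
# object and again inside the nested "user" object.
tweet_json = """[
  {
    "created_at": "Tue Oct 15 16:00:37 +0000 2024",
    "full_text": "hello",
    "user": {"created_at": "Mon Jan 01 00:00:00 +0000 2018"}
  }
]"""

# A text regex sees every "created_at", whatever its nesting depth.
regex_hits = re.findall(r'"created_at"\s*:\s*"([^"]*)"', tweet_json)

# A JSON parser knows which object each key belongs to.
root_hits = [tweet["created_at"] for tweet in json.loads(tweet_json)]

print("regex:", regex_hits)
print("json: ", root_hits)
```

The regex returns two dates for a single tweet; nothing in the matched text says which one belongs to the tweet itself.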

Alan Kilborn

@Mark-Olson said:

Tweet JSON is extremely complex and deeply nested, with some fields appearing at different nesting depths. The created_at field, for example, appears in the root object, the retweeted_status child of the root object, and the user child of the root object. If the JSON file is printed out with no depth-based indentation, your regex has no way of differentiating between these created_at fields. The full_text field could also appear at different nesting depths.

and blah blah blah…

I hope that’s for the benefit of the OP and not me, because I don’t care an iota about Tweet JSON or WTF the data is. I sold my solution as simple-minded, it’s up to OP to decide if it works for him, or to keep pursuing some other solution.

Know thy data…understand how you’re manipulating it – this is OP’s responsibility. As is asking a full and complete question, with representative data fully shown.

Alen Mark

@Roni-Segoly
Yes, it’s definitely possible to extract specific lines from your large JS file in Notepad++, and there are a couple of ways you can approach this:

Using Regular Expressions (Regex): Notepad++ has a powerful “Find” feature that supports regular expressions, which can help you search for patterns in your file. If you know the structure of the lines containing the specific strings you want to extract, you can use a regex search to locate them together.

Here’s how to use Regex in Notepad++:

Press Ctrl + F to open the Find dialog.
Go to the Find tab and select Regular expression in the search mode.
Use a regex pattern to find the string you need along with its corresponding line. For example:
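(YourFirstString.*\R.*YourSecondString)
This will match a line containing YourFirstString together with the line immediately after it, as long as that next line contains YourSecondString. (\R matches either a Windows or a Unix line break; a bare \n can fail after .* in files with Windows line endings.)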

Using a Script Plugin: If you’re dealing with more complex extractions or specific logic, you might want to install the PythonScript or NppExec plugin, which allow you to write and execute small scripts directly in Notepad++. You can write a script that reads the file line by line, checks for the matching strings, and extracts the corresponding lines as needed.
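A minimal sketch of such a line-by-line script in plain Python. It does not use any PythonScript-specific API; inside that plugin the text would have to come from the editor instead (e.g. via editor.getText()). It remembers the latest created_at value and pairs it with the next full_text line, giving the "date, text" format requested earlier in the thread. It assumes one field per line, as in the posted sample, and it inherits the nesting caveat raised earlier: a nested created_at (for example in a user object) would be picked up too. The sample lines are shortened and hypothetical.

```python
import re

# One "key" : value pair per line; captures the key and the raw value
# (quotes kept, trailing comma dropped).
FIELD = re.compile(r'^\s*"(created_at|full_text)"\s*:\s*(.*?),?\s*$')


def pair_date_and_text(lines):
    """Pair each created_at value with the next full_text value."""
    pairs = []
    pending_date = None
    for line in lines:
        match = FIELD.match(line)
        if not match:
            continue  # every other line is ignored
        key, value = match.groups()
        if key == "created_at":
            pending_date = value
        elif pending_date is not None:  # a full_text line with a date waiting
            pairs.append(f"{pending_date}, {value}")
            pending_date = None
    return pairs


# Shortened, hypothetical sample lines (tweet text replaced by ASCII).
sample = [
    '"id" : "1846219638109086002",',
    '"created_at" : "Tue Oct 15 16:00:37 +0000 2024",',
    '"favorited" : false,',
    '"full_text" : "@zamir_shatz hello",',
    '"lang" : "iw",',
]
for pair in pair_date_and_text(sample):
    print(pair)
```

Unlike a two-line regex, this does not require the two fields to sit on adjacent lines.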

These methods should help you extract the corresponding lines from your large JS file. If you have more details on the structure, I could help refine the search process further!

PeterJones

@Alen-Mark ,

Please note: this is the second time you’ve come to the forum and posted ultra-generic content that sounds vaguely on-topic: it is highly reminiscent of AI-generated phraseology.

Please understand that posting AI-Generated content is expressly forbidden in this forum. And if your posts continue to appear as if they are – whether or not they are – you are likely to get banned. If you wish to avoid looking like (and getting banned as) AI, then I suggest you tailor your replies to the individual posts, rather than providing overly-generic responses that don’t take into account the context of the entire conversation.