a newbie question about search
-
I have never used this tool.
I have a huge js file which contains all my tweets from Twitter.
Notepad++ opened it easily as a text file.
I need to extract pairs of corresponding lines that contain specific strings.
I can search for each string on its own, but I need the matching lines kept together.
Is it possible?
-
@Roni-Segoly
You say you have a “huge js file”. If by js you mean JSON, the JsonTools plugin can help. If by js you mean JavaScript, I may not be able to help; JsonTools can handle some JavaScript objects even if they don’t comply with the original JSON specification, but it can’t handle all the complexities of JavaScript syntax.
If you provide a small example of what you are trying to do, I may be able to suggest how to solve your problem with JsonTools or some other plugin like PythonScript.
-
@Mark-Olson The fact that it’s JS is less relevant; I can save it and treat it as txt.
See the example below.
I need the lines starting with “created_at” and “full_text”, one after the other, or maybe with line numbers so that I can then sort by line number.
I can send the whole file if needed.
},
“display_text_range” : [
“0”,
“24”
],
“favorite_count” : “1”,
“in_reply_to_status_id_str” : “1846199292232339627”,
“id_str” : “1846219638109086002”,
“in_reply_to_user_id” : “4774540948”,
“truncated” : false,
“retweet_count” : “0”,
“id” : “1846219638109086002”,
“in_reply_to_status_id” : “1846199292232339627”,
“created_at” : “Tue Oct 15 16:00:37 +0000 2024”,
“favorited” : false,
“full_text” : “@zamir_shatz יש גם רמב"פ”,
“lang” : “iw”,
“in_reply_to_screen_name” : “zamir_shatz”,
“in_reply_to_user_id_str” : “4774540948”
-
Overall, your posting is vague. There’s a FAQ about properly posting such questions.
Likely your data is actually:
},
"display_text_range" : [
"0",
"24"
],
"favorite_count" : "1",
"in_reply_to_status_id_str" : "1846199292232339627",
"id_str" : "1846219638109086002",
"in_reply_to_user_id" : "4774540948",
"truncated" : false,
"retweet_count" : "0",
"id" : "1846219638109086002",
"in_reply_to_status_id" : "1846199292232339627",
"created_at" : "Tue Oct 15 16:00:37 +0000 2024",
"favorited" : false,
"full_text" : "@zamir_shatz יש גם רמב"פ",
"lang" : "iw",
"in_reply_to_screen_name" : "zamir_shatz",
"in_reply_to_user_id_str" : "4774540948"
If I were doing your task, I might start this way:
- Invoke Mark with Ctrl+m
- In Find what put
"created_at"|"full_text"
- Checkmark: Bookmark line, Match case, Wrap around and Regular expression
- Press Mark all
- On the Search menu, choose Bookmark, then select Copy Bookmarked Lines
- Create a new document with File > New (or simply press Ctrl+n)
- Do Ctrl+v (paste)
See what that gets you for a start.
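For anyone who would rather script it, here is a minimal Python sketch of the same line filter. It assumes one key per line, as in the sample; the function name and the sample lines are made up for illustration. It also reports line numbers, since those were asked about.

```python
import re

# Same alternation as typed into the Mark dialog.
PATTERN = re.compile(r'"created_at"|"full_text"')

def matching_lines(lines):
    """Yield (line_number, line) for every line containing either key."""
    for number, line in enumerate(lines, start=1):
        if PATTERN.search(line):
            yield number, line.rstrip("\n")

# Hypothetical sample lines standing in for the real file.
sample = [
    '"id" : "1846219638109086002",\n',
    '"created_at" : "Tue Oct 15 16:00:37 +0000 2024",\n',
    '"favorited" : false,\n',
    '"full_text" : "@zamir_shatz hello",\n',
]

found = list(matching_lines(sample))
for number, line in found:
    print(number, line)
```

With a real file, the list could be replaced by an open file object, which also iterates line by line.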
-
Hello, @roni-segoly, @mark-olson, @alan-kilborn and All,
@roni-segoly, you did not provide enough text for us to guess the right way to help you !
Do you mean that, from this INPUT text :
},
"display_text_range" : [
"0",
"24"
],
"favorite_count" : "1",
"in_reply_to_status_id_str" : "1846199292232339627",
"id_str" : "1846219638109086002",
"in_reply_to_user_id" : "4774540948",
"truncated" : false,
"retweet_count" : "0",
"id" : "1846219638109086002",
"in_reply_to_status_id" : "1846199292232339627",
"created_at" : "Tue Oct 15 16:00:37 +0000 2024",
"favorited" : false,
"full_text" : "@zamir_shatz יש גם רמב"פ",
"lang" : "iw",
"in_reply_to_screen_name" : "zamir_shatz",
"in_reply_to_user_id_str" : "4774540948"
You are expecting this OUTPUT text, with the two lines beginning with created_at or full_text moved after the other ones ?
},
"display_text_range" : [
"0",
"24"
],
"favorite_count" : "1",
"in_reply_to_status_id_str" : "1846199292232339627",
"id_str" : "1846219638109086002",
"in_reply_to_user_id" : "4774540948",
"truncated" : false,
"retweet_count" : "0",
"id" : "1846219638109086002",
"in_reply_to_status_id" : "1846199292232339627",
"favorited" : false,
"lang" : "iw",
"in_reply_to_screen_name" : "zamir_shatz",
"in_reply_to_user_id_str" : "4774540948"
"created_at" : "Tue Oct 15 16:00:37 +0000 2024",
"full_text" : "@zamir_shatz יש גם רמב"פ",
Best Regards,
guy038
-
@Alan-Kilborn Managed, cheers
-
@Alan-Kilborn I posted a screenshot of a section of the file, as not everyone has Hebrew characters.
If possible, I need the output without the labels, with the date and the text on one line, separated by a comma.
Like
“Tue Oct 15 16:00:37 +0000 2024”, “@zamir_shatz יש גם רמב"פ”
I cannot post the link to the file yet; I need two reputation points first.
-
@Roni-Segoly
In the future, you should refer to JSON as JSON or json, not js. Calling it js is confusing to programmers like me, because js is generally used to refer to JavaScript, not JSON.
JsonTools makes it easy to extract a few fields (like full_text and created_at) from each object in an array of objects, which is what your tweets appear to be.
- Open the JsonTools tree view for your file.
- In the text box in the upper left-hand corner of the tree view, enter the query
@[:][created_at, full_text]
This RemesPath query will iterate through the array of objects and extract the created_at and full_text fields from each object.
- Click the Submit query button.
- You can now look at the tree view and notice that the tree displays only the full_text and created_at fields in each object.
- Click the Save query result button.
- The fields you wanted will now be in a new buffer, which you can save to a new file if desired.
JsonTools has a lot of other features, like a sort form that can sort JSON arrays in a variety of different ways. I recommend reading the documentation; I put a lot of work into making it readable and thorough.
EDIT: Don’t post a link to the file. If it’s really large, it will waste the resources of this forum. I know what tweet JSON looks like; I have a bunch of it on my own computer that I use as examples to test JsonTools.
EDIT2: If you don’t know what I mean by “array” and “object”, you should read this introduction to JSON. It is a bad idea to work with JSON without understanding it.
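Outside Notepad++, roughly the same extraction can be sketched in plain Python with the standard json module. This assumes the file is valid JSON holding an array of objects with created_at and full_text at the top level of each object, which may not hold for every tweet export; SAMPLE and extract_fields are hypothetical names used only for illustration.

```python
import json

# Hypothetical stand-in for the real file: a JSON array of objects that carry
# "created_at" and "full_text" at the top level (an assumption, not a given).
SAMPLE = '''
[
  {"id": "1", "created_at": "Tue Oct 15 16:00:37 +0000 2024", "full_text": "first tweet"},
  {"id": "2", "created_at": "Wed Oct 16 09:12:00 +0000 2024", "full_text": "second tweet"}
]
'''

def extract_fields(tweets, keys=("created_at", "full_text")):
    """Return one list per object holding only the requested fields.

    Objects missing any of the keys are skipped.
    """
    return [[tweet[k] for k in keys] for tweet in tweets
            if all(k in tweet for k in keys)]

rows = extract_fields(json.loads(SAMPLE))
for created_at, full_text in rows:
    # json.dumps re-quotes each value; ensure_ascii=False keeps non-ASCII text readable.
    print(json.dumps(created_at), json.dumps(full_text, ensure_ascii=False), sep=", ")
```

Each printed line has the date and the text side by side, separated by a comma, without the labels.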
-
Hi, @roni-segoly, @mark-olson, @alan-kilborn and All,
Very easy with regexes !
So :
- Move to your file tab, first
- Open the Replace dialog (Ctrl + H)
- Untick all box options
- SEARCH
(?-is)^"created_at"\x20:\x20|\R"full_text"\x20:(.+),$
- REPLACE
?1\1
- Check the Wrap around option
- Select the Regular expression search mode
- Click, once only, on the Replace All button
Voila !
BR
guy038
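To see what the search and replacement do, here is a rough Python equivalent, offered as a sketch rather than an exact port. It assumes the text has already been reduced to alternating created_at and full_text lines, with "\n" line endings and no leading whitespace. Python's re module has no \R and no ?1 conditional replacement, so a plain \n and a replacement function stand in for them; join_pairs and the sample text are made up for illustration.

```python
import re

# re.M gives ^ and $ a per-line meaning, as the Notepad++ search assumes.
# First alternative: the "created_at" label at the start of a line.
# Second alternative: the line break plus the "full_text" label, capturing
# the value in group 1 and dropping the trailing comma.
PATTERN = re.compile(r'^"created_at" : |\n"full_text" :(.+),$', re.M)

def join_pairs(text):
    # Stand-in for the ?1\1 replacement: keep group 1 when it took part in
    # the match, otherwise replace the match with nothing.
    return PATTERN.sub(lambda m: m.group(1) or "", text)

# Hypothetical sample of two already-extracted lines.
sample = (
    '"created_at" : "Tue Oct 15 16:00:37 +0000 2024",\n'
    '"full_text" : "@zamir_shatz hello",'
)

result = join_pairs(sample)
print(result)
```

The label on the first line is deleted, and the second line is pulled up behind the first line's trailing comma.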
-
@Mark-Olson said in a newbie question about search:
Not in general.
Well, I guess it depends.
Totally generally, then I agree with you.
If the data is simple and well formed, it’s doable.
That was the presumption I was proceeding upon with my first answer to OP.
But, everyone encouraged OP to say more…to little avail.
So, if the data was NOT simple and well formed, OP likely did not get what he wanted, at least from my method.
But OP seemed satisfied, so, let’s move on…
-
@Alan-Kilborn said in a newbie question about search:
If the data is simple and well formed, it’s doable.
That was the presumption I was proceeding upon with my first answer to OP.
Tweet JSON is extremely complex and deeply nested, with some fields appearing at different nesting depths. The created_at field, for example, appears in the root object, the retweeted_status child of the root object, and the user child of the root object. If the JSON file is printed out with no depth-based indentation, your regex has no way of differentiating between these created_at fields. The full_text field could also appear at different nesting depths.
To expand, tweet JSON can have a structure that looks a little bit like this (only much, much worse):
[
  {
    "Root1": { "bar": false, "quz": 1 },
    "rOOt2": { "quz": 2, "bar": false },
    "ROOT3": [ { "id": 1, "id_str": 2 } ],
    "ROot4": { "id": -37, "id_str": 75 },
    "roOT5": "blah"
  }
]
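To make the ambiguity concrete, here is a minimal Python sketch that contrasts a bare regex search with a walk over the parsed data. It uses the sample above written as valid JSON; key_paths and the path notation are hypothetical helpers, not part of JsonTools or any other plugin.

```python
import json
import re

# The nested sample, written as valid JSON (illustrative data only).
TEXT = '''
[
  {
    "Root1": {"bar": false, "quz": 1},
    "rOOt2": {"quz": 2, "bar": false},
    "ROOT3": [{"id": 1, "id_str": 2}],
    "ROot4": {"id": -37, "id_str": 75},
    "roOT5": "blah"
  }
]
'''

def key_paths(node, key, path=""):
    """Yield the path to every occurrence of `key` in parsed JSON."""
    if isinstance(node, dict):
        for name, child in node.items():
            child_path = f"{path}.{name}" if path else name
            if name == key:
                yield child_path
            yield from key_paths(child, key, child_path)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from key_paths(child, key, f"{path}[{index}]")

data = json.loads(TEXT)
regex_hits = re.findall(r'"bar"', TEXT)   # finds the key twice, with no parent information
bar_paths = list(key_paths(data, "bar"))  # each hit together with its parent
id_paths = list(key_paths(data, "id"))
print(len(regex_hits), bar_paths, id_paths)
```

The regex only reports that the key occurs twice, while the parsed walk says which parent each occurrence belongs to.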
If you write a regex that searches for the bar field, you most likely won’t be able to tell whether its parent is Root1 or rOOt2. A similar issue happens with the id_str and id keys.
-
@Mark-Olson said:
Tweet JSON is extremely complex and deeply nested, with some fields appearing at different nesting depths. The created_at field, for example, appears in the root object, the retweeted_status child of the root object, and the user child of the root object. If the JSON file is printed out with no depth-based indentation, your regex has no way of differentiating between these created_at fields. The full_text field could also appear at different nesting depths.
and blah blah blah…
I hope that’s for the benefit of the OP and not me, because I don’t care an iota about Tweet JSON or WTF the data is. I sold my solution as simple-minded, it’s up to OP to decide if it works for him, or to keep pursuing some other solution.
Know thy data…understand how you’re manipulating it – this is OP’s responsibility. As is asking a full and complete question, with representative data fully shown.
-
@Roni-Segoly
Yes, it’s definitely possible to extract specific lines from your large JS file in Notepad++, and there are a couple of ways you can approach this:
Using Regular Expressions (Regex): Notepad++ has a powerful “Find” feature that supports regular expressions, which can help you search for patterns in your file. If you know the structure of the lines containing the specific strings you want to extract, you can use a regex search to locate them together.
Here’s how to use Regex in Notepad++:
Press Ctrl + F to open the Find dialog.
Go to the Find tab and select Regular expression in the search mode.
Use a regex pattern to find the string you need along with its corresponding line. For example:
(YourFirstString.*\n.*YourSecondString)
This will match the first line with YourFirstString and the line immediately after it with YourSecondString.
Using a Script Plugin: If you’re dealing with more complex extractions or specific logic, you might want to install the PythonScript or NppExec plugin, which allows you to write and execute small scripts directly in Notepad++. You can write a script that reads the file line by line, checks for the matching strings, and extracts the corresponding lines as needed.
These methods should help you extract the corresponding lines from your large JS file. If you have more details on the structure, I could help refine the search process further!
-
Please note: this is the second time you’ve come to the forum and posted ultra-generic content that sounds vaguely on-topic: it is highly reminiscent of AI-generated phraseology.
Please understand that posting AI-Generated content is expressly forbidden in this forum. And if your posts continue to appear as if they are – whether or not they are – you are likely to get banned. If you wish to avoid looking like (and getting banned as) AI, then I suggest you tailor your replies to the individual posts, rather than providing overly-generic responses that don’t take into account the context of the entire conversation.