
    a newbie question about search

    16 Posts 6 Posters 556 Views
    • Roni Segoly @Mark Olson

      @Mark-Olson The fact that it's JS is not really relevant; I can save the file and treat it as plain text.
      See the example below.
      I need the lines starting with "created_at" and "full_text" one after the other, or maybe prefixed with their line numbers so that I can sort by line number.

      I can send the whole file if needed

      },
      “display_text_range” : [
      “0”,
      “24”
      ],
      “favorite_count” : “1”,
      “in_reply_to_status_id_str” : “1846199292232339627”,
      “id_str” : “1846219638109086002”,
      “in_reply_to_user_id” : “4774540948”,
      “truncated” : false,
      “retweet_count” : “0”,
      “id” : “1846219638109086002”,
      “in_reply_to_status_id” : “1846199292232339627”,
      “created_at” : “Tue Oct 15 16:00:37 +0000 2024”,
      “favorited” : false,
      “full_text” : “@zamir_shatz יש גם רמב"פ”,
      “lang” : “iw”,
      “in_reply_to_screen_name” : “zamir_shatz”,
      “in_reply_to_user_id_str” : “4774540948”

      • Alan Kilborn @Roni Segoly

        @Roni-Segoly

        Overall, your posting is vague. There’s a FAQ about properly posting such questions.

        Likely your data is actually:

        },
        "display_text_range" : [
        "0",
        "24"
        ],
        "favorite_count" : "1",
        "in_reply_to_status_id_str" : "1846199292232339627",
        "id_str" : "1846219638109086002",
        "in_reply_to_user_id" : "4774540948",
        "truncated" : false,
        "retweet_count" : "0",
        "id" : "1846219638109086002",
        "in_reply_to_status_id" : "1846199292232339627",
        "created_at" : "Tue Oct 15 16:00:37 +0000 2024",
        "favorited" : false,
        "full_text" : "@zamir_shatz יש גם רמב"פ",
        "lang" : "iw",
        "in_reply_to_screen_name" : "zamir_shatz",
        "in_reply_to_user_id_str" : "4774540948"
        

        If I were doing your task, I might start this way:

        • Invoke Mark with Ctrl+m
        • In Find what put "created_at"|"full_text"
        • Checkmark: Bookmark line, Match case, Wrap around and Regular expression
        • Press Mark all
        • On the Search menu, choose Bookmark, then select Copy Bookmarked Lines
        • Create a new document with File > New (or simply press Ctrl+n)
        • Do Ctrl+v (paste)

        See what that gets you for a start.
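
For readers who would rather script this than use the Mark dialog, the same filtering can be sketched in Python. This only approximates the steps above and is not what Notepad++ does internally; the function name `marked_lines` and the sample text are made up for illustration. It also keeps the line numbers, which OP mentioned wanting for sorting.

```python
import re

# Same alternation as typed into "Find what" above; re.search is
# case-sensitive by default, like ticking "Match case".
PATTERN = re.compile(r'"created_at"|"full_text"')

def marked_lines(text):
    """Return (line_number, line) pairs for every line the pattern hits."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if PATTERN.search(line)
    ]

# Hypothetical sample, shaped like the data shown in this thread.
sample = '''"id" : "1846219638109086002",
"created_at" : "Tue Oct 15 16:00:37 +0000 2024",
"favorited" : false,
"full_text" : "@someone hello",
"lang" : "en"'''

for number, line in marked_lines(sample):
    print(number, line)
```

Like the Mark approach, this is purely line-based: it knows nothing about JSON structure and will pick up any line containing either key.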

        • guy038

          Hello, @roni-segoly, @mark-olson, @alan-kilborn and All,

          @roni-segoly, you did not provide enough text for us to guess the right way to help you!

          Do you mean that, from this INPUT text :

          },
          "display_text_range" : [
          "0",
          "24"
          ],
          "favorite_count" : "1",
          "in_reply_to_status_id_str" : "1846199292232339627",
          "id_str" : "1846219638109086002",
          "in_reply_to_user_id" : "4774540948",
          "truncated" : false,
          "retweet_count" : "0",
          "id" : "1846219638109086002",
          "in_reply_to_status_id" : "1846199292232339627",
          "created_at" : "Tue Oct 15 16:00:37 +0000 2024",
          "favorited" : false,
          "full_text" : "@zamir_shatz יש גם רמב"פ",
          "lang" : "iw",
          "in_reply_to_screen_name" : "zamir_shatz",
          "in_reply_to_user_id_str" : "4774540948"
          

          Are you expecting this OUTPUT text, with the two lines beginning with created_at or full_text moved after the other ones?

          },
          "display_text_range" : [
          "0",
          "24"
          ],
          "favorite_count" : "1",
          "in_reply_to_status_id_str" : "1846199292232339627",
          "id_str" : "1846219638109086002",
          "in_reply_to_user_id" : "4774540948",
          "truncated" : false,
          "retweet_count" : "0",
          "id" : "1846219638109086002",
          "in_reply_to_status_id" : "1846199292232339627",
          "favorited" : false,
          "lang" : "iw",
          "in_reply_to_screen_name" : "zamir_shatz",
          "in_reply_to_user_id_str" : "4774540948"
          "created_at" : "Tue Oct 15 16:00:37 +0000 2024",
          "full_text" : "@zamir_shatz יש גם רמב"פ",
          

          Best Regards,

          guy038

          • Roni Segoly @Alan Kilborn

            @Alan-Kilborn Managed, cheers

            • Roni Segoly @Alan Kilborn

              @Alan-Kilborn I posted a screenshot of a section of the file, as not everyone has Hebrew characters.
              If possible, I need the output without the labels, with the date and the text on one line, separated by a comma.
              Like:
              “Tue Oct 15 16:00:37 +0000 2024”, “@zamir_shatz יש גם רמב"פ”

              [screenshot of a section of the file]
              I cannot post a link to the file yet; I need a reputation of two first.

              • Mark Olson @Roni Segoly

                @Roni-Segoly
                In the future, you should refer to JSON as JSON or json, not js. Calling it js is confusing to programmers like me, because js is generally used to refer to JavaScript, not JSON.

                JsonTools makes it easy to extract a few fields (like full_text and created_at) from each object in an array of objects, which is what your tweets appear to be.

                1. Open the JsonTools tree view for your file.
                2. In the text box in the upper left-hand corner of the tree view, enter the query @[:][created_at, full_text]. This RemesPath query will iterate through the array of objects and extract the created_at and full_text fields from each object.
                3. Click the Submit query button.
                4. You can now look at the tree view and notice that the tree displays only the full_text and created_at fields of each object.
                5. Click the Save query result button.
                6. The fields you wanted will now be in a new buffer, which you can save to a new file if desired.
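
For comparison, the same kind of extraction can be sketched outside Notepad++ with Python's standard `json` and `csv` modules. This is not how JsonTools works internally; it is a minimal illustration that assumes the file parses as a JSON array of objects with `created_at` and `full_text` at the top level of each object. The function name `extract_fields` and the sample data are hypothetical.

```python
import csv
import io
import json

def extract_fields(json_text):
    """Return one CSV line per object: created_at, full_text."""
    tweets = json.loads(json_text)  # assumed: a JSON array of objects
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for tweet in tweets:
        # Skip objects that lack either field instead of guessing a value.
        if "created_at" in tweet and "full_text" in tweet:
            writer.writerow([tweet["created_at"], tweet["full_text"]])
    return out.getvalue()

# Hypothetical sample data, much simpler than real tweet JSON.
sample = '''[
  {"created_at": "Tue Oct 15 16:00:37 +0000 2024",
   "full_text": "@someone hello",
   "lang": "en"},
  {"lang": "en"}
]'''

print(extract_fields(sample), end="")
```

Because the text goes through a real JSON parser and a CSV writer, quotes and commas inside `full_text` are escaped properly, which a line-based approach cannot guarantee.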

                JsonTools has a lot of other features, like a sort form that can sort JSON arrays in a variety of different ways. I recommend reading the documentation; I put a lot of work into making it readable and thorough.

                EDIT: Don’t post a link to the file. If it’s really large, it will waste the resources of this forum. I know what tweet JSON looks like; I have a bunch of it on my own computer that I use as examples to test JsonTools.

                EDIT2: If you don’t know what I mean by “array” and “object”, you should read this introduction to JSON. It is a bad idea to work with JSON without understanding it.

                • Alan Kilborn @Roni Segoly

                  @Roni-Segoly:

                  not everyone has Hebrew characters

                  They don’t?

                  • guy038

                    Hi, @roni-segoly, @mark-olson, @alan-kilborn and All,

                    Very easy with regexes !

                    So :

                    • Move to your file tab, first

                    • Open the Replace dialog ( Ctrl + H )

                    • Untick all box options

                    • SEARCH (?-is)^"created_at"\x20:\x20|\R"full_text"\x20:(.+),$

                    • REPLACE ?1\1

                    • Check the Wrap around option

                    • Select the Regular expression search mode

                    • Click, once only, on the Replace All button

                    Voila !
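
To see what this replacement does, here is a rough Python equivalent. It is a sketch under one assumption of mine: the input is the text that contains only the created_at and full_text lines (for example, the result of copying the bookmarked lines), with no leading whitespace. On the full file, the full_text value would be appended to whatever line happens to precede it. Python's `re` has neither `\R` nor the `?1\1` conditional replacement, so `\r?\n` and a replacement function stand in for them; the name `join_date_and_text` is made up.

```python
import re

# First alternative: the created_at label at the start of a line.
# Second alternative: the line break plus the full_text label, with the
# value captured in group 1 and the trailing comma left out of the group.
PATTERN = re.compile(r'^"created_at" : |\r?\n"full_text" :(.+),$', re.MULTILINE)

def join_date_and_text(text):
    # Mimic the ?1\1 replacement: keep group 1 when it took part in the
    # match, otherwise replace the match with nothing.
    return PATTERN.sub(lambda m: m.group(1) or "", text)

# Hypothetical extracted lines, two per tweet.
extracted = (
    '"created_at" : "Tue Oct 15 16:00:37 +0000 2024",\n'
    '"full_text" : "@someone hello",\n'
)

print(join_date_and_text(extracted), end="")
```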

                    BR

                    guy038

                    • Mark Olson @guy038

                      @guy038 said in a newbie question about search:

                      Very easy with regexes !

                      Not in general.

                      • Alan Kilborn @Mark Olson

                        @Mark-Olson said in a newbie question about search:

                        Not in general.

                        Well, I guess it depends.
                        Totally generally, then I agree with you.

                        If the data is simple and well formed, it’s doable.
                        That was the presumption I was proceeding upon with my first answer to OP.

                        But, everyone encouraged OP to say more…to little avail.
                        So, if the data was NOT simple and well formed, OP likely did not get what he wanted, at least from my method.

                        But OP seemed satisfied, so, let’s move on…

                        • Mark Olson @Alan Kilborn

                          @Alan-Kilborn said in a newbie question about search:

                          If the data is simple and well formed, it’s doable.
                          That was the presumption I was proceeding upon with my first answer to OP.

                          Tweet JSON is extremely complex and deeply nested, with some fields appearing at different nesting depths. The created_at field, for example, appears in the root object, the retweeted_status child of the root object, and the user child of the root object. If the JSON file is printed out with no depth-based indentation, your regex has no way of differentiating between these created_at fields. The full_text field could also appear at different nesting depths.

                          To expand, tweet JSON can have a structure that looks a little bit like this (only much, much worse):

                          [
                            {
                              "Root1": {
                                "bar": false,
                                "quz": 1
                              },
                              "rOOt2": {
                                "quz": 2,
                                "bar": false
                              },
                              "ROOT3": [
                                {
                                  "id": 1,
                                  "id_str": 2
                                }
                              ],
                              "ROot4": {
                                "id": -37,
                                "id_str": 75
                              },
                              "roOT5": "blah"
                            }
                          ]
                          

                          If you write a regex that searches for the bar field, you most likely won’t be able to tell whether its parent is Root1 or rOOt2. A similar issue happens with the id_str and id keys.
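
A parser, by contrast, always knows where it is in the tree. As a small sketch in Python (not JsonTools code; the helper name `find_key_paths` and the cut-down sample are hypothetical), walking the parsed structure reports the full path to every occurrence of a key, so the two bar fields are easy to tell apart:

```python
import json

def find_key_paths(node, key, path=()):
    """Yield the path (tuple of keys and indices) to each occurrence of key."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield path + (name,)
            # Keep descending: the same key may appear again further down.
            yield from find_key_paths(value, key, path + (name,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_key_paths(value, key, path + (index,))

# Hypothetical, cut-down version of the structure sketched above.
sample = json.loads('''[
  {
    "Root1": {"bar": false, "quz": 1},
    "rOOt2": {"quz": 2, "bar": false},
    "roOT5": "blah"
  }
]''')

paths = list(find_key_paths(sample, "bar"))
print(paths)
```

Filtering on the path (for example, keeping only paths of a given length or with a given parent) is what a flat regex search over unindented text cannot do.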

                          • Alan Kilborn @Mark Olson

                            @Mark-Olson said:

                            Tweet JSON is extremely complex and deeply nested, with some fields appearing at different nesting depths. The created_at field, for example, appears in the root object, the retweeted_status child of the root object, and the user child of the root object. If the JSON file is printed out with no depth-based indentation, your regex has no way of differentiating between these created_at fields. The full_text field could also appear at different nesting depths.

                            and blah blah blah…

                            I hope that’s for the benefit of the OP and not me, because I don’t care an iota about Tweet JSON or WTF the data is. I sold my solution as simple-minded, it’s up to OP to decide if it works for him, or to keep pursuing some other solution.

                            Know thy data…understand how you’re manipulating it – this is OP’s responsibility. As is asking a full and complete question, with representative data fully shown.

                            • Alen Mark @Roni Segoly

                              @Roni-Segoly
                              Yes, it’s definitely possible to extract specific lines from your large JS file in Notepad++, and there are a couple of ways you can approach this:

                              Using Regular Expressions (Regex): Notepad++ has a powerful “Find” feature that supports regular expressions, which can help you search for patterns in your file. If you know the structure of the lines containing the specific strings you want to extract, you can use a regex search to locate them together.

                              Here’s how to use Regex in Notepad++:

                              Press Ctrl + F to open the Find dialog.
                              Go to the Find tab and select Regular expression in the search mode.
                              Use a regex pattern to find the string you need along with its corresponding line. For example:
                              (YourFirstString.*\n.*YourSecondString)
                              This will match the first line with YourFirstString and the line immediately after it with YourSecondString.

                              Using a Script Plugin: If you’re dealing with more complex extractions or specific logic, you might want to install the PythonScript or NppExec plugin, which allows you to write and execute small scripts directly in Notepad++. You can write a script that reads the file line by line, checks for the matching strings, and extracts the corresponding lines as needed.

                              These methods should help you extract the corresponding lines from your large JS file. If you have more details on the structure, I could help refine the search process further!

                              • PeterJones @Alen Mark

                                @Alen-Mark ,

                                Please note: this is the second time you’ve come to the forum and posted ultra-generic content that sounds vaguely on-topic: it is highly reminiscent of AI-generated phraseology.

                                Please understand that posting AI-Generated content is expressly forbidden in this forum. And if your posts continue to appear as if they are – whether or not they are – you are likely to get banned. If you wish to avoid looking like (and getting banned as) AI, then I suggest you tailor your replies to the individual posts, rather than providing overly-generic responses that don’t take into account the context of the entire conversation.
