    How to copy or extract particular string value from each line ?

    • Sagar Kurapati

      Hello Everyone,

      I have a text file with multiple lines.
      I would like to search each line for id-numbers and extract them, but only from lines that contain a given matching string.

      Sample:

      1. 2020-06-12 00:00:01,971 INFO [com.bah.tesseract.aop.ValidationAop] POST/item_sub_type/get-sub-type {“id”:“92361e803a910f4fe0142f4398ff9cf5”}
      2. 2020-06-12 00:00:01,979 ERROR [com.bah.tesseract.util.TokenUtils] org.json.JSONException: JSONObject[“givenname”] not found.
      3. 2020-06-12 00:00:01,980 ERROR [com.bah.tesseract.util.TokenUtils] org.json.JSONException: JSONObject[“sn”] not found.
      4. 2020-06-12 00:00:02,161 INFO [com.bah.tesseract.aop.ValidationAop] POST/item_status/get-status {“id”:“a5c0dfd6f15c3c7dbeffb31f78748da0”}
      5. 2020-06-12 00:00:02,166 ERROR [com.bah.tesseract.util.TokenUtils] org.json.JSONException: JSONObject[“givenname”] not found.
      6. 2020-06-12 00:00:02,167 ERROR [com.bah.tesseract.util.TokenUtils] org.json.JSONException: JSONObject[“sn”] not found.
      7. 2020-06-12 00:00:02,330 INFO [com.bah.tesseract.aop.ValidationAop] POST/items/update-batch [{“lastAssignedTo”:“Jaspreet Kahlon (kah134)”,“relTo”:null,“assignedAnalystID”:“”,“secInfoDes2”:null,“secInfoDes1”:null,“resTyp”:“Letter”,“statusChangedDate”:“2020-06-12T00:00:02-0500”,“dueDate”:“06/17/2019”,“priBusUnt”:“Brokerage Operations”,“senInv”:null,“finalClosure”:true,“sclMedHndl”:null,“aprvdBy”:“Chris Sundquist”,“firm”:“7870”,“password”:null,“reqCFRef”:“P-2146”,“srcDtls”:“Enforcement”,“execNm”:null,“recomms”:null,“othPrtyNm”:null,“setDt”:null,“adjAmt”:null,“FnEx”:null,“id”:“4311bbbeca82649427b192c7b868133c”,“riaNm”:null,“additionalNames”:“Colin R Ward - Emily M Ward - Nolan C Ward - Elizabeth B Ward - Dennis S Miura - Elizabeth D Miura - Michael S Miura - Michelle A Miura - Mark Bostel - Susanne K Smith - Susanne Kay Holly Fam TST UA 11 15 2013”,“req”:“Polly Hayes”,“aOr”:null,“resDt”:“06/12/2019”,“accountType”:null,“resoDep”:null,“itemClosedDisposition”:null,“secInfoSym2”:null,“secInfoSym3”:null,“reportable”:null,“secInfoSym1”:null,“primaryAnalyst”:“kah134”,“depAsgnd”:null,“fnAmt”:null,“copiedFromItem”:null,“secInfoDes3”:null,“orgn”:“Email”,“closedBy”:“Jaspreet Kahlon (kah134)”,“complexity”:“Medium”,“srAnlyst”:“sun677”,“orgDt”:“06/17/2019”,“accountTitle”:null,“numOfTrds”:null,“itemSubType”:“SEC”,“assignedTo”:“”,“resoTyp”:null,“allgEndDt”:null,“extDt”:null,“ackSntDt”:null,“itemStatus”:“Closed”,“entityLastName”:“Liu”,“formName”:“document request”,“id”:“987654beca82649427b192c12348133c”,“secPrbCd”:null,“curDt”:“06/17/2019”,“secPrdCd”:null,“bdGenErr”:null,“summary”:null,“assignedToUserId”:“”,“assignedBy”:“”,“comments”:null,“genCat”:null,“recvdDt”:“06/03/2019”,“tpc”:“Client Documentation”,“lastModifiedBy”:“riskCanvas System(rcSystem)”,“priPrbCd”:null,“priPrdCd”:null,“secBusUnt”:null,“asso1”:null,“allgStartDt”:null,“assignedByUserId”:“”,“regTpc”:null,“entityFirstName”:"Chui Yu

      In the sample, I have four id-numbers.
      “id”:“92361e803a910f4fe0142f4398ff9cf5”
      “id”:“a5c0dfd6f15c3c7dbeffb31f78748da0”
      “id”:“4311bbbeca82649427b192c7b868133c”
      “id”:“987654beca82649427b192c12348133c”

      But I need only the id-numbers that appear on the line containing the string POST/items/update-batch.

      Sample Output:
      “id”:“4311bbbeca82649427b192c7b868133c”
      “id”:“987654beca82649427b192c12348133c”

      How can I achieve this?

      Thank you.

      • Alan Kilborn @Sagar Kurapati

        @Sagar-Kurapati

        There’s a discussion toward the bottom of this thread which will help:
        https://community.notepad-plus-plus.org/topic/12710/marked-text-manipulation/

        But if that reading is a bit much for you, just try this:

        Open the Replace dialog by pressing Ctrl+h and then set up the following search parameters:

        Find what box: (?s).*?("id":".*?")|(?s).*\z
        Replace with box: ?1\1\r\n
        Search mode radiobutton: Regular expression
        Match case checkbox: ticked
        Wrap around checkbox: ticked
        In selection checkbox: unticked

        Then press the Replace All button.

        Note that I used simple double quotes in my solution; you may need to change those to “more complicated” double quotes, depending on your data.
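
For anyone who would rather do this outside the editor, here is a minimal Python sketch of the same idea. It assumes the data uses plain double quotes, and the name extract_id_pairs is made up for illustration. Like the Replace All above, it collects every "id" pair in the text, whatever line it sits on.

```python
import re

# Same core pattern as the Find what box: "id":"<anything, matched lazily>"
ID_PAIR = re.compile(r'"id":".*?"')


def extract_id_pairs(text):
    """Return every "id":"..." pair found in text, in order of appearance."""
    return ID_PAIR.findall(text)


# Two shortened lines modelled on the sample in the question.
sample = (
    'INFO POST/item_status/get-status {"id":"a5c0dfd6f15c3c7dbeffb31f78748da0"}\n'
    'ERROR JSONObject["sn"] not found.\n'
)

# One pair per output line, similar to the \r\n in the Replace with box.
print("\r\n".join(extract_id_pairs(sample)))
```

Because this reads the text instead of rewriting it, the original file is left untouched.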

        • Alan Kilborn @Alan Kilborn

          I suppose I should add that the above is a destructive operation, meaning your original text will be gone after doing it.

          But never fear: Do a Ctrl+a followed by a Ctrl+c (to copy your results to the clipboard), then do a Ctrl+z (to undo the changes made) and you should have your original text back.

          • Alan Kilborn

            Am I the only one this happens to?:

            I make a posting. I forget about it.
            I get upvote(s). I go refresh myself on what was upvoted.

            Only during the “refresh” phase does it hit me that I left something out, or made some slight error (that upvoter(s) didn’t catch) in the original. I get a chance to correct/augment.

            • PeterJones @Alan Kilborn

              @Alan-Kilborn said in How to copy or extract particular string value from each line ?:

              that upvoter(s) didn’t catch

              I have a different philosophy on upvotes than StackExchange: I am not voting on the One Right Answer™®. I upvote posts that I think are helpful, useful, or interesting: a well-asked question (on the part of the OP or a responder asking for clarification), or a reply that moves forward the understanding of the solution, or one that actually solves the problem. I will also upvote some participation in interesting tangents (in general, as long as they aren’t getting in the way of providing a solution to the original question… though I’m probably guilty of encouraging/upvoting and even participating in tangents that get in the way).

              • Alan Kilborn @PeterJones

                @PeterJones

                In general, I agree. (So I upvoted your last post – :-) )

                The only possible “danger point” is for future readers using “upvotes” to just “correctness” or “this is the true solution”.
                When, in fact, neither of these situations might be true. :-)

                • guy038

                  Hello, @sagar-kurapati, @alan-kilborn and All,

                  To identify the MD5 zones in your text, ONLY IF the POST/items/update-batch string occurs earlier on the current line, here are 3 solutions:

                  • Regex A (?-s)^.*POST/items/update-batch.+?[[:xdigit:]]{32}.+

                  • Regex B (?-s)^.*POST/items/update-batch.+?[[:xdigit:]]{32}.+\R?

                  • Regex C (?-s)POST/items/update-batch.+?\K[[:xdigit:]]{32}|\G.+?\K[[:xdigit:]]{32}

                  Notes:

                  • Regex A finds all lines containing the string POST/items/update-batch and at least one MD5 signature

                  • Regex B finds all lines containing the string POST/items/update-batch and at least one MD5 signature, plus the line-break

                  • Regex C finds any MD5 signature (32 hexadecimal digits), ONLY in lines containing the POST/items/update-batch string

                  Remark: Regarding regex C, you must first move the caret to a line that does not contain an MD5 signature!
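
As a rough cross-check outside Notepad++, the intent of regex C can be sketched in Python. This is not the Boost engine, and the names MARKER, MD5_LIKE and ids_from_batch_lines are made up for illustration. The sketch keeps only lines containing the marker string, then collects every run of 32 hexadecimal digits that follows it on that line.

```python
import re

MARKER = "POST/items/update-batch"
# Plain-Python stand-in for [[:xdigit:]]{32}
MD5_LIKE = re.compile(r"[0-9A-Fa-f]{32}")


def ids_from_batch_lines(text):
    """Collect 32-hex-digit values, but only from lines containing MARKER."""
    ids = []
    for line in text.splitlines():
        pos = line.find(MARKER)
        if pos == -1:
            continue  # line has no marker string: ignore its ids
        # Search only the part of the line that follows the marker.
        ids.extend(MD5_LIKE.findall(line, pos + len(MARKER)))
    return ids


# Shortened lines modelled on the sample in the question.
sample = (
    'INFO POST/item_sub_type/get-sub-type {"id":"92361e803a910f4fe0142f4398ff9cf5"}\n'
    'INFO POST/items/update-batch [{"id":"4311bbbeca82649427b192c7b868133c",'
    '"id":"987654beca82649427b192c12348133c"}]\n'
)
print(ids_from_batch_lines(sample))
```

Since each line is tested on its own, the result does not depend on where a search starts.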

                  Best Regards,

                  guy038

                  • Alan Kilborn @guy038

                    @PeterJones

                    And, see? I totally missed the OP’s desire for this:

                    But i need only id-numbers which are present in the matching string POST/items/update-batch line.

                    :-)

                    But luckily @guy038 picked me up, after I fell on my face.

                    I based my interpretation on what the OP bolded, not on what the textual explanation said.

                    But @guy038 didn’t do the text separation like I did, so OP will likely need to combine the two approaches.

                    • Alan Kilborn @Alan Kilborn

                      @Alan-Kilborn said in How to copy or extract particular string value from each line ?:

                      just “correctness”

                      I meant judge “correctness” … Ugh.

                      • Ekopalypse @guy038

                        @guy038 said in How to copy or extract particular string value from each line ?:

                        Regex C finds any MD5 signature (32 hexadecimal digits), ONLY in lines containing the POST/items/update-batch string

                        Does not seem to be correct, because documentation states

                        The sequence \G matches only at the end of the last match found, or at the start of the text being matched if no previous match was found.

                        and if I do a test with the sample data then I get 92361e803a910f4fe0142f4398ff9cf5 matched as well.
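
To see why the starting point matters, here is a rough Python emulation of how regex C proceeds. Python's re module has no \G, so the sketch stands in for it with Pattern.match(text, pos), which only matches at pos. The function name find_ids and the sample text are made up for illustration, and this only approximates the behaviour quoted from the documentation; it is not the engine Notepad++ uses.

```python
import re

MARKER = "POST/items/update-batch"
# First alternative: marker, shortest gap, then 32 hex digits.
FIRST = re.compile(re.escape(MARKER) + r".+?([0-9A-Fa-f]{32})")
# Second alternative without \G; the anchoring is done by the caller.
NEXT = re.compile(r".+?([0-9A-Fa-f]{32})")


def find_ids(text, start=0):
    """Emulate 'FIRST | \\G NEXT', searching forward from offset start."""
    found = []
    pos = start
    anchor = start  # the only offset where \G is allowed to match
    while pos <= len(text):
        m = FIRST.match(text, pos)
        if m is None and pos == anchor:
            m = NEXT.match(text, pos)
        if m is None:
            pos += 1
            continue
        found.append(m.group(1))
        pos = anchor = m.end()  # \G now matches at the end of this match
    return found


text = (
    'INFO POST/item_sub_type/get-sub-type {"id":"92361e803a910f4fe0142f4398ff9cf5"}\n'
    'ERROR JSONObject["sn"] not found.\n'
    'INFO POST/items/update-batch [{"id":"4311bbbeca82649427b192c7b868133c",'
    '"id":"987654beca82649427b192c12348133c"}]\n'
)

print(find_ids(text, 0))                    # search starts on the first line
print(find_ids(text, text.index("ERROR")))  # search starts on the second line
```

Starting on the first line, the unwanted 92361e80... value is picked up through the second alternative; starting on the line without a signature, only the two values from the update-batch line are returned.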

                        • guy038

                          Hi, @ekopalypse, @sagar-kurapati, @alan-kilborn and All,

                          @ekopalypse, you’re perfectly right about it ! I’m probably going to add a remark in my previous post !

                          Indeed, if I move the caret to a line containing an MD5 signature, without the restrictive condition that it must also contain the string POST/items/update-batch, my regex wrongly matches it :-((

                          The problem is that, after moving the caret to any position, the regex engine treats this location as the very start of the text. It’s a general drawback of the powerful \G behavior!


                          Normally, on each new line, only the first alternative of regex C can produce the first match. Because of the (?-s) modifier and the line-breaks in the text, matches on different lines cannot be contiguous, so the second alternative, which starts with \G, never applies first.

                          • So the 1st alternative matches, for instance, from POST/items/update-batch up to and including 4311bbbeca82649427b192c7b868133c (only the signature is reported, because of \K). Then the 2nd alternative \G.+?\K[[:xdigit:]]{32} matches the contiguous area, made of the shortest run of standard characters up to another MD5 signature, and so on, until the end of the line being scanned

                          • Then, because of the line-break chars, the next match cannot be contiguous, which necessarily implies that the next match can only come from the first alternative!


                          A solution would be to:

                          • Firstly, mark all the lines containing the string POST/items/update-batch by appending a special symbol to the end of each such line

                          • Secondly, search for any MD5 signature, ONLY IF the current line ends with that special symbol

                          • Thirdly, delete this special symbol

                          But I haven’t yet found a neat single regex that gets rid of this drawback!
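
The three marking steps can be sketched in Python as follows. This is only an illustration: the FLAG symbol and the function names are hypothetical, the data is assumed not to contain the symbol already, and lines are assumed to end with \n. These patterns are written for Python's re module and are not offered as tested Notepad++ searches.

```python
import re

MARKER = "POST/items/update-batch"
FLAG = "\u00a4"  # hypothetical special symbol, assumed absent from the data


def mark_lines(text):
    """Step 1: append FLAG to every line that contains MARKER."""
    pattern = r"(?m)^(.*" + re.escape(MARKER) + r".*)$"
    return re.sub(pattern, r"\g<1>" + FLAG, text)


def find_flagged_ids(marked_text):
    """Step 2: find 32-hex-digit values only on lines that end with FLAG."""
    pattern = r"(?m)[0-9A-Fa-f]{32}(?=.*" + re.escape(FLAG) + r"$)"
    return re.findall(pattern, marked_text)


def unmark(marked_text):
    """Step 3: remove the FLAG symbols again."""
    return marked_text.replace(FLAG, "")


# Shortened lines modelled on the sample in the question.
sample = (
    'INFO POST/item_sub_type/get-sub-type {"id":"92361e803a910f4fe0142f4398ff9cf5"}\n'
    'ERROR JSONObject["sn"] not found.\n'
    'INFO POST/items/update-batch [{"id":"4311bbbeca82649427b192c7b868133c",'
    '"id":"987654beca82649427b192c12348133c"}]\n'
)

marked = mark_lines(sample)
print(find_flagged_ids(marked))
print(unmark(marked) == sample)
```

The lookahead in step 2 cannot cross a line-break, so a signature only counts when the flag sits at the end of its own line.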

                          To conclude, I think that the only sensible solution is to move the caret to the very beginning of the file, where, most of the time, the text does not match the part of the regex located after the \G syntax ;-))

                          Best Regards,

                          guy038
