How to copy or extract particular string value from each line ?

Sagar Kurapati

Hello Everyone,

I have a text file with multiple lines.
I would like to search each line for id-numbers and extract those id-numbers if a matching string is present in that line.

Sample:

2020-06-12 00:00:01,971 INFO [com.bah.tesseract.aop.ValidationAop] POST/item_sub_type/get-sub-type {“id”:“92361e803a910f4fe0142f4398ff9cf5”}
2020-06-12 00:00:01,979 ERROR [com.bah.tesseract.util.TokenUtils] org.json.JSONException: JSONObject[“givenname”] not found.
2020-06-12 00:00:01,980 ERROR [com.bah.tesseract.util.TokenUtils] org.json.JSONException: JSONObject[“sn”] not found.
2020-06-12 00:00:02,161 INFO [com.bah.tesseract.aop.ValidationAop] POST/item_status/get-status {“id”:“a5c0dfd6f15c3c7dbeffb31f78748da0”}
2020-06-12 00:00:02,166 ERROR [com.bah.tesseract.util.TokenUtils] org.json.JSONException: JSONObject[“givenname”] not found.
2020-06-12 00:00:02,167 ERROR [com.bah.tesseract.util.TokenUtils] org.json.JSONException: JSONObject[“sn”] not found.
2020-06-12 00:00:02,330 INFO [com.bah.tesseract.aop.ValidationAop] POST/items/update-batch [{“lastAssignedTo”:“Jaspreet Kahlon (kah134)”,“relTo”:null,“assignedAnalystID”:“”,“secInfoDes2”:null,“secInfoDes1”:null,“resTyp”:“Letter”,“statusChangedDate”:“2020-06-12T00:00:02-0500”,“dueDate”:“06/17/2019”,“priBusUnt”:“Brokerage Operations”,“senInv”:null,“finalClosure”:true,“sclMedHndl”:null,“aprvdBy”:“Chris Sundquist”,“firm”:“7870”,“password”:null,“reqCFRef”:“P-2146”,“srcDtls”:“Enforcement”,“execNm”:null,“recomms”:null,“othPrtyNm”:null,“setDt”:null,“adjAmt”:null,“FnEx”:null,“id”:“4311bbbeca82649427b192c7b868133c”,“riaNm”:null,“additionalNames”:“Colin R Ward - Emily M Ward - Nolan C Ward - Elizabeth B Ward - Dennis S Miura - Elizabeth D Miura - Michael S Miura - Michelle A Miura - Mark Bostel - Susanne K Smith - Susanne Kay Holly Fam TST UA 11 15 2013”,“req”:“Polly Hayes”,“aOr”:null,“resDt”:“06/12/2019”,“accountType”:null,“resoDep”:null,“itemClosedDisposition”:null,“secInfoSym2”:null,“secInfoSym3”:null,“reportable”:null,“secInfoSym1”:null,“primaryAnalyst”:“kah134”,“depAsgnd”:null,“fnAmt”:null,“copiedFromItem”:null,“secInfoDes3”:null,“orgn”:“Email”,“closedBy”:“Jaspreet Kahlon (kah134)”,“complexity”:“Medium”,“srAnlyst”:“sun677”,“orgDt”:“06/17/2019”,“accountTitle”:null,“numOfTrds”:null,“itemSubType”:“SEC”,“assignedTo”:“”,“resoTyp”:null,“allgEndDt”:null,“extDt”:null,“ackSntDt”:null,“itemStatus”:“Closed”,“entityLastName”:“Liu”,“formName”:“document request”,“id”:“987654beca82649427b192c12348133c”,“secPrbCd”:null,“curDt”:“06/17/2019”,“secPrdCd”:null,“bdGenErr”:null,“summary”:null,“assignedToUserId”:“”,“assignedBy”:“”,“comments”:null,“genCat”:null,“recvdDt”:“06/03/2019”,“tpc”:“Client Documentation”,“lastModifiedBy”:“riskCanvas System(rcSystem)”,“priPrbCd”:null,“priPrdCd”:null,“secBusUnt”:null,“asso1”:null,“allgStartDt”:null,“assignedByUserId”:“”,“regTpc”:null,“entityFirstName”:"Chui Yu

In the sample, I have four id-numbers.
“id”:“92361e803a910f4fe0142f4398ff9cf5”
“id”:“a5c0dfd6f15c3c7dbeffb31f78748da0”
“id”:“4311bbbeca82649427b192c7b868133c”
“id”:“987654beca82649427b192c12348133c”

But I need only id-numbers which are present in the matching string POST/items/update-batch line.

Sample Output:
“id”:“4311bbbeca82649427b192c7b868133c”
“id”:“987654beca82649427b192c12348133c”

How can I achieve this?

Thank you.

Alan Kilborn

@Sagar-Kurapati

There’s a discussion toward the bottom of this thread which will help:
https://community.notepad-plus-plus.org/topic/12710/marked-text-manipulation/

But if that reading is a bit much for you, just try this:

Open the Replace dialog by pressing Ctrl+h and then set up the following search parameters:

Find what box: (?s).*?("id":".*?")|(?s).*\z
Replace with box: ?1\1\r\n
Search mode radiobutton: Regular expression
Match case checkbox: ticked
Wrap around checkbox: ticked
In selection checkbox: unticked

Then press the Replace All button.

Note that I used simple double quotes in my solution; you may need to change those to “more complicated” double quotes, depending on your data.
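
For anyone who would rather script this than use the Replace dialog, here is a rough Python sketch of the same idea: collect every "id":"..." pair and write them out one per line. The function name extract_id_pairs and the shortened sample text are made up for illustration, and accepting both straight and curly quotes is only an assumption about the data. Like the Replace All above, it does not look at which line a pair came from.

```python
import re

# Accept straight or curly double quotes (an assumption about the data).
Q = '["\u201c\u201d]'
ID_PAIR = re.compile(Q + "id" + Q + ":" + Q + ".*?" + Q)


def extract_id_pairs(text):
    """Return every "id":"..." pair found in text, one per line."""
    return "\n".join(m.group(0) for m in ID_PAIR.finditer(text))


# Shortened, made-up sample in the shape of the log lines from the question.
sample = (
    '00:00:01 INFO POST/item_sub_type/get-sub-type {"id":"92361e803a910f4fe0142f4398ff9cf5"}\n'
    '00:00:01 ERROR JSONObject["sn"] not found.\n'
    '00:00:02 INFO POST/items/update-batch [{"firm":"7870","id":"4311bbbeca82649427b192c7b868133c"}]\n'
)

print(extract_id_pairs(sample))
```

Unlike the Replace All, this leaves the original text untouched, so no undo step is needed.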

Alan Kilborn

I suppose I should add that the above is a destructive operation, meaning your original text will be gone after doing it.

But never fear: Do a Ctrl+a followed by a Ctrl+c (to copy your results to the clipboard), then do a Ctrl+z (to undo the changes made) and you should have your original text back.

Alan Kilborn

Am I the only one this happens to?:

I make a posting. I forget about it.
I get upvote(s). I go refresh myself on what was upvoted.

Only during the “refresh” phase does it hit me that I left something out, or made some slight error (that upvoter(s) didn’t catch) in the original. I get a chance to correct/augment.

PeterJones

@Alan-Kilborn said in How to copy or extract particular string value from each line ?:

that upvoter(s) didn’t catch

I have a different philosophy on upvotes than StackExchange: I am not voting on the One Right Answer™®. I upvote posts that I think are helpful, useful, or interesting: a well-asked question (on the part of the OP or a responder asking for clarification), or a reply that moves forward the understanding of the solution, or one that actually solves the problem. I will also upvote some participation in interesting tangents (in general, as long as they aren’t getting in the way of providing a solution to the original question… though I’m probably guilty of encouraging/upvoting and even participating in tangents that get in the way).

Alan Kilborn

@PeterJones

In general, I agree. (So I upvoted your last post – :-) )

The only possible “danger point” is for future readers using “upvotes” to just “correctness” or “this is the true solution”.
When, in fact, neither of these situations might be true. :-)

guy038

Hello, @sagar-kurapati, @alan-kilborn and All,

In order to identify MD5 zones in your text, ONLY IF the POST/items/update-batch string exists near the beginning of current line, here are 3 solutions :

Regex A (?-s)^.*POST/items/update-batch.+?[[:xdigit:]]{32}.+
Regex B (?-s)^.*POST/items/update-batch.+?[[:xdigit:]]{32}.+\R?
Regex C (?-s)POST/items/update-batch.+?\K[[:xdigit:]]{32}|\G.+?\K[[:xdigit:]]{32}

Notes :

Regex A finds all lines containing the string POST/items/update-batch and at least one MD5 signature
Regex B finds all lines containing the string POST/items/update-batch and at least one MD5 signature, plus the line-break
Regex C finds any MD5 signature (32 hexadecimal digits), ONLY in lines containing the POST/items/update-batch string

Remark : Regarding regex C, you must move the caret to a line not containing an MD5 signature, first !
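
To see the difference between regex A and regex B outside Notepad++, here is a small Python sketch. Python's re module has no [[:xdigit:]] class and no \R, so [0-9A-Fa-f] and an explicit line-break alternation stand in for them; that substitution, the names REGEX_A, REGEX_B, HEX32 and EOL, and the shortened sample are all assumptions made for the example. The sample uses \n line endings only.

```python
import re

HEX32 = r"[0-9A-Fa-f]{32}"   # stands in for [[:xdigit:]]{32}
EOL = r"(?:\r\n|\n|\r)"      # simplified stand-in for \R

# Without re.DOTALL, "." does not match "\n", similar to (?-s) in Notepad++.
PREFIX = r"^.*POST/items/update-batch.+?" + HEX32 + r".+"
REGEX_A = re.compile(PREFIX, re.MULTILINE)
REGEX_B = re.compile(PREFIX + EOL + "?", re.MULTILINE)

# Shortened, made-up sample in the shape of the log lines from the question.
sample = (
    'INFO POST/item_status/get-status {"id":"a5c0dfd6f15c3c7dbeffb31f78748da0"}\n'
    'INFO POST/items/update-batch [{"id":"4311bbbeca82649427b192c7b868133c"}]\n'
    'ERROR JSONObject["sn"] not found.\n'
)

lines_a = REGEX_A.findall(sample)   # matched line, without its line-break
lines_b = REGEX_B.findall(sample)   # matched line, including its line-break

print([repr(s) for s in lines_a])
print([repr(s) for s in lines_b])
```

The only difference is the trailing line-break, which matters when the matches are deleted: removing regex A matches leaves empty lines behind, removing regex B matches does not.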

Best Regards,

guy038

Alan Kilborn

@PeterJones

And, see? I totally missed the OP’s desire for this:

But I need only id-numbers which are present in the matching string POST/items/update-batch line.

:-)

But luckily @guy038 picked me up, after I fell on my face.

I based my interpretation on what the OP bolded, not on what the textual explanation said.

But @guy038 didn’t do the text separation like I did, so OP will likely need to combine the two approaches.
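
One possible way to combine the two approaches in a script, sketched in Python: keep only the lines containing POST/items/update-batch, then pull the "id":"..." pairs that follow that string. The function name ids_from_update_batch and the shortened sample are invented for illustration, and the pattern assumes plain double quotes and 32 hexadecimal digits per id.

```python
import re

TARGET = "POST/items/update-batch"
# Assumes plain double quotes and a 32-hex-digit id value.
ID_PAIR = re.compile(r'"id":"[0-9A-Fa-f]{32}"')


def ids_from_update_batch(text):
    """Collect "id":"..." pairs that appear after TARGET on the same line."""
    pairs = []
    for line in text.splitlines():
        _, found, tail = line.partition(TARGET)
        if found:  # TARGET occurs on this line
            pairs.extend(ID_PAIR.findall(tail))
    return pairs


# Shortened, made-up sample in the shape of the log lines from the question.
sample = (
    'INFO POST/item_sub_type/get-sub-type {"id":"92361e803a910f4fe0142f4398ff9cf5"}\n'
    'INFO POST/items/update-batch [{"firm":"7870","id":"4311bbbeca82649427b192c7b868133c",'
    '"formName":"document request","id":"987654beca82649427b192c12348133c"}]\n'
)

print("\n".join(ids_from_update_batch(sample)))
```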

Alan Kilborn

@Alan-Kilborn said in How to copy or extract particular string value from each line ?:

just “correctness”

I meant judge “correctness” … Ugh.

Ekopalypse

@guy038 said in How to copy or extract particular string value from each line ?:

Regex C finds any MD5 signature (32 hexadecimal digits), ONLY in lines containing the POST/items/update-batch string

That does not seem to be correct, because the documentation states

The sequence \G matches only at the end of the last match found, or at the start of the text being matched if no previous match was found.

and if I do a test with the sample data then I get 92361e803a910f4fe0142f4398ff9cf5 matched as well.

guy038

Hi, @ekopalypse, @sagar-kurapati, @alan-kilborn and All,

@ekopalypse, you’re perfectly right about it ! I’m probably going to add a remark in my previous post !

Indeed, if I move the caret to a line containing an MD5 signature, without the restrictive condition that it must also contain the string POST/items/update-batch, my regex wrongly matches it :-((

The problem is that, after moving the caret to any position, the regex engine thinks that this location is the very start of the text. It’s a general drawback of the powerful \G behavior !

Normally, the regex engine always tries the first alternative of regex C first, because of the (?-s) modifier and the line-breaks in the text. Indeed, the second alternative can never match first, as a match on a new line cannot be contiguous with the previous match

So the 1st part matches, for instance, the text running from POST/items/update-batch to the signature 4311bbbeca82649427b192c7b868133c. Then the 2nd alternative \G.+?\K[[:xdigit:]]{32} matches the contiguous area, made of the smallest range of standard chars up to another MD5 signature, and so on, until the end of the current scanned line
Then, because of the line-break chars, the next match cannot be contiguous, which necessarily implies that the next match can only be satisfied by the first alternative !

A solution would be to :

Firstly, mark all the lines containing the string POST/items/update-batch with a special symbol at the end of each line
Secondly, search for any MD5 signature, ONLY IF the current line ends with that special symbol
Thirdly, delete this special symbol
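
The three steps above can be sketched in Python as follows. This is only an illustration: the marker character FLAG is a hypothetical choice assumed not to occur in the data, the function name md5_via_marker is invented, and [0-9A-Fa-f] stands in for [[:xdigit:]] because Python's re module has no POSIX classes.

```python
import re

TARGET = re.escape("POST/items/update-batch")
FLAG = "\u00a4"  # hypothetical marker symbol, assumed absent from the data


def md5_via_marker(text):
    # Step 1: append the marker to every line containing the target string.
    marked = re.sub(r"(?m)^(.*" + TARGET + r".*)$", r"\1" + FLAG, text)
    # Step 2: find MD5 signatures only on lines that end with the marker.
    found = re.findall(r"(?m)[0-9A-Fa-f]{32}(?=.*" + FLAG + r"$)", marked)
    # Step 3: remove the marker again to restore the original text.
    restored = marked.replace(FLAG, "")
    return found, restored


# Shortened, made-up sample in the shape of the log lines from the question.
sample = (
    'INFO POST/item_sub_type/get-sub-type {"id":"92361e803a910f4fe0142f4398ff9cf5"}\n'
    'INFO POST/items/update-batch [{"id":"4311bbbeca82649427b192c7b868133c"},'
    '{"id":"987654beca82649427b192c12348133c"}]\n'
    'ERROR JSONObject["sn"] not found.\n'
)

found, restored = md5_via_marker(sample)
print(found)
```

Because the lookahead in step 2 does not depend on where the search starts, the caret position problem of \G does not arise in this variant.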

But, I haven’t found out a fair regex to suppress this drawback, yet !

To conclude, I think that the only sensible solution is to move the caret to the very beginning of file, which does not match, most of the time, the regex pattern located after the \G syntax ;-))

Best Regards,

guy038