How to extract the numbers between tags, then add some numbers?
-
I am trying to extract the numbers between two tags, here `<startSample>somenumber</startSample>` and `<pitch>number with dot</pitch>`.

**Sample text:**
<audGrainData>
<startSample>253</startSample>
<pitch>14.1668348</pitch>
</audGrainData>
<audGrainData>
<startSample>2390</startSample>
<pitch>14.1668348</pitch>
</audGrainData>
<audGrainData>
<startSample>4521</startSample>
<pitch>14.1668348</pitch>
</audGrainData>
<audGrainData>
<startSample>6475</startSample>
<pitch>14.1668348</pitch>
</audGrainData>
<audGrainData>
<startSample>8547</startSample>
<pitch>14.2084284</pitch>
</audGrainData>

**And I want it to be like this:**

253 14.1668348 10000 10200
2390 14.1668348 10000 10200
4521 14.1668348 10000 10200
6475 14.1668348 10000 10200
8547 14.2084284 10000 10200
-
Hello, @fajar-zulianto and All,
Very easy with regexes!
So, starting with your INPUT text below, which I slightly changed in order to introduce possible leading indentation on any line:
<audGrainData>
    <startSample>253</startSample>
    <pitch>14.1668348</pitch>
</audGrainData>
<audGrainData>
    <startSample>2390</startSample>
    <pitch>14.1668348</pitch>
</audGrainData>
<audGrainData>
    <startSample>4521</startSample>
    <pitch>14.1668348</pitch>
</audGrainData>
<audGrainData>
    <startSample>6475</startSample>
    <pitch>14.1668348</pitch>
</audGrainData>
<audGrainData>
    <startSample>8547</startSample>
    <pitch>14.2084284</pitch>
</audGrainData>
- Open the Replace dialog (Ctrl + H)
- Uncheck all box options
- SEARCH `(?-is)^\h*<(audGrainData)>\R\h*<(startSample)>(.+)</\2>\R\h*<(pitch)>(.+)</\4>\R\h*</\1>`
- REPLACE `\3 \5 10000 10200`
- Check the Wrap around option
- Select the Regular expression search mode
- Click once on the Replace All button
=> You should get the expected OUTPUT text:

253 14.1668348 10000 10200
2390 14.1668348 10000 10200
4521 14.1668348 10000 10200
6475 14.1668348 10000 10200
8547 14.2084284 10000 10200

Voilà!
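For anyone who prefers to script this outside Notepad++, here is a rough Python sketch of the same search and replacement. It is only an approximation: Python's `re` module has no `\h` or `\R`, so they are spelled out as `[ \t]*` and `(?:\r\n|\n|\r)`, and the names `convert` and `sample` are made up for the illustration.

```python
import re

# Assumed stand-ins for the Notepad++ (Boost) escapes, which Python's re lacks:
H = r"[ \t]*"              # roughly \h* : leading spaces or tabs
R = r"(?:\r\n|\n|\r)"      # roughly \R  : one line break of any common kind

PATTERN = re.compile(
    rf"^{H}<(audGrainData)>{R}"           # group 1: outer tag name
    rf"{H}<(startSample)>(.+)</\2>{R}"    # groups 2 and 3: tag name, number
    rf"{H}<(pitch)>(.+)</\4>{R}"          # groups 4 and 5: tag name, number
    rf"{H}</\1>",                         # closing outer tag
    re.MULTILINE,                         # let ^ match at every line start
)

def convert(text: str) -> str:
    """Replace each audGrainData block with 'start pitch 10000 10200'."""
    return PATTERN.sub(r"\3 \5 10000 10200", text)

sample = """\
<audGrainData>
    <startSample>253</startSample>
    <pitch>14.1668348</pitch>
</audGrainData>
<audGrainData>
  <startSample>2390</startSample>
  <pitch>14.1668348</pitch>
</audGrainData>
"""

print(convert(sample))
```

As in the Replace dialog, the line break after each closing `</audGrainData>` is not part of the match, so every converted block stays on its own line.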
NOTES:
- The `(?-is)` syntax sets two in-line modifiers, which ensure that the search is case-sensitive and that the `.` matches one standard character only, never a line-break character
- There are five groups, delimited by parentheses: three of them store the tag names (`audGrainData`, `startSample` and `pitch`), which are re-used in the search itself, as `\1`, `\2` and `\4`, to match the closing tags, and two of them, `(.+)`, store the numbers, which are re-used in the replacement with the `\3` and `\5` syntax
- The `\h*` syntax represents possible leading space or tabulation characters, on any line
- The `\R` syntax stands for any kind of line-break (`\r\n`, `\n` or `\r`)
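To make the notes above a bit more concrete, here is a small Python sketch that prints what the five groups hold for one block. It again assumes `[ \t]*` and `(?:\r\n|\n|\r)` as stand-ins for `\h*` and `\R`, which Python's `re` does not support, and the variable names are invented for the example.

```python
import re

PATTERN = re.compile(
    r"^[ \t]*<(audGrainData)>(?:\r\n|\n|\r)"
    r"[ \t]*<(startSample)>(.+)</\2>(?:\r\n|\n|\r)"
    r"[ \t]*<(pitch)>(.+)</\4>(?:\r\n|\n|\r)"
    r"[ \t]*</\1>",
    re.MULTILINE,
)

# One block with Windows line breaks and tab indentation.
block = (
    "<audGrainData>\r\n"
    "\t<startSample>253</startSample>\r\n"
    "\t<pitch>14.1668348</pitch>\r\n"
    "</audGrainData>"
)

groups = PATTERN.search(block).groups()
print(groups)
# ('audGrainData', 'startSample', '253', 'pitch', '14.1668348')

# The search is case-sensitive and the back-reference \4 must repeat
# the exact opening tag name, so a closing </Pitch> does not match.
mismatched = block.replace("</pitch>", "</Pitch>")
print(PATTERN.search(mismatched))
# None
```

Groups 3 and 5 are the two values that the replacement `\3 \5 10000 10200` writes back.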
Best Regards,
guy038
-