Using RegEx to format text for import
-
I have this text
With lines like these
820 Review Conputer Process Annually No 2016-06-02 2019-05-31
1064 Review customer documentation Annually No 2016-07-25 2019-07-25
1282 Monitor and Clean up the Mailbox Weekly No 2015-09-21 2020-05-12
1644 Generic paragraph for monthly report Monthly Yes 2015-10-19 2017-04-13I want to keep the text how ever I need to set a delimiter between each field.
I created a regex to Identify the fields. However I cannot seem to figure out how to use it to “replace” the correct spaces to ; as field seperator.Sorry for the portato code…
(\d{3,4}).(?<=)(.) (?=(Weekly|Monthly|Semi-annually|Quarterly|Annually|Bi-monthly))(.)(No|Yes).(\d{4}-\d{2}-\d{2}).(\d{4}-\d{2}-\d{2})I need to get the above text to look like this
820;Review Conputer Process;Annually;No;2016-06-02;2019-05-31
1064;Review customer documentation;Annually;No;2016-07-25;2019-07-25
1282;Monitor and Clean up the Mailbox;Weekly;No;2015-09-21;2020-05-12
1644;Generic paragraph for monthly report;Monthly;Yes;2015-10-19;2017-04-13the first segment can be either 3 og 4 digits. There is no lenght on the second, the third can be Weekly,Monthly,Semi-annually,Quarterly,Annually or Bi-monthly and the forth one can be either Yes or No and the lat two is just date.
I have to do it one segment of a time, that I ok. Doesn’t need to be fancy.
Hope you can help.
-
from the given data I would think this should do it
find what:^(\d{3,4})\h(.+?)\h(Weekly|Monthly|Semi-annually|Quarterly|Annually|Bi-monthly)\h(No|Yes)\h(\d{4}-\d{2}-\d{2})\h(\d{4}-\d{2}-\d{2})
repalce with:$1;$2;$3;$4;$5;$6
-
Hello, @dennis-andersen, and All,
Ah ! @ekopalypse beat me to it ;-))
This other regex S/R should also work :
SEARCH
(?-si)^\d{3,4}\K\h+|\h+(?!.+(Weekly|Monthly|Semi-annually|Quarterly|Annually|Bi-monthly))
REPLACE
;
Notes :
-
As usual, the in-line modifiers
(?-si)
ensure that :-
Any dot regex symbol
.
represents a single standard character ( NOT like-breaks ) -
The search is processed in a non-insensitive way
-
-
Then the regex contains two alternatives, separated by the
|
alternation symbol. So the regex engine searches, from current location, for, either :-
Some horizontal blank characters, after a
3
or4-digits
number, at beginning of lines and, due to the\K
syntax, only matches these blank char(s)^\d{3,4}\K\h+
-
Some horizontal blank characters, ONLY IF NOT followed by a non-null range of standard chars before the periodicity field
\h+(?!.+(Weekly|Monthly|Semi-annually|Quarterly|Annually|Bi-monthly))
-
-
Due the
\K
syntax, you cannot use the step by step replacement with theReplace
button. You must use theReplace All
button, exclusively
Best Regards,
guy038
BTW, @dennis-andersen, you very well described your goal and data which greatly helped us to figure out regex solutions ;-))
-
-
You guys are the best!!!
One day I hope to be half as smart as you!!
I got help and I learned something. Best day of the year so far…
Thanks and have a great day.
Dennis -
Hello
-
List itemIdentify the common text string you wish to remove, in this example, “Accepts Google Pay”
-
List itemClick the Column Options dropdown arrow and select Set regular expression. The Set regular expression dialog box appears.
-
List itemIn the Match box, enter “Accepts Google Pay\s+(.+)” to identify the Accepts Google Pay string and then capture the remainder of the string in the first capture capture group.
-
List itemIn the Replace box, enter $1 to print out the first capture group. The prefix disappears from the column data, leaving the remainder of the string intact.
I hope this information will be usefull for you.
Thank you -