Replace tilde (~) with headword
-
Hello guy,
Thank you for the attempt. I appreciate it.
I’m sorry; I thought it was clear what I wanted to achieve. You know, in printed dictionaries some headwords are replaced with a tilde, especially in usage examples, to save space. Suppose “once in a blue moon” is a headword in a dictionary. In a usage example like “I need regexes complicated this much once in a blue moon”, you would replace “once in a blue moon” with a tilde and get “I need regexes complicated this much ~”, thus saving space. You may be more familiar with its use in the IT field; its use in file/folder paths comes to mind: “\sys\var\bin~”.
Now that you have dealt with it and come so far with your regex, you might want to give it the final touches and finish it? Although I don’t need it for this dictionary file anymore, I would like to keep it, should the need arise in the future.
Thank you again.
-
Thank you so much, Terry. You were going in the right direction.
Someone has provided me with a Perl script for this job and ended up running it on the file for me, since I have zero knowledge of and experience with Perl. After I got the finished file from him, I deleted the old file. But I have created a prototype of the dictionary with the above example cards, mixing and repeating them a couple of times. Your regex seems to work, at least on the isolated cards in the prototype. Actually, that’s what I wanted to see in the first place. You may still consider yours rough and unfinished, but as long as it works, it is the final regex for me. I don’t mind clicking “Replace all” 30 or 40 times for the whole file.
Just one point though: in the above examples, the entry of “Absicht” has the most tildes: 10. So I expected to click “Replace all” at most 10 times for the prototype, but ended up clicking 14 times. Is this okay, or is my assumption/calculation wrong?
Thank you so much again!
glossar -
@glossar said in Replace tilde (~) with headword:
In the above examples, the entry of “Absicht” has the most tildes: 10
I actually counted 13. I suspect you doing it 14 times would be replacing on the first 13, then on the 14th you would have had a “no 0 occurrences replaced” in the Replace window. Is that correct?
Terry
-
@Terry-R said in Replace tilde (~) with headword:
@glossar said in Replace tilde (~) with headword:
In the above examples, the entry of “Absicht” has the most tildes: 10
I actually counted 13. I suspect you doing it 14 times would be replacing on the first 13, then on the 14th you would have had a “no 0 occurrences replaced” in the Replace window. Is that correct?
Terry
Heck, why did I count them manually with my own eyes when Notepad can do it? :) You are right, it contains 13, and on the 14th click I would get the message “nothing replaced” or something like that.
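The count can also be checked outside the editor. The following is a minimal Python sketch, not part of the thread: it assumes that a card starts at a line whose first character is a word character and that all other lines belong to the card above. The function name and the shortened sample are made up for illustration. If each “Replace all” pass fills in one tilde per card, as Terry’s explanation suggests, the number of clicks is the largest count plus one final pass that replaces nothing.

```python
import re

def count_tildes_per_card(text: str) -> list[tuple[str, int]]:
    """Return (headword, number of backslash-tilde strings) per card, in file order."""
    cards: list[list] = []
    for line in text.splitlines():
        if re.match(r"\w", line):
            cards.append([line.strip(), 0])      # a headword line opens a new card
        elif cards:
            cards[-1][1] += line.count("\\~")    # body line: count the \~ strings
    return [(headword, n) for headword, n in cards]

# Shortened, made-up sample in the same layout as the cards in this thread
sample = (
    "Absicht\n"
    " [m1]meine \\~![/c] seine \\~, reich zu werden[/c]\n"
    " [m1]mit/ohne \\~[/c][/m]\n"
    "Kaminfeuer\n"
    " [m1]ein \\~ machen[/c][/m]\n"
)
counts = count_tildes_per_card(sample)
print(counts)                           # [('Absicht', 3), ('Kaminfeuer', 1)]
print(max(n for _, n in counts) + 1)    # 4 passes: 3 that replace, 1 that finds nothing
```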
Anyway, I conclude that yours works, so I’ll keep it for future use.
Thank you so much again.
Cheers,
glossar -
Hello, @glossar, @terry-r and All,
Yes, I admit that I did not grasp the problem correctly! So, here is my second version to solve your problem which, fortunately, does not require successive clicks on the Replace button. Of course, now that the goal is well identified, it’s much easier to build suitable regexes!
With this second solution, the idea is :
- Firstly, to rewrite each current headword on a new line, above the next headword entry or at the very end of the file
- Secondly, to search for any \~ string, considering, in a look-ahead regex structure, the temporary current headword located at the end of each card section
So, let’s take, again, this initial sample :
Absicht
 [m0][c darkred][b]Absicht[/b][/c] [/m]
 [m1][i][c darkslategray]f[/c] [/i] intention; [c teal]das war bestimmt nicht meine \~![/c] it was an accident!, I didn't mean to do it!; [c teal]es war schon immer seine \~, reich zu werden[/c] it has always been his goal to be rich; [c teal]das lag nicht in meiner \~[/c] that was definitely not what I intended; [c teal]mit den besten \~en[/c] with the best of intentions; [c teal]ernste \~en haben[/c] to have honourable \[[i]or[/i] [c sienna]AM[/c] -orable] intentions; [c teal]verborgene \~en[/c] hidden intentions; [c teal]die \~ haben, etw zu tun[/c] to have the intention of doing sth; [c teal]in selbstmörderischer \~[/c] with the intention of killing herself/himself; [c teal]\~ sein[/c] to be intentional; [c teal]in der \~, etw zu tun[/c] with a view to \[[i]or[/i] the intention of] doing sth; [c teal]er verfolgte sie in der \~, sie zu berauben[/c] he followed her with intent to rob her; [c teal]eine \~ verfolgen[/c] to pursue a goal; [c teal]mit/ohne \~[/c] intentionally/unintentionally[/m]
 [m1][c darkslategray]<-, -en>[/c] [/m]
Kaminaufsatz
 [m0][c darkred][b]Kaminaufsatz[/b][/c] [/m]
 [m1][i][c darkslategray]m[/c] [/i] chimney pot[/m]
Kaminbesteck
 [m0][c darkred][b]Kaminbesteck[/b][/c] [/m]
 [m1][i][c darkslategray]nt[/c] [/i] fireside companion set[/m]
Kaminfeuer
 [m0][c darkred][b]Kaminfeuer[/b][/c] [/m]
 [m1][i][c darkslategray]nt[/c] [/i] open fire; [c teal]ein \~ machen[/c] to light the fireplace[/m]
Note that I intentionally swapped the lines [m1][i][c darkslategray]f[/c] [/i]........ and [m1][c darkslategray]<-, -en>[/c] [/m], so that a line containing the \~ syntax does not end a card section!
Then, the following regex S/R :
SEARCH
(?s)^(\w+).+?\K(?=^\w+|(\Z))
REPLACE
(?2\r\n)\1\r\n
adds each headword at the end of each card section and gives :
Absicht
 [m0][c darkred][b]Absicht[/b][/c] [/m]
 [m1][i][c darkslategray]f[/c] [/i] intention; [c teal]das war bestimmt nicht meine \~![/c] it was an accident!, I didn't mean to do it!; [c teal]es war schon immer seine \~, reich zu werden[/c] it has always been his goal to be rich; [c teal]das lag nicht in meiner \~[/c] that was definitely not what I intended; [c teal]mit den besten \~en[/c] with the best of intentions; [c teal]ernste \~en haben[/c] to have honourable \[[i]or[/i] [c sienna]AM[/c] -orable] intentions; [c teal]verborgene \~en[/c] hidden intentions; [c teal]die \~ haben, etw zu tun[/c] to have the intention of doing sth; [c teal]in selbstmörderischer \~[/c] with the intention of killing herself/himself; [c teal]\~ sein[/c] to be intentional; [c teal]in der \~, etw zu tun[/c] with a view to \[[i]or[/i] the intention of] doing sth; [c teal]er verfolgte sie in der \~, sie zu berauben[/c] he followed her with intent to rob her; [c teal]eine \~ verfolgen[/c] to pursue a goal; [c teal]mit/ohne \~[/c] intentionally/unintentionally[/m]
 [m1][c darkslategray]<-, -en>[/c] [/m]
Absicht
Kaminaufsatz
 [m0][c darkred][b]Kaminaufsatz[/b][/c] [/m]
 [m1][i][c darkslategray]m[/c] [/i] chimney pot[/m]
Kaminaufsatz
Kaminbesteck
 [m0][c darkred][b]Kaminbesteck[/b][/c] [/m]
 [m1][i][c darkslategray]nt[/c] [/i] fireside companion set[/m]
Kaminbesteck
Kaminfeuer
 [m0][c darkred][b]Kaminfeuer[/b][/c] [/m]
 [m1][i][c darkslategray]nt[/c] [/i] open fire; [c teal]ein \~ machen[/c] to light the fireplace[/m]
Kaminfeuer
Note that you must use the Replace All button ONLY (do not use the Replace button at all!) for this S/R
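For readers who want to try this first step outside Notepad++, here is a rough Python sketch of the same idea. Python’s built-in re module has neither \K nor conditional replacements such as (?2\r\n), so the sketch keeps the whole match and appends the captured headword through a replacement function. The function name is made up, and the sketch assumes "\n" line endings instead of \r\n.

```python
import re

# Rough equivalent of: SEARCH (?s)^(\w+).+?\K(?=^\w+|(\Z))   REPLACE (?2\r\n)\1\r\n
CARD = re.compile(r"^(\w+).+?(?=^\w|\Z)", re.MULTILINE | re.DOTALL)

def append_headwords(text: str) -> str:
    """Copy each card's headword onto a new line at the end of that card."""
    if not text.endswith("\n"):
        text += "\n"          # plays the role of the (?2\r\n) conditional
    # group(0) is the whole card, group(1) its headword
    return CARD.sub(lambda m: m.group(0) + m.group(1) + "\n", text)

sample = (
    "Kaminbesteck\n"
    " [m1]fireside companion set[/m]\n"
    "Kaminfeuer\n"
    " [m1]open fire; [c teal]ein \\~ machen[/c] to light the fireplace[/m]"
)
print(append_headwords(sample))
```

As in the Notepad++ regex, only the first run of word characters on the headword line is captured, so a multi-word headword would be copied incompletely.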
Now, this second regex S/R :
SEARCH
(?s)\x20\\~,?\x20?(?=.+?^(\w+))|^\w+\R(?=\w+|\Z)
REPLACE
?1\x20\\\1\x20
- Replaces each \~ string that is preceded by a space char and possibly followed by a comma and/or a space character, with a space char, the \ symbol, then the current headword and a final space character
- Deletes any headword temporarily inserted at the end of each card section
And you get your expected output :
Absicht
 [m0][c darkred][b]Absicht[/b][/c] [/m]
 [m1][i][c darkslategray]f[/c] [/i] intention; [c teal]das war bestimmt nicht meine \Absicht ![/c] it was an accident!, I didn't mean to do it!; [c teal]es war schon immer seine \Absicht reich zu werden[/c] it has always been his goal to be rich; [c teal]das lag nicht in meiner \Absicht [/c] that was definitely not what I intended; [c teal]mit den besten \Absicht en[/c] with the best of intentions; [c teal]ernste \Absicht en haben[/c] to have honourable \[[i]or[/i] [c sienna]AM[/c] -orable] intentions; [c teal]verborgene \Absicht en[/c] hidden intentions; [c teal]die \Absicht haben, etw zu tun[/c] to have the intention of doing sth; [c teal]in selbstmörderischer \Absicht [/c] with the intention of killing herself/himself; [c teal]\~ sein[/c] to be intentional; [c teal]in der \Absicht etw zu tun[/c] with a view to \[[i]or[/i] the intention of] doing sth; [c teal]er verfolgte sie in der \Absicht sie zu berauben[/c] he followed her with intent to rob her; [c teal]eine \Absicht verfolgen[/c] to pursue a goal; [c teal]mit/ohne \Absicht [/c] intentionally/unintentionally[/m]
 [m1][c darkslategray]<-, -en>[/c] [/m]
Kaminaufsatz
 [m0][c darkred][b]Kaminaufsatz[/b][/c] [/m]
 [m1][i][c darkslategray]m[/c] [/i] chimney pot[/m]
Kaminbesteck
 [m0][c darkred][b]Kaminbesteck[/b][/c] [/m]
 [m1][i][c darkslategray]nt[/c] [/i] fireside companion set[/m]
Kaminfeuer
 [m0][c darkred][b]Kaminfeuer[/b][/c] [/m]
 [m1][i][c darkslategray]nt[/c] [/i] open fire; [c teal]ein \Kaminfeuer machen[/c] to light the fireplace[/m]
Notes :
- In search, the regex looks for a space character followed by the \~ string, in the four cases a, b, c and d below
- In all cases, it is replaced with a space character, a backslash symbol, the current headword (Group 1 = \1) and a final space char
- When the second alternative, without any group, ^\w+\R(?=\w+|\Z) is matched, the temporary current headword is simply deleted
    SEARCH                           =>  REPLACEMENT
    \x20  \\  ~  ,?  \x20?           =>  \x20  \\  \1  \x20         if Group 1
a   space  \  ~  comma  space        =>  space  \  headword  space  if Group 1
b   space  \  ~  comma               =>  space  \  headword  space  if Group 1
c   space  \  ~  space               =>  space  \  headword  space  if Group 1
d   space  \  ~                      =>  space  \  headword  space  if Group 1
    CURRENT headword + EOL           =>  DELETED                    if NOT group 1
    ^  \w+  \R                       =>  EMPTY                      if NOT group 1
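The four cases a to d can be checked quickly in Python. This is only a sketch of the first alternative of the search regex, with the headword passed in by hand instead of being captured by the look-ahead; the fragments and the function name are invented for the demonstration.

```python
import re

# First alternative of the search regex, without the look-ahead part
TILDE = re.compile(r"\x20\\~,?\x20?")

def expand(fragment: str, headword: str) -> str:
    """Replace ' \\~' plus optional comma/space with ' \\headword '."""
    # A function is used as replacement so the backslash is inserted literally
    return TILDE.sub(lambda m: " \\" + headword + " ", fragment)

cases = {
    "a": "in der \\~, etw",   # space \ ~ comma space
    "b": "in der \\~,etw",    # space \ ~ comma
    "c": "in der \\~ etw",    # space \ ~ space
    "d": "in der \\~etw",     # space \ ~
}
results = {label: expand(fragment, "Absicht") for label, fragment in cases.items()}
for label, text in results.items():
    print(label, text)        # every case prints: in der \Absicht etw
```

Because every case ends with a forced space, an ending written directly after the tilde gets separated from the headword, which is what the “\Absicht en” results above show.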
Best Regards
guy038
-
-
Thank you so much, guy! I really appreciate your help!
I’ve quickly tested your regexes on another file, structurally very similar to the above-mentioned one. It seems you have done a pretty decent job!
Two points though:
- With the second regex, am I supposed to click “Replace all” only once? Out of curiosity, I clicked a second time and it made more replacements; at the third click it stopped. I also confirmed this by searching for any remaining tilde and in fact found none.
- Knowing what you can do with regex, I can’t believe you didn’t make the before- and after-tilde variations more flexible and limited them to only the four cases you demonstrated above. In this other file, there are indeed many tildes which don’t follow your patterns above, namely with no space before or after the tilde. After applying your regexes, I performed the following for the remaining ones:
Find what: [backslash]~
Replace with: \s[backslash]~
Some are meant, for example, to be part of the headword (= endings, etc.):
After replacement, I got: Leistung en (plural of the headword; notice the space between the “g” and “en”).
guy - A tilde cannot be part of anything else, neither of words, nor of tags, nor of codes, at least not in this file. So you can’t possibly ruin anything with the tilde replacement. I mean, you could be harsher on it. :)
If you could adjust the regexes accordingly, that would be perfect. Otherwise I’m already happy with the current ones, and once again, thank you so much for the effort.
Cheers,
glossar -
@glossar said in Replace tilde (~) with headword:
Knowing what you can do with regex, I can’t believe you didn’t make the before- and after-tilde variations more flexible and limited them to only the four cases you demonstrated above. In this other file, there are indeed many tildes which don’t follow your patterns above, namely with no space before or after the tilde.
So this isn’t a regex-writing service.
You are supposed to learn from the gracious help that has been thus far provided to you, and adapt the solutions provided to new and similar problems. -
Have I told or assumed that this is a regex-writing service? Let’s not speak English like that, shall we?
-
@glossar said in Replace tilde (~) with headword:
Have I told or assumed that this is a regex-writing service?
It sure seems like you’re doing this.
And your tone seems a bit, well, demanding. -
@glossar said in Replace tilde (~) with headword:
Have I told or assumed that this is a regex-writing service?
Your behavior, including the text that @Alan-Kilborn quoted, strongly suggested that you do effectively think of this as a regex writing service: otherwise, why ask for a new version of the regex when you had already admitted that the one already given to you worked?
Basically, the way that your final request came across was equivalent to “what you gave me worked… but, since you are writing regexes for me anyway, could you also change the regex to handle more edge cases for me or do something differently, without me having to try to modify the regex on my own.”.
If, instead, you had said something like: “Thanks for the regex, it works fine for the purposes described above. I was investigating whether I could change the regex to allow it to do the “tilde-replacement” I described earlier, and I tried modifying it to ....blah.... or ...something..., because I thought that the xxx portion of the regex would allow for the “tilde-replacement”; however, instead of getting what I expected, I got _____” (with actual regexes and sample data). If you had put in the effort to try to change it yourself, explained why you thought your changes would work, and explained how their results didn’t match your expectations, then at least you would have shown that you were trying to learn and apply what we’d already taught you, rather than requesting yet again to “write the regex for me”.
The best way to stay in the good graces of the regulars here who are helping you with regexes is to show that you are trying to learn and apply the examples already given, and branching off on your own. The best way to frustrate the regulars here who are trying to help is to keep on changing the requirements and asking for changes to the free regexes you’ve already been given, without even trying to make the changes yourself.
-
Hello, glossar, @terry-r, @alan-kilborn and All,
With the first regex S/R provided in my previous post, we’re able to insert the current headword at the end of each card section, to refer to when later matching a \~ string located after a space char. OK!
So, now, let’s start with the regex \x20\\~ which matches a space character followed by the \ symbol and a ~ character, and which must be replaced, somehow, with the current headword!
Assuming this example, from the Absicht entry :
\~, etw zu tun[/c] with a view to \[[i]or[/i] the intention of] doing sth; [c teal]er verfolgte sie in der
@glossar, which kind of output do you expect?
- (A) \Absicht, etw zu tun[/c] with...... , where everything after ~, till the next word char, is just rewritten as it is
- (B) \Absicht etw zu tun[/c] with...... , where everything after ~, till the next word char, is replaced with a single space char
- (C) \Absicht: etw zu tun[/c] with...... , where everything after ~, till the next word char, is replaced with : and a single space char
- (D) \Absicht; etw zu tun[/c] with...... , where everything after ~, till the next word char, is replaced with ; and a single space char
- (E) \Absicht. etw zu tun[/c] with...... , where everything after ~, till the next word char, is replaced with . and a single space char
- (F) \Absicht> etw zu tun[/c] with...... , where everything after ~, till the next word char, is replaced with > and a single space char
- (G) \Absichtetw zu tun[/c] with...... , where everything after ~, till the next word char, is simply deleted
Do you prefer to delete, in replacement, the \ backslash located before the ~, like below ?
- (H) Absicht, etw zu tun[/c] with......
- (I) Absicht etw zu tun[/c] with......
- (J) Absicht: etw zu tun[/c] with......
- (K) Absicht; etw zu tun[/c] with......
- (L) Absicht. etw zu tun[/c] with......
- (M) Absicht> etw zu tun[/c] with......
- (N) Absichtetw zu tun[/c] with......
Do you need an extra space char between the headword and the symbol, as below :
- (O) Absicht , etw zu tun[/c] with......
- (P) Absicht : etw zu tun[/c] with......
- (Q) Absicht ; etw zu tun[/c] with......
- (R) Absicht . etw zu tun[/c] with......
- (S) Absicht > etw zu tun[/c] with......
or anything else ?
Once you’ve decided on the right syntax which must be matched in search and rewritten in replacement, it should be easy to get the right second S/R needed ;-))
BR
guy038
-
-
I’m sorry if my tone seemed demanding to you. When I write something here, just like I’m doing right now, I’m more focused on, and struggling with, what I would like to express, be it a question, asking a favour, or something else. Since English is not my mother tongue, please don’t assume that I have full control over English, because I don’t.
But you are aware that you have the option to simply ignore my requests here, right? Like hundreds, if not thousands, of registered users do, except, well, a few. You might want to make more use of that option. :)
Finally, a word on this “famous” learning - would you attempt to learn the language of your foreign crush just to send a few SMS to her once in a blue moon? I mean learning a foreign language for the sake of sending a few messages? My post history is accessible, the amount and frequency of my requests are fair, if not low and rare, I believe.
Cheers,
glossar -
@glossar ,
We choose not to ignore because we want you to learn. How rude of us.
As an analogy, I cannot draw very well. If I asked a friend of mine who was good at drawing to make a sketch of a landscape for me, he’d probably do it out of friendship. If I took a look at it and said, “that looks nice, but could you move the mountain from the left to the right, and add a barn over here?”, And then he obliges. And then I say, “let’s add a tree over there, and a road going from the barn off the picture to the right”, he would not be out of line to suggest that I should start practicing drawing myself, because he is not my free sketch-generating service. I might whine back, “but it’s hard to learn how to sketch well”; he could then reply, “that’s true; but it’s worth it in the end, if you get your sketches without waiting for me”
Whether you are willing to make the effort of learning a language to text your girlfriend, or learn how to sketch your landscapes yourself, or learn how to craft your own regexes is up to you. But don’t be surprised if your girlfriend leaves you, your artist friend stops sketching for you, or the regex experts in this forum get tired of doing your work for you.
Good luck with your future endeavors.
-
Thank you for the effort, I appreciate it!
If I may say so without sounding demanding :D, I would expect all and none of them. There are thousands of entries in the file, with possibly tens of thousands of tildes. I can’t go through all of them visually and manually, hence I can’t know or predict what stands before and after each tilde. As I said, you can’t ruin anything, since tildes are not part of anything else in the file. The regex should therefore replace a tilde with the headword exactly where it finds it: whether there are zero, one or multiple spaces, letters, chars or whatever before or after a tilde, it should be replaced regardless.
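The requirement as stated here, replacing every tilde wherever it stands, can be sketched in a few lines of Python. This is only an illustration under assumptions that do not come from the thread: a headword line is any line starting with a word character, the whole headword line (not just its first word) is used as the replacement, and the function name is made up. The backslash in front of the tilde is left untouched.

```python
import re

def expand_all_tildes(text: str) -> str:
    """Replace every '~' in a card body with that card's headword."""
    out = []
    headword = None
    for line in text.splitlines(keepends=True):
        if re.match(r"\w", line):
            headword = line.strip()      # whole headword line (assumption)
            out.append(line)
        elif headword is not None:
            out.append(line.replace("~", headword))   # no condition on the context
        else:
            out.append(line)             # anything before the first card
    return "".join(out)

# Two fragments taken from the Absicht card, one with and one without a leading space
sample = (
    "Absicht\n"
    " [c teal]mit den besten \\~en[/c] with the best of intentions;\n"
    " [c teal]\\~ sein[/c] to be intentional[/m]\n"
)
print(expand_all_tildes(sample))
```

Since no context is checked, this approach is only safe if, as stated above, a tilde never occurs as anything other than a headword placeholder.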
Cheers,
glossar -
Hi, @glossar, @terry-r, @alan-kilborn, @peterjones and All,
Thanks for your reply. So the second regex S/R should simply replace :
- A tilde character, in the expression space character + backslash character ( \ ) + tilde character ( ~ )
with :
- The headword of the current card section only, without any additional char, either before or after
Note, @glossar, that this formulation means this kind of replacement :
...nicht meine \Absicht![/c] it was an accident!, I didn't mean to do it!; [c teal]es war schon immer seine
...immer seine \Absicht, reich zu werden[/c] it has always been his goal to be rich; [c teal]das lag nicht in meiner
...in meiner \Absicht[/c] that was definitely not what I intended; [c teal]mit den besten
...den besten \Absichten[/c] with the best of intentions; [c teal]ernste
...[c teal]ernste \Absichten haben[/c] to have honourable \[[i]or[/i] [c sienna]AM[/c] -orable] intentions; [c teal]verborgene
...[c teal]verborgene \Absichten[/c] hidden intentions; [c teal]die
...[c teal]die \Absicht haben, etw zu tun[/c] to have the intention of doing sth; [c teal]in selbstmörderischer
...elbstmörderischer \Absicht[/c] with the intention of killing herself/himself; [c teal]\~ sein[/c] to be intentional; [c teal]in der
...in der \Absicht, etw zu tun[/c] with a view to \[[i]or[/i] the intention of] doing sth; [c teal]er verfolgte sie in der
...in der \Absicht, sie zu berauben[/c] he followed her with intent to rob her; [c teal]eine
...[c teal]eine \Absicht verfolgen[/c] to pursue a goal; [c teal]mit/ohne
...mit/ohne \Absicht[/c] intentionally/unintentionally[/m]
...[c teal]ein \Kaminfeuer machen[/c] to light the fireplace[/m]
where the headword Absicht will be inserted with these different syntaxes :
Absicht verfolgen[/c] to
Absicht![/c] it was an
Absicht, etw zu tun[/c]
Absicht[/c] intentionally
Absichten haben[/c] to have
Absichten[/c] hidden intentions;
If you agree to this general format, which, BTW, simplifies the second regex, here is a summary of the final solution :
- This first regex S/R, as said before :
    - SEARCH (?s)^(\w+).+?\K(?=^\w+|(\Z))
    - REPLACE (?2\r\n)\1\r\n
    - Adds each headword at the end of each card section
- Then, this second regex S/R :
    - SEARCH (?s)\x20\\\K~(?=.+?^(\w+))|^\w+\R(?=\w+|\Z)
    - REPLACE ?1\1
    - Replaces any tilde character, in the expression “space char + the \~ string”, with the current headword
    - Deletes any headword temporarily inserted at the end of each card section
IMPORTANT : for these two S/R :
- You must tick the Wrap around option
- You MUST use the Replace All button ONLY (do not use the Replace button at all!). This is because of the \K syntax in the regexes
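As a cross-check outside Notepad++, the second S/R can be approximated in Python. The built-in re module lacks \K, \R and conditional replacements, so this sketch uses a look-behind for the “space + backslash” prefix, a plain "\n" for the line break, and a replacement function that returns group 1 when it is set and an empty string otherwise. It expects text that already contains the temporary headword lines from the first S/R; the function name and the shortened sample are made up.

```python
import re

# Rough equivalent of: SEARCH (?s)\x20\\\K~(?=.+?^(\w+))|^\w+\R(?=\w+|\Z)   REPLACE ?1\1
STEP2 = re.compile(
    r"(?<=\x20\\)~(?=.+?^(\w+))"   # a tilde after space + backslash; group 1 = next line-initial word
    r"|^\w+\n(?=\w|\Z)",           # a temporary headword line, followed by a headword or end of text
    re.MULTILINE | re.DOTALL,
)

def expand_and_clean(text_after_step1: str) -> str:
    """Fill in the tildes and delete the temporary headword lines in one pass."""
    return STEP2.sub(lambda m: m.group(1) or "", text_after_step1)

# Shortened sample, already holding the temporary headword lines
sample = (
    "Absicht\n"
    " [c teal]mit/ohne \\~[/c] intentionally/unintentionally[/m]\n"
    "Absicht\n"
    "Kaminfeuer\n"
    " [c teal]ein \\~ machen[/c] to light the fireplace[/m]\n"
    "Kaminfeuer\n"
)
print(expand_and_clean(sample))
```

As with the original regex, a tilde that is not preceded by a space and a backslash is left alone.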
Best Regards,
guy038
-
Hi guy,
Thank you so much! I stand corrected. Having struggled to express “anything, of any number, before and after the tilde”, I forgot to add the exception of the backslash; my bad! All the other syntaxes are fine. I have just glanced at the file to which I applied your previous regexes, rev. #2 (namely the second pair, the one just before the last one, for the sake of clarity), and realized that it also put (or kept) backslashes before the headword, and so does this last one. But I could live with that, because this time a backslash can’t be a part of a word or a group of words outside the file (code, whatever). I can’t possibly ruin anything by deleting it where it is, so I can simply S/R for it and get rid of it.
Thanks a million for your patience and your help!
Greetings,
glossar -
Hi, @glossar,
Ah… OK! If you want to get rid of the \ character as well, just use this modified version of the second regex :
- SEARCH (?s)\x20\K\\~(?=.+?^(\w+))|^\w+\R(?=\w+|\Z)
- REPLACE ?1\1
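The only difference between the two versions is where \K sits, that is, whether the backslash belongs to the part that gets replaced. A small Python sketch of that difference, using look-behinds in place of \K and a hard-coded headword (both are choices made for this illustration):

```python
import re

fragment = "in der \\~, etw zu tun[/c]"

# \x20\\\K~  : only the tilde is replaced, the backslash stays
keep_backslash = re.sub(r"(?<=\x20\\)~", "Absicht", fragment)

# \x20\K\\~  : backslash and tilde are replaced together
drop_backslash = re.sub(r"(?<=\x20)\\~", "Absicht", fragment)

print(keep_backslash)   # in der \Absicht, etw zu tun[/c]
print(drop_backslash)   # in der Absicht, etw zu tun[/c]
```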
BR
guy038
-
-
Hi guy!
Thank you for the effort.
FYI, I’ve applied this last one as well as the second one to another file. Sadly, neither worked well, leaving 10K-20K tildes behind.
Greetings,
glossar