Replace tilde (~) with headword



  • Hi all,
    I want to replace every tilde in a dictionary file with the headword of the entry it appears under, since saving space with tildes makes little sense in a digital dictionary.
    The file is encoded as UTF-8 with BOM and uses CRLF line endings. For convenience, let's call a headword together with its entry a card (a dictionary card). A card is structured like this:

    Absicht
    	[m0][c darkred][b]Absicht[/b][/c] [/m]
    	[m1][c darkslategray]<-, -en>[/c] [/m]
    	[m1][i][c darkslategray]f[/c] [/i] intention; [c teal]das war bestimmt nicht meine \~![/c] it was an accident!, I didn't mean to do it!; [c teal]es war schon immer seine \~, reich zu werden[/c] it has always been his goal to be rich; [c teal]das lag nicht in meiner \~[/c] that was definitely not what I intended; [c teal]mit den besten \~en[/c] with the best of intentions; [c teal]ernste \~en haben[/c] to have honourable \[[i]or[/i] [c sienna]AM[/c] -orable] intentions; [c teal]verborgene \~en[/c] hidden intentions; [c teal]die \~ haben, etw zu tun[/c] to have the intention of doing sth; [c teal]in selbstmörderischer \~[/c] with the intention of killing herself/himself; [c teal]\~ sein[/c] to be intentional; [c teal]in der \~, etw zu tun[/c] with a view to \[[i]or[/i] the intention of] doing sth; [c teal]er verfolgte sie in der \~, sie zu berauben[/c] he followed her with intent to rob her; [c teal]eine \~ verfolgen[/c] to pursue a goal; [c teal]mit/ohne \~[/c] intentionally/unintentionally[/m]
    
    

    Each headword occupies a line of its own, between ^ and \r\n. It may consist of several words (so it may contain spaces) and digits, but nothing beyond [A-Za-z0-9] plus German letters (umlauts, accented characters, etc.). Each entry line starts with a tab, followed by a [ character (more precisely, a tag such as [m0] or [m1]), and ends with [/m] followed by CRLF. An entry may span several lines (senses, definitions, usage examples, etc.). Each line may contain zero, one or several tildes, and the lines that contain tildes are not necessarily consecutive.

    This regex works (a bit) when operated on an isolated, single card:
    Find what: ^(.*?)\r\n\t(.*?)\~
    Replace with: $1\r\n\t$2$1
    with “. matches newline” selected.

    Its drawbacks: it cannot replace all the tildes in a card (it finds only the first occurrence, replaces it with the headword and stops there), and when a card contains no tilde, the match starts at that card and runs through every following tilde-free card until it reaches a card that does contain a tilde. Worse, the tilde is then replaced with the headword of the first card in that run, whose entry contains no tilde at all. In the following example only the third card contains a tilde, but the regex replaces it with the first headword (Kaminaufsatz):

    Kaminaufsatz
    	[m0][c darkred][b]Kaminaufsatz[/b][/c] [/m]
    	[m1][i][c darkslategray]m[/c] [/i] chimney pot[/m]
    Kaminbesteck
    	[m0][c darkred][b]Kaminbesteck[/b][/c] [/m]
    	[m1][i][c darkslategray]nt[/c] [/i] fireside companion set[/m]
    Kaminfeuer
    	[m0][c darkred][b]Kaminfeuer[/b][/c] [/m]
    	[m1][i][c darkslategray]nt[/c] [/i] open fire; [c teal]ein \~ machen[/c] to light the fireplace[/m]
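
    For reference, the same behaviour can be reproduced outside the editor. The following is only a minimal sketch in Python, not part of the Notepad++ workflow: it assumes Python's re module as a stand-in for the editor's regex engine, uses shortened card bodies, and uses re.DOTALL in place of the “. matches newline” option.

```python
import re

# Three shortened cards; only the last one contains a tilde (stored as \~ in the file).
cards = (
    "Kaminaufsatz\r\n\t[m1]chimney pot[/m]\r\n"
    "Kaminbesteck\r\n\t[m1]fireside companion set[/m]\r\n"
    "Kaminfeuer\r\n\t[m1]ein \\~ machen[/m]\r\n"
)

# Same shape as the "Find what" expression; DOTALL plays the role of
# ". matches newline", so the lazy groups are free to cross card boundaries.
pattern = re.compile(r"^(.*?)\r\n\t(.*?)~", re.MULTILINE | re.DOTALL)

match = pattern.search(cards)
captured_headword = match.group(1)

# Equivalent of the replacement $1\r\n\t$2$1.
result = pattern.sub(
    lambda m: m.group(1) + "\r\n\t" + m.group(2) + m.group(1), cards
)

print(repr(captured_headword))
print(result)
```

    In this sketch the single match starts at the first card and spans all three, so captured_headword is the headword of the first card rather than that of the card holding the tilde.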
    

    So, how can I tell the regex to operate only within a card: to make the replacement when it finds one or more tildes in a card, and otherwise to skip that card and move on to the next one that contains a tilde?

    Many thanks in advance!



  • Sorry, I meant these
    Find what: ^(\w+)(.*?)\~
    Replace with: $1$2$1



  • Hello, @glossar and All,

    It took me some time to understand what you want to achieve! I have found a solution, but it is only a first version, as I may be wrong about your real needs.


    As the regex engine processes text from left to right, I had the idea of repeating the headword of each card at the end of that card, so that each part between two \~ delimiters can be processed further. The added headword is separated from the end of the current line by the ¤ sign. Of course, any other symbol NOT yet used in your dictionary file may be chosen.

    So, given this sample text, where the last line must be followed by a true empty line:

    Absicht
    	[m0][c darkred][b]Absicht[/b][/c] [/m]
    	[m1][c darkslategray]<-, -en>[/c] [/m]
    	[m1][i][c darkslategray]f[/c] [/i] intention; [c teal]das war bestimmt nicht meine \~![/c] it was an accident!, I didn't mean to do it!; [c teal]es war schon immer seine \~, reich zu werden[/c] it has always been his goal to be rich; [c teal]das lag nicht in meiner \~[/c] that was definitely not what I intended; [c teal]mit den besten \~en[/c] with the best of intentions; [c teal]ernste \~en haben[/c] to have honourable \[[i]or[/i] [c sienna]AM[/c] -orable] intentions; [c teal]verborgene \~en[/c] hidden intentions; [c teal]die \~ haben, etw zu tun[/c] to have the intention of doing sth; [c teal]in selbstmörderischer \~[/c] with the intention of killing herself/himself; [c teal]\~ sein[/c] to be intentional; [c teal]in der \~, etw zu tun[/c] with a view to \[[i]or[/i] the intention of] doing sth; [c teal]er verfolgte sie in der \~, sie zu berauben[/c] he followed her with intent to rob her; [c teal]eine \~ verfolgen[/c] to pursue a goal; [c teal]mit/ohne \~[/c] intentionally/unintentionally[/m]
    Kaminaufsatz
    	[m0][c darkred][b]Kaminaufsatz[/b][/c] [/m]
    	[m1][i][c darkslategray]m[/c] [/i] chimney pot[/m]
    Kaminbesteck
    	[m0][c darkred][b]Kaminbesteck[/b][/c] [/m]
    	[m1][i][c darkslategray]nt[/c] [/i] fireside companion set[/m]
    Kaminfeuer
    	[m0][c darkred][b]Kaminfeuer[/b][/c] [/m]
    	[m1][i][c darkslategray]nt[/c] [/i] open fire; [c teal]ein \~ machen[/c] to light the fireplace[/m]
    
    

    The following regex S/R:

    SEARCH (?s)^(\w+).+?(?=\R\w+|\Z)

    REPLACE $0¤\1\r\n

    changes the sample to:

    Absicht
    	[m0][c darkred][b]Absicht[/b][/c] [/m]
    	[m1][c darkslategray]<-, -en>[/c] [/m]
    	[m1][i][c darkslategray]f[/c] [/i] intention; [c teal]das war bestimmt nicht meine \~![/c] it was an accident!, I didn't mean to do it!; [c teal]es war schon immer seine \~, reich zu werden[/c] it has always been his goal to be rich; [c teal]das lag nicht in meiner \~[/c] that was definitely not what I intended; [c teal]mit den besten \~en[/c] with the best of intentions; [c teal]ernste \~en haben[/c] to have honourable \[[i]or[/i] [c sienna]AM[/c] -orable] intentions; [c teal]verborgene \~en[/c] hidden intentions; [c teal]die \~ haben, etw zu tun[/c] to have the intention of doing sth; [c teal]in selbstmörderischer \~[/c] with the intention of killing herself/himself; [c teal]\~ sein[/c] to be intentional; [c teal]in der \~, etw zu tun[/c] with a view to \[[i]or[/i] the intention of] doing sth; [c teal]er verfolgte sie in der \~, sie zu berauben[/c] he followed her with intent to rob her; [c teal]eine \~ verfolgen[/c] to pursue a goal; [c teal]mit/ohne \~[/c] intentionally/unintentionally[/m]¤Absicht
    
    Kaminaufsatz
    	[m0][c darkred][b]Kaminaufsatz[/b][/c] [/m]
    	[m1][i][c darkslategray]m[/c] [/i] chimney pot[/m]¤Kaminaufsatz
    
    Kaminbesteck
    	[m0][c darkred][b]Kaminbesteck[/b][/c] [/m]
    	[m1][i][c darkslategray]nt[/c] [/i] fireside companion set[/m]¤Kaminbesteck
    
    Kaminfeuer
    	[m0][c darkred][b]Kaminfeuer[/b][/c] [/m]
    	[m1][i][c darkslategray]nt[/c] [/i] open fire; [c teal]ein \~ machen[/c] to light the fireplace[/m]¤Kaminfeuer
    

    As you can see, the final line of each card now ends with the string ¤HeadWord, that is, successively, with ¤Absicht, ¤Kaminaufsatz, ¤Kaminbesteck and ¤Kaminfeuer.


    Now, with the second regex S/R below, we globally search for the string \~, followed by the shortest range of standard characters (group 1) up to, but excluding, either another \~ string or the string ¤HeadWord (the case of the last part of the line), and rewrite the contents between the \~ strings on a new line, preceded by the current headword (group 3):

    SEARCH (?-s)\x20\\~,?\x20?(.+?)(?=(\x20\\~.+)?¤(.+))

    REPLACE \r\n\3\r\n\t\1

    Absicht
    	[m0][c darkred][b]Absicht[/b][/c] [/m]
    	[m1][c darkslategray]<-, -en>[/c] [/m]
    	[m1][i][c darkslategray]f[/c] [/i] intention; [c teal]das war bestimmt nicht meine
    Absicht
    	![/c] it was an accident!, I didn't mean to do it!; [c teal]es war schon immer seine
    Absicht
    	reich zu werden[/c] it has always been his goal to be rich; [c teal]das lag nicht in meiner
    Absicht
    	[/c] that was definitely not what I intended; [c teal]mit den besten
    Absicht
    	en[/c] with the best of intentions; [c teal]ernste
    Absicht
    	en haben[/c] to have honourable \[[i]or[/i] [c sienna]AM[/c] -orable] intentions; [c teal]verborgene
    Absicht
    	en[/c] hidden intentions; [c teal]die
    Absicht
    	haben, etw zu tun[/c] to have the intention of doing sth; [c teal]in selbstmörderischer
    Absicht
    	[/c] with the intention of killing herself/himself; [c teal]\~ sein[/c] to be intentional; [c teal]in der
    Absicht
    	etw zu tun[/c] with a view to \[[i]or[/i] the intention of] doing sth; [c teal]er verfolgte sie in der
    Absicht
    	sie zu berauben[/c] he followed her with intent to rob her; [c teal]eine
    Absicht
    	verfolgen[/c] to pursue a goal; [c teal]mit/ohne
    Absicht
    	[/c] intentionally/unintentionally[/m]¤Absicht
    
    Kaminaufsatz
    	[m0][c darkred][b]Kaminaufsatz[/b][/c] [/m]
    	[m1][i][c darkslategray]m[/c] [/i] chimney pot[/m]¤Kaminaufsatz
    
    Kaminbesteck
    	[m0][c darkred][b]Kaminbesteck[/b][/c] [/m]
    	[m1][i][c darkslategray]nt[/c] [/i] fireside companion set[/m]¤Kaminbesteck
    
    Kaminfeuer
    	[m0][c darkred][b]Kaminfeuer[/b][/c] [/m]
    	[m1][i][c darkslategray]nt[/c] [/i] open fire; [c teal]ein
    Kaminfeuer
    	machen[/c] to light the fireplace[/m]¤Kaminfeuer
    

    @glossar, are you expecting this kind of output ?


    Anyway, a last step is needed: deleting the temporary headwords added after each ¤ symbol.

    That is child’s play with:

    SEARCH (?-s)¤.+

    REPLACE Leave EMPTY

    Best Regards,

    guy038



  • @glossar said in Replace tilde (~) with headword:

    This regex works (a bit) when operated on an isolated, single card:
    Find what: ^(.*?)\r\n\t(.*?)\~
    Replace with: $1\r\n\t$2$1
    with “. matches newline” selected.

    I was intrigued by your initial regex and tested it to see where it failed you. In doing so I realized that @guy038’s solution actually puts the headword (which replaces the ~) on a new line, whereas your example and regex do not.

    So I decided to see whether I could alter your regex to prevent it from running from one card through the following ones until it DOES find a ~. I have succeeded, although it will still process ONLY one ~ per card for each press of the “Replace All” button: each press goes through the entire file and changes at MOST one ~ in every card that contains one. So, assuming no card has more than 20 ~, you will need to press “Replace All” at most 20 times. This was only an exercise to offer you another option close to your own. @guy038’s solution makes more sense, as it takes far fewer steps (and it will be more correct once he amends it so that it does not include a newline); mine shows how yours can be altered to fix the problem you had.
    Find What:(?-s)^([^\t].+)\R((?:\t[^~\r\n]+\R)*)(\t[^~\r\n]+)~
    Replace With:$1\r\n$2$3$1

    What I have done is require that every further line examined MUST start with a \t. This prevents the match from crossing into the next card when the current card does NOT contain ANY ~.

    Note that the “. matches newline” box you ticked can also be set with the modifier (?s). In my solution I did NOT want that, so I used the modifier (?-s), which overrides whatever state the box is in. We often do that when supplying a solution, as we have no idea what the requester may have set. Also note that even with “. does NOT match newline” in force (that is, (?-s)), the negated class [^~\r\n] had to include \r\n, because otherwise it CAN run across line breaks ([^~] means ANYTHING but ~, and therefore includes \r and \n). Putting the modifier inside the regex also makes it possible to mix both modes in the same regex. If you check:
    https://community.notepad-plus-plus.org/topic/18818/find-duplicate-html-tags-with-regex/5
    you will see some posts from our seasoned forum members discussing the use of both in a regex.
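
    To try this out away from the editor, here is a rough Python sketch of the same idea. It is an adaptation, not the Notepad++ expression itself: Python's re module has no \R and no standalone (?-s), so \R is written as \r?\n and the headword class also excludes \r and \n. The function name replace_all_passes and the shortened sample cards are made up for the illustration.

```python
import re

# Adapted pattern: "." never matches a newline here (no DOTALL flag), and the
# negated classes list \r and \n explicitly so they cannot run across lines.
CARD_TILDE = re.compile(
    r"^([^\t\r\n][^\r\n]*)\r?\n"     # group 1: headword line (does not start with a tab)
    r"((?:\t[^~\r\n]+\r?\n)*)"       # group 2: tab-indented lines without any tilde
    r"(\t[^~\r\n]+)~",               # group 3: tab-indented text up to the first tilde
    re.MULTILINE,
)


def replace_all_passes(text):
    """Repeat the substitution, like repeated "Replace All" presses.

    Returns the final text and the number of passes that changed something.
    """
    passes = 0
    while True:
        # Equivalent of the replacement $1\r\n$2$3$1 (CRLF is written back).
        text, count = CARD_TILDE.subn(
            lambda m: m.group(1) + "\r\n" + m.group(2) + m.group(3) + m.group(1),
            text,
        )
        if count == 0:
            return text, passes
        passes += 1


# Made-up sample: the first card has no tilde, the second has two.
sample = (
    "Kaminaufsatz\r\n\t[m1]chimney pot[/m]\r\n"
    "Kaminfeuer\r\n\t[m1]ein \\~ machen; das \\~ brennt[/m]\r\n"
)
expanded, passes = replace_all_passes(sample)
print(passes)
print(expanded)
```

    Each pass replaces at most one tilde per card, so the number of passes that change something equals the largest number of tildes found in any single card.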

    Terry



  • @guy038

    Hello guy,

    Thank you for the attempt. I appreciate it.

    I’m sorry, I thought it was clear what I wanted to achieve. You know, in printed dictionaries the headword is often replaced with a tilde, especially in usage examples, to save space. Suppose “once in a blue moon” is a headword in a dictionary: in a usage example like “I need regexes this complicated once in a blue moon”, you would replace “once in a blue moon” with a tilde and get “I need regexes this complicated ~”, thus saving space. You may be more familiar with the tilde from IT; its use in file and folder paths comes to mind: “\sys\var\bin~”.

    Now that you have come this far with your regex, would you like to give it the finishing touches? Although I no longer need it for this dictionary file, I would like to keep it in case the need arises in the future.

    Thank you again.



  • @Terry-R

    Thank you so much, Terry. You were on the right track.

    Someone provided me with a Perl script for this job and ended up running it on the file for me, since I have no knowledge of or experience with Perl. After I got the finished file from him, I deleted the old one. But I have created a prototype of the dictionary from the example cards above, mixing and repeating them a couple of times. Your regex seems to work, at least on the isolated cards in the prototype. Actually, that is what I wanted to see in the first place. You may still consider your regex rough and unfinished, but as long as it works, it is the final regex for me. I don’t mind clicking “Replace All” 30 or 40 times for the whole file.

    Just one point, though: in the examples above, the “Absicht” entry has the most tildes: 10. So I expected to click “Replace All” at most 10 times for the prototype, but I ended up clicking 14 times. Is this okay, or is my count wrong?

    Thank you so much again!
    glossar



  • @glossar said in Replace tilde (~) with headword:

    In the above examples, the entry of “Absicht” has the most tildes: 10

    I actually counted 13. I suspect the first 13 presses each made replacements, and the 14th gave you a “0 occurrences replaced” message in the Replace window. Is that correct?

    Terry



  • @Terry-R said in Replace tilde (~) with headword:

    @glossar said in Replace tilde (~) with headword:

    In the above examples, the entry of “Absicht” has the most tildes: 10

    I actually counted 13. I suspect the first 13 presses each made replacements, and the 14th gave you a “0 occurrences replaced” message in the Replace window. Is that correct?

    Terry

    Heck, why did I count them manually, with my own eyes, when Notepad++ can do it? :) You are right, it contains 13, and on the 14th click I would get a message like “nothing replaced” or something like that.

    Anyway, I conclude that yours works, so I’ll keep it for future use.

    Thank you so much again.
    Cheers,
    glossar



  • Hello, @glossar, @terry-r All,

    Yes, I admit that I did not understand the problem correctly! So, here is my second version, which, fortunately, does not require repeated clicks on the Replace button. Of course, now that the goal is clearly identified, it’s much easier to build suitable regexes!

    With this second solution, the idea is :

    • First, to rewrite each headword on a new line of its own, just above the next headword or at the very end of the file

    • Second, to search for any \~ string while capturing, in a look-ahead structure, the temporary headword located at the end of the current card


    So, let’s take, again, this initial sample :

    Absicht
    	[m0][c darkred][b]Absicht[/b][/c] [/m]
    	[m1][i][c darkslategray]f[/c] [/i] intention; [c teal]das war bestimmt nicht meine \~![/c] it was an accident!, I didn't mean to do it!; [c teal]es war schon immer seine \~, reich zu werden[/c] it has always been his goal to be rich; [c teal]das lag nicht in meiner \~[/c] that was definitely not what I intended; [c teal]mit den besten \~en[/c] with the best of intentions; [c teal]ernste \~en haben[/c] to have honourable \[[i]or[/i] [c sienna]AM[/c] -orable] intentions; [c teal]verborgene \~en[/c] hidden intentions; [c teal]die \~ haben, etw zu tun[/c] to have the intention of doing sth; [c teal]in selbstmörderischer \~[/c] with the intention of killing herself/himself; [c teal]\~ sein[/c] to be intentional; [c teal]in der \~, etw zu tun[/c] with a view to \[[i]or[/i] the intention of] doing sth; [c teal]er verfolgte sie in der \~, sie zu berauben[/c] he followed her with intent to rob her; [c teal]eine \~ verfolgen[/c] to pursue a goal; [c teal]mit/ohne \~[/c] intentionally/unintentionally[/m]
    	[m1][c darkslategray]<-, -en>[/c] [/m]
    Kaminaufsatz
    	[m0][c darkred][b]Kaminaufsatz[/b][/c] [/m]
    	[m1][i][c darkslategray]m[/c] [/i] chimney pot[/m]
    Kaminbesteck
    	[m0][c darkred][b]Kaminbesteck[/b][/c] [/m]
    	[m1][i][c darkslategray]nt[/c] [/i] fireside companion set[/m]
    Kaminfeuer
    	[m0][c darkred][b]Kaminfeuer[/b][/c] [/m]
    	[m1][i][c darkslategray]nt[/c] [/i] open fire; [c teal]ein \~ machen[/c] to light the fireplace[/m]
    

    Note that I intentionally swapped the lines [m1][i][c darkslategray]f[/c] [/i]........ and [m1][c darkslategray]<-, -en>[/c] [/m], so that the line containing the \~ strings is not the last line of a card!

    Then, the following regex S/R :

    SEARCH (?s)^(\w+).+?\K(?=^\w+|(\Z))

    REPLACE (?2\r\n)\1\r\n

    adds each headword at the end of each card section and gives :

    Absicht
    	[m0][c darkred][b]Absicht[/b][/c] [/m]
    	[m1][i][c darkslategray]f[/c] [/i] intention; [c teal]das war bestimmt nicht meine \~![/c] it was an accident!, I didn't mean to do it!; [c teal]es war schon immer seine \~, reich zu werden[/c] it has always been his goal to be rich; [c teal]das lag nicht in meiner \~[/c] that was definitely not what I intended; [c teal]mit den besten \~en[/c] with the best of intentions; [c teal]ernste \~en haben[/c] to have honourable \[[i]or[/i] [c sienna]AM[/c] -orable] intentions; [c teal]verborgene \~en[/c] hidden intentions; [c teal]die \~ haben, etw zu tun[/c] to have the intention of doing sth; [c teal]in selbstmörderischer \~[/c] with the intention of killing herself/himself; [c teal]\~ sein[/c] to be intentional; [c teal]in der \~, etw zu tun[/c] with a view to \[[i]or[/i] the intention of] doing sth; [c teal]er verfolgte sie in der \~, sie zu berauben[/c] he followed her with intent to rob her; [c teal]eine \~ verfolgen[/c] to pursue a goal; [c teal]mit/ohne \~[/c] intentionally/unintentionally[/m]
    	[m1][c darkslategray]<-, -en>[/c] [/m]
    Absicht
    Kaminaufsatz
    	[m0][c darkred][b]Kaminaufsatz[/b][/c] [/m]
    	[m1][i][c darkslategray]m[/c] [/i] chimney pot[/m]
    Kaminaufsatz
    Kaminbesteck
    	[m0][c darkred][b]Kaminbesteck[/b][/c] [/m]
    	[m1][i][c darkslategray]nt[/c] [/i] fireside companion set[/m]
    Kaminbesteck
    Kaminfeuer
    	[m0][c darkred][b]Kaminfeuer[/b][/c] [/m]
    	[m1][i][c darkslategray]nt[/c] [/i] open fire; [c teal]ein \~ machen[/c] to light the fireplace[/m]
    Kaminfeuer
    

    Note that you must use the Replace All button ONLY (do not use the Replace button at all!) for this S/R.


    Now, this second regex S/R :

    SEARCH (?s)\x20\\~,?\x20?(?=.+?^(\w+))|^\w+\R(?=\w+|\Z)

    REPLACE ?1\x20\\\1\x20

    • Replaces each \~ string, possibly followed by a comma and/or a space character, with a space char, the \ symbol, the current headword and a final space character

    • Deletes any headword, temporarily inserted at the end of each card section

    And you get your expected output :

    Absicht
    	[m0][c darkred][b]Absicht[/b][/c] [/m]
    	[m1][i][c darkslategray]f[/c] [/i] intention; [c teal]das war bestimmt nicht meine \Absicht ![/c] it was an accident!, I didn't mean to do it!; [c teal]es war schon immer seine \Absicht reich zu werden[/c] it has always been his goal to be rich; [c teal]das lag nicht in meiner \Absicht [/c] that was definitely not what I intended; [c teal]mit den besten \Absicht en[/c] with the best of intentions; [c teal]ernste \Absicht en haben[/c] to have honourable \[[i]or[/i] [c sienna]AM[/c] -orable] intentions; [c teal]verborgene \Absicht en[/c] hidden intentions; [c teal]die \Absicht haben, etw zu tun[/c] to have the intention of doing sth; [c teal]in selbstmörderischer \Absicht [/c] with the intention of killing herself/himself; [c teal]\~ sein[/c] to be intentional; [c teal]in der \Absicht etw zu tun[/c] with a view to \[[i]or[/i] the intention of] doing sth; [c teal]er verfolgte sie in der \Absicht sie zu berauben[/c] he followed her with intent to rob her; [c teal]eine \Absicht verfolgen[/c] to pursue a goal; [c teal]mit/ohne \Absicht [/c] intentionally/unintentionally[/m]
    	[m1][c darkslategray]<-, -en>[/c] [/m]
    Kaminaufsatz
    	[m0][c darkred][b]Kaminaufsatz[/b][/c] [/m]
    	[m1][i][c darkslategray]m[/c] [/i] chimney pot[/m]
    Kaminbesteck
    	[m0][c darkred][b]Kaminbesteck[/b][/c] [/m]
    	[m1][i][c darkslategray]nt[/c] [/i] fireside companion set[/m]
    Kaminfeuer
    	[m0][c darkred][b]Kaminfeuer[/b][/c] [/m]
    	[m1][i][c darkslategray]nt[/c] [/i] open fire; [c teal]ein \Kaminfeuer machen[/c] to light the fireplace[/m]
    

    Notes :

    • In search, the regex searches for a space character, followed with the \~ string, in the four cases a, b, c and d, below

    • In all cases, it is replaced with a space character, a backslash symbol, the current headword ( Group 1 = \1 ) and a final space char

    • When the second alternative, without any group, ^\w+\R(?=\w+|\Z) is matched, the temporary headword line is simply deleted

                     SEARCH                    =>                   REPLACEMENT
    
         \x20     \\    ~     ,?      \x20?    =>    \x20    \\       \1        \x20    if Group 1
                                                                
    a    space    \     ~    comma    space    =>    space    \    headword    space    if Group 1
    b    space    \     ~    comma             =>    space    \    headword    space    if Group 1
    c    space    \     ~             space    =>    space    \    headword    space    if Group 1
    d    space    \     ~                      =>    space    \    headword    space    if Group 1
    
         CURRENT headword + EOL                =>    DELETED                            if NOT group 1
    
         ^    \w+           \R                 =>                EMPTY                  if NOT group 1
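
    The two-step idea can also be sketched outside Notepad++. The following Python version is only an approximation under stated assumptions: Python's re module supports neither \K nor conditional replacements, so both are emulated with a replacement function; \R is written as \r?\n; the function names and the shortened sample are invented for the illustration; and, like the regexes above, it relies on \w+ and therefore assumes single-word headwords.

```python
import re

FLAGS = re.MULTILINE | re.DOTALL


def append_headwords(text):
    """Step 1: repeat each headword on a line of its own after its card."""
    def repl(m):
        card = m.group(0)
        if not card.endswith("\n"):   # last card without a final line break
            card += "\r\n"
        return card + m.group(1) + "\r\n"

    # A card runs from its headword up to the next headword line or the end of text.
    return re.sub(r"^(\w+).+?(?=^\w+|\Z)", repl, text, flags=FLAGS)


def expand_and_clean(text):
    """Step 2: expand ' \\~' from the copy seen by the look-ahead, then drop the copies."""
    pattern = r" \\~,? ?(?=.+?^(\w+))|^\w+\r?\n(?=\w|\Z)"

    def repl(m):
        if m.group(1) is not None:    # first alternative: a tilde was matched
            return " \\" + m.group(1) + " "
        return ""                     # second alternative: a temporary headword line

    return re.sub(pattern, repl, text, flags=FLAGS)


# Shortened sample; the line holding the tilde is deliberately not the last of its card.
sample = (
    "Kaminaufsatz\r\n\t[m1]chimney pot[/m]\r\n"
    "Kaminfeuer\r\n\t[m1]ein \\~ machen[/m]\r\n\t[m1]open fire[/m]\r\n"
)
with_copies = append_headwords(sample)
result = expand_and_clean(with_copies)
print(result)
```

    As in the table above, only a \~ preceded by a space is expanded in this sketch; a tilde directly after a tag, such as [c teal]\~, is left untouched.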
    

    Best Regards

    guy038



  • @guy038

    Thank you so much, guy! I really appreciate your help!

    I’ve quickly tested your regexes on another file, structurally very similar to the one mentioned above. It seems you have done a pretty decent job!

    Two points though:

    • With the second regex, am I supposed to click “Replace All” only once? Out of curiosity I clicked a second time and it made more replacements; the third time it made none. I also confirmed this by searching for any remaining tilde and, in fact, found none.
    • Knowing what you can do with regex, I can’t believe you didn’t make the before- and after-tilde variations more flexible, but limited them to only the four cases you demonstrated above. In this other file there are indeed many tildes that don’t follow those patterns, namely with no space before or after the tilde. After applying your regexes, I did the following for the remaining ones:
      Find what: [backslash]~
      Replace with: \s[backslash]~
      Some tildes are meant, for example, to be joined to the headword (endings, etc.).
      After the replacement I got: Leistung en (the plural of the headword); notice the space between the “g” and the “en”.

    guy - A tilde is never part of anything else, neither of words, nor of tags, nor of codes, at least not in this file. So you can’t possibly ruin anything by replacing tildes. I mean, you could be harsher on it. :)

    If you could adjust the regexes accordingly, that would be perfect. Otherwise I’m already happy with the current ones, and once again, thank you so much for the effort.

    Cheers,
    glossar



  • @glossar said in Replace tilde (~) with headword:

    Knowing what you can do with regex, I can’t believe you didn’t make the before- and after-tilde variations more flexible, but limited them to only the four cases you demonstrated above. In this other file there are indeed many tildes that don’t follow those patterns, namely with no space before or after the tilde.

    So this isn’t a regex-writing service.
    You are supposed to learn from the gracious help that has been thus far provided to you, and adapt the solutions provided to new and similar problems.



  • @Alan-Kilborn

    Have I told or assumed that this is a regex-writing service? Let’s not speak English like that, shall we?



  • @glossar said in Replace tilde (~) with headword:

    Have I told or assumed that this is a regex-writing service?

    It sure seems like you’re doing this.
    And your tone seems a bit, well, demanding.



  • @glossar said in Replace tilde (~) with headword:

    Have I told or assumed that this is a regex-writing service?

    Your behavior, including the text that @Alan-Kilborn quoted, strongly suggested that you do effectively think of this as a regex writing service: otherwise, why ask for a new version of the regex when you had already admitted that the one already given to you worked?

    Basically, the way that your final request came across was equivalent to “what you gave me worked… but, since you are writing regexes for me anyway, could you also change the regex to handle more edge cases for me or do something differently, without me having to try to modify the regex on my own.”.

    If, instead, you had said something like: “Thanks for the regex, it works fine for the purposes described above. I was investigating whether I could change the regex to allow it to do the “tilde-replacement” I described earlier, and I tried modifying it to ....blah.... or ...something..., because I thought that the xxx portion of the regex would allow for the “tilde-replacement”; however, instead of getting what I expected, I got _____” (with actual regexes and sample data). If you had put in the effort to try to change it yourself, explained why you thought your changes would work, and explained how their results didn’t match your expectations, then at least you would have shown that you were trying to learn and apply what we’d already taught you, rather than requesting yet again to “write the regex for me”.

    The best way to stay in the good graces of the regulars here who are helping you with regexes is to show that you are trying to learn and apply the examples already given, and branching off on your own. The best way to frustrate the regulars here who are trying to help is to keep on changing the requirements and asking for changes to the free regexes you’ve already been given, without even trying to make the changes yourself.



  • Hello, @glossar, @terry-r, @alan-kilborn and All,

    With the first regex S/R provided in my previous post, we are able to insert the current headword at the end of each card, to be referred to later when matching each \~ string located after a space char. OK!

    So, now, let’s start with the regex \x20\\~ which matches a space character followed by the \ symbol and a ~ character and which must be replaced, somehow, with the current headword !

    Assuming this example, from the Absicht entry :

    \~, etw zu tun[/c] with a view to \[[i]or[/i] the intention of] doing sth; [c teal]er verfolgte sie in der

    @glossar, which kind of output do you expect?

    • (A) \Absicht, etw zu tun[/c] with......, where everything after ~, till next word char, is just rewritten, as it is

    • (B) \Absicht etw zu tun[/c] with......, where everything after ~, till next word char, is replaced with a single space char

    • (C) \Absicht: etw zu tun[/c] with......, where everything after ~, till next word char, is replaced with : and a single space char

    • (D) \Absicht; etw zu tun[/c] with......, where everything after ~, till next word char, is replaced with ; and a single space char

    • (E) \Absicht. etw zu tun[/c] with......, where everything after ~, till next word char, is replaced with . and a single space char

    • (F) \Absicht> etw zu tun[/c] with......, where everything after ~, till next word char, is replaced with > and a single space char

    • (G) \Absichtetw zu tun[/c] with......, where everything after ~, till next word char, is simply deleted


    Do you prefer to delete, in replacement, the \ backslash, located before the ~, like below ?

    • (H) Absicht, etw zu tun[/c] with......,

    • (I) Absicht etw zu tun[/c] with......,

    • (J) Absicht: etw zu tun[/c] with......,

    • (K) Absicht; etw zu tun[/c] with......,

    • (L) Absicht. etw zu tun[/c] with......,

    • (M) Absicht> etw zu tun[/c] with......,

    • (N) Absichtetw zu tun[/c] with......,


    Do you need an extra space char, between the headword and the symbol, as below :

    • (O) Absicht , etw zu tun[/c] with......,

    • (P) Absicht : etw zu tun[/c] with......,

    • (Q) Absicht ; etw zu tun[/c] with......,

    • (R) Absicht . etw zu tun[/c] with......,

    • (S) Absicht > etw zu tun[/c] with......,

    or anything else ?

    Once you’ve decided on the exact syntax to be matched in the search and rewritten in the replacement, it should be easy to get the right second S/R ;-))

    BR

    guy038



  • @Alan-Kilborn @PeterJones

    I’m sorry if my tone seemed demanding to you. When I write something here, just as I’m doing right now, I’m mostly focused on, and struggling with, what I would like to express, be it a question, a request for a favour, or something else. Since English is not my mother tongue, please don’t assume that I have full control over English, because I don’t.

    But you are aware that you have the option to simply ignore my requests here, right? Just as hundreds, if not thousands, of registered users do, except, well, a few. You might want to make more use of that option. :)

    Finally, a word on this “famous” learning - would you attempt to learn the language of your foreign crush just to send a few SMS to her once in a blue moon? I mean learning a foreign language for the sake of sending a few messages? My post history is accessible, the amount and frequency of my requests are fair, if not low and rare, I believe.

    Cheers,
    glossar



  • @glossar ,

    We choose not to ignore because we want you to learn. How rude of us.

    As an analogy, I cannot draw very well. If I asked a friend of mine who was good at drawing to make a sketch of a landscape for me, he’d probably do it out of friendship. If I took a look at it and said, “that looks nice, but could you move the mountain from the left to the right, and add a barn over here?”, And then he obliges. And then I say, “let’s add a tree over there, and a road going from the barn off the picture to the right”, he would not be out of line to suggest that I should start practicing drawing myself, because he is not my free sketch-generating service. I might whine back, “but it’s hard to learn how to sketch well”; he could then reply, “that’s true; but it’s worth it in the end, if you get your sketches without waiting for me”

    Whether you are willing to make the effort of learning a language to text your girlfriend, or learn how to sketch your landscapes yourself, or learn how to craft your own regexes is up to you. But don’t be surprised if your girlfriend leaves you, your artist friend stops sketching for you, or the regex experts in this forum get tired of doing your work for you.

    Good luck with your future endeavors.



  • @guy038

    Thank you for the effort, I appreciate it!

    At the risk of sounding demanding :D, I would say I expect all and none of them. There are thousands of entries in the file, with possibly tens of thousands of tildes - I can’t go through all of them visually and manually, hence I can’t know or predict what comes before and after each tilde. As I said, you can’t ruin anything, since the tildes are not part of anything else in the file. The regex should therefore replace a tilde with the headword exactly where it finds it - whether there are zero, one or multiple spaces, letters, or any other characters before or after the tilde, it should be replaced regardless.

    Cheers,
    glossar



  • Hi, @glossar, @terry-r, @alan-kilborn, @peterjones and All,

    Thanks for your reply. So the second regex S/R should simply replace :

    • A tilde character, in the expression space character + backslash character ( \ ) + tilde character ( ~ )

    with :

    • The headword of the current card section, only, without any additional char, either before or after

    Note, @glossar, that this formulation means this kind of replacement :

    ...nicht meine \Absicht![/c] it was an accident!, I didn't mean to do it!; [c teal]es war schon immer seine
    ...immer seine \Absicht, reich zu werden[/c] it has always been his goal to be rich; [c teal]das lag nicht in meiner
    ...in meiner \Absicht[/c] that was definitely not what I intended; [c teal]mit den besten
    ...den besten \Absichten[/c] with the best of intentions; [c teal]ernste
    ...[c teal]ernste \Absichten haben[/c] to have honourable \[[i]or[/i] [c sienna]AM[/c] -orable] intentions; [c teal]verborgene
    ...[c teal]verborgene \Absichten[/c] hidden intentions; [c teal]die
    ...[c teal]die \Absicht haben, etw zu tun[/c] to have the intention of doing sth; [c teal]in selbstmörderischer
    ...elbstmörderischer \Absicht[/c] with the intention of killing herself/himself; [c teal]\~ sein[/c] to be intentional; [c teal]in der
    ...in der \Absicht, etw zu tun[/c] with a view to \[[i]or[/i] the intention of] doing sth; [c teal]er verfolgte sie in der
    ...in der \Absicht, sie zu berauben[/c] he followed her with intent to rob her; [c teal]eine
    ...[c teal]eine \Absicht verfolgen[/c] to pursue a goal; [c teal]mit/ohne
    ...mit/ohne \Absicht[/c] intentionally/unintentionally[/m]
    
    ...[c teal]ein \Kaminfeuer machen[/c] to light the fireplace[/m]
    

    where the headword Absicht will be inserted with these different syntaxes :

    Absicht verfolgen[/c] to
    Absicht![/c] it was an
    Absicht, etw zu tun[/c]
    Absicht[/c] intentionally
    Absichten haben[/c] to have
    Absichten[/c] hidden intentions;
    

    If you agree to this general format, which, BTW, simplifies the second regex, here is a summary of the final solution :

    • This first regex S/R, as said before :

      • SEARCH (?s)^(\w+).+?\K(?=^\w+|(\Z))

      • REPLACE (?2\r\n)\1\r\n

    Adds each headword at the end of each card section

    • Then, this second regex S/R :

      • SEARCH (?s)\x20\\\K~(?=.+?^(\w+))|^\w+\R(?=\w+|\Z)

      • REPLACE ?1\1

    Replaces any tilde character, in the expression "space char + the \~" string, with the current headword

    Deletes any headword, temporarily inserted at the end of each card section


    IMPORTANT : for these two S/R :

    • You must tick the Wrap around option

    • You MUST use the Replace All button, ONLY ( Do not use the Replace button at all ! ). This is because of the \K syntax in the regexes
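
    For anyone who would rather script this outside Notepad++, the same replacement can be sketched in Python. This is only a minimal sketch under stated assumptions, not a translation of the two S/R above: the function name expand_tildes is hypothetical, the text is assumed to be already decoded without the BOM, and every non-empty line that does not start with whitespace is treated as a headword (the whole line, not just the leading \w+ that the regex captures). Like the second S/R, it only touches a tilde preceded by "space + backslash" and leaves the backslash in place.

```python
import re

# A tilde counts only when it directly follows "space + backslash".
TILDE = re.compile(r"(?<= \\)~")


def expand_tildes(text: str) -> str:
    """Replace each ' \\~' tilde with the headword of the card it belongs to."""
    headword = None
    lines = text.split("\n")  # a trailing "\r" stays on each line, so CRLF is preserved
    for i, line in enumerate(lines):
        if line[:1].strip():
            # First character is not whitespace: this line is a headword.
            headword = line.strip()
        elif headword is not None:
            # Entry line: insert the headword literally (no escape processing).
            lines[i] = TILDE.sub(lambda m: headword, line)
    return "\n".join(lines)


sample = (
    "Absicht\r\n"
    "\t[m1]nicht meine \\~! [c teal]mit den besten \\~en[/c] [c teal]\\~ sein[/c][/m]\r\n"
    "Kaminfeuer\r\n"
    "\t[m1][c teal]ein \\~ machen[/c] to light the fireplace[/m]\r\n"
)
print(expand_tildes(sample))
```

    To run it on a real file, reading with open(path, encoding="utf-8-sig", newline="") should drop the BOM and keep the CRLF line endings untouched; writing back with the same arguments should restore both. Note that the tilde in "[c teal]\~ sein" is left alone, because no space precedes the backslash.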

    Best Regards,

    guy038



  • @guy038

    Hi guy,

    Thank you so much! I stand corrected. While struggling to express "anything, in any number, before and after the tilde", I forgot to add the exception of the backslash - my bad! All other syntaxes are fine. I have just glanced at the file on which I applied your previous regexes rev. #2 (namely the second pair, the one just before the last one - for the sake of clarity), and realized that it also put (or kept) backslashes before the headword, and so does this last one. But I can live with that, because a backslash can’t be part of a word or a group of words anywhere else in the file (code, whatever). I can’t possibly ruin anything by deleting it where it is, so I can simply run an S/R for it and get rid of it.

    Thanks a million for your patience and your help!

    Greetings,
    glossar

