How can I change all the words in a given structure?



  • Hello everyone, I have a question again.

    I will do bulk text translation with Google Translate. I noticed that Google does not translate words and phrases that are joined by underscores. The phrases and words in my document that I do not want translated are inside […] brackets. With Notepad++, I want to bulk-add underscores “_” at the beginning, at the end, and in place of the spaces of the text between these brackets. How can I do that?

    To summarize: I want to bulk-add underscores at the beginning and end of, and in place of the spaces in, the words and sentences inside […] brackets.

    Sample:
    Text to be translated: [Help me?]
    With Notepad++, I want to get: [_Help_me? _]



  • @darkenb

    If I understand it correctly, this should do the trick
    find what: (?<=[).+?(?=])
    replace with: _$0 _



  • @Ekopalypse said in How can I change all the words in a given structure?:

    @darkenb

    If I understand it correctly, this should do the trick
    find what: (?<=[).+?(?=])
    replace with: _$0 _

    No, this code doesn’t work.

    #. “TEST [Test Test]: 0”
    Test “Test Test”
    Test “”

    This is the block structure in my text document. There are about 30,000 lines, and they all repeat this same 3-line structure. I will translate the text in the “…” quotes on line 2 into another language with Google Translate. However, I need to change the [Test Test] section in the first line to [Test_Test], because Google Translate does not recognize and translate text joined with the “_” underscore. In summary, since I don’t want the text inside […] to be translated, I need to edit it as I mentioned. So the result I want should be as follows.

    #. “TEST [_Test_Test _]: 0”
    “Test Test” test
    Scale “”

    The code you provide works as follows.

    ._ _ “TEST [Test Test]: 0”

    Test “Test Test”
    Test “”



  • @darkenb

    I would do it in three steps.
    I’m sure one could put a lot of thought into it and get it into one step, but why waste the time?

    I would do a Normal Search Mode replacement of [ with [_ followed by a replacement of ] with _].

    Then I would employ the technique (in Regular expression Search mode) shown HERE where:

    • BSR = bs[_
    • ESR = _bs]
    • FR is a space
    • RR = _

    Note: substitute bs in the above with \ !
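
For readers who would rather see the three steps as a script, here is a minimal Python sketch of the same idea. It is not what Notepad++ runs: Python's `re` module has no `\G` or `\K`, so a callback on each bracketed span stands in for the regex step. The function name `protect` is made up for this illustration.

```python
import re


def protect(text: str) -> str:
    """Mimic the three Notepad++ steps on a plain string (illustrative only)."""
    # Step 1 (Normal search mode): [ becomes [_
    text = text.replace("[", "[_")
    # Step 2 (Normal search mode): ] becomes _]
    text = text.replace("]", "_]")
    # Step 3: turn spaces into underscores, but only between "[_" and "_]".
    # A callback on every "[_ ... _]" span replaces the generic expression.
    return re.sub(
        r"\[_.*?_\]",
        lambda m: m.group(0).replace(" ", "_"),
        text,
        flags=re.DOTALL,
    )


sample = '#. "TEST [Test Test]: 0"'
print(protect(sample))              # '#. "TEST [_Test_Test_]: 0"'
print(protect('Test "Test Test"'))  # unchanged, there are no brackets
```

Text outside the brackets is left alone, which is the whole point of doing the space replacement as a separate, delimiter-aware step.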



  • @Alan-Kilborn

    Can you show me an example usage?

    Find what: What will I write in this part?
    Replace with: What do I write in this part?



  • @darkenb

    It’s really just a formula substitution from the other posting with the values above. All the information needed is there. Can’t you at least try that on your own?



  • @Alan-Kilborn said in How can I change all the words in a given structure?:

    @darkenb

    It’s really just a formula substitution from the other posting with the values above. All the information needed is there. Can’t you at least try that on your own?

    I don’t quite understand, so I don’t think I can do it. I tried something but it didn’t happen.

    Thanks anyway. I think I’ll try to find another way.



  • @darkenb ,

    I don’t quite understand, so I don’t think I can do it

    Which part didn’t you understand?

    I tried something but it didn’t happen.

    Did you try “fred” as the search and “george” as the replacement? No, I didn’t think so. But if you don’t tell us what you did try, we cannot help you.

    Alan’s description said

    1. replace [ with [_:
      [screenshot]

    2. replace ] with _]:
      [screenshot]

    3. Do the fancy regex using the “generic expression”:

      • the “generic expression” is found at https://community.notepad-plus-plus.org/topic/20728/changing-data-inside-xml-element/15, which is the post that Alan linked to. However, sometimes the forum doesn’t take you directly to the right post, even though it’s supposed to. The link https://community.notepad-plus-plus.org/post/62799 might work better. If you still have trouble finding the right post, look for the post starting like:

        [screenshot: the start of the linked post]

        (Note that because of timezone differences, the timestamp you see might be different)

      • Alan’s post gave the values to insert. Because of a bug in the forum, showing \[ is more difficult than it should be, so he used bs to indicate \. I will try to do it without the red text, which sometimes makes \[ easier to show:

        • BSR = \[_
        • ESR = _\]
        • FR = a space character
        • RR = _
          These are then a list of “variables” to be substituted into the generic expression.
      • What is the generic expression? From the post that was linked,

        [screenshot: the generic expression]

        • You would then put the value of BSR that Alan and I gave above into the expression in the image, and similarly for ESR, FR, and RR.
        • Finally, you would enable “regular expression” and run this final regular expression:
          [screenshot: the final regular expression]

    We try to be helpful, but a lot of the questions reduce to the same formula, where the question boils down to “I need to find FR and replace with RR, but only between BSR and ESR” – the questions of course don’t use those “variable names”, but the idea is the same. Our regex experts put a lot of effort into building a regex that meets those needs in the generic situation; when we try to help someone whose question boils down to that generic circumstance, we try to point them to the generic formula, and often give them the values for the variables, and expect that they can plug those values into the formula.

    But this means that we do expect the people who are asking questions to put in a little effort – if we just hand them the end product (like I eventually did for you), the chances are that they won’t learn how to insert different values into the same generic formula, so they won’t be able to solve the similar problem in the future, even though the answer is effectively the same. I decided to just hand you the answer, because this is only your second question here… but I’m hoping you read the detailed instructions and figure out what it means, rather than just blindly taking the end result, and not bothering to figure out how it works.

    If you don’t understand the instructions, then you can ask specific questions. But saying “I tried something but it didn’t happen” is like telling your doctor, “Hey, Doc, it hurts when I do something.” If you don’t tell your doctor what you did and where it hurts, he cannot help you; similarly, if you don’t tell us exactly what you tried and what did or did not happen, we cannot help you.

    ----

    Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as literal text using the </> toolbar button or manual Markdown syntax. To make regex in red (and so they keep their special characters like *), use backticks, like `^.*?blah.*?\z`. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure your examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get. Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.



  • @darkenb

    #begin_spoon_feeding

    From the linked posting:

    SEARCH (?-i:BSR|(?!\A)\G)(?s:(?!ESR).)*?\K(?-i:FR)
    REPLACE RR

    Substituting in our values for BSR/ESR/FR/RR, that I enumerated above, yields:

    SEARCH (?-i:bs[_|(?!\A)\G)(?s:(?!_bs]).)*?\K(?-i: )
    REPLACE _

    Note: this site has trouble with backslash followed by [ or ], so the above still uses bs for backslash, but when put correctly into Notepad++ it looks like this in the Find what box:

    [screenshot: the Find what box]

    And running a Replace All results in:

    [screenshot: the text after Replace All]

    #end_spoon_feeding

    EDIT: Well, while I was composing, Peter was also getting out the spoon. Since I went to the trouble of composing, I’ll leave it.
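
Since the forum mangles backslashes, the substitution itself can be sketched in a few lines of Python. This only builds the search string to paste into the Find what box; Python's own `re` module cannot run it, because it supports neither `\G` nor `\K`. The helper name `build_search` is an assumption for this illustration.

```python
import re

# Generic expression from the linked posting, with its placeholder names.
GENERIC_SEARCH = r"(?-i:BSR|(?!\A)\G)(?s:(?!ESR).)*?\K(?-i:FR)"


def build_search(bsr: str, esr: str, fr: str) -> str:
    """Substitute the BSR/ESR/FR placeholders in a single pass."""
    values = {"BSR": bsr, "ESR": esr, "FR": fr}
    # A function replacement inserts each value literally, so the
    # backslashes in the values are not reinterpreted by re.sub.
    return re.sub(r"BSR|ESR|FR", lambda m: values[m.group(0)], GENERIC_SEARCH)


search = build_search(bsr=r"\[_", esr=r"_\]", fr=" ")
replace = "_"  # RR goes into the Replace with box unchanged

print("SEARCH ", search)
print("REPLACE", replace)
```

The printed SEARCH line is the expression with real backslashes in place of bs.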



  • @darkenb said in How can I change all the words in a given structure?:

    I don’t quite understand, so I don’t think I can do it. I tried something but it didn’t happen.
    Thanks anyway. I think I’ll try to find another way.

    This is why creating a help area of general problems and their general solutions on this site probably would be a wasted effort.
    People don’t want this, they just want the answer to their specific “thing”.
    No thinking required.
    Plug n play.



  • @Alan-Kilborn said in How can I change all the words in a given structure?:

    @darkenb

    #begin_spoon_feeding

    From the linked posting:

    SEARCH (?-i:BSR|(?!\A)\G)(?s:(?!ESR).)*?\K(?-i:FR)
    REPLACE RR

    Substituting in our values for BSR/ESR/FR/RR, that I enumerated above, yields:

    SEARCH (?-i:bs[_|(?!\A)\G)(?s:(?!_bs]).)*?\K(?-i: )
    REPLACE _

    Note: this site has trouble with backslash followed by [ or ], so the above still uses bs for backslash, but when put correctly into Notepad++ it looks like this in the Find what box:

    [screenshot: the Find what box]

    And running a Replace All results in:

    [screenshot: the text after Replace All]

    #end_spoon_feeding

    EDIT: Well, while I was composing, Peter was also getting out the spoon. Since I went to the trouble of composing, I’ll leave it.

    All right, this worked.

    I did not understand the complicated part in stage 2, now I have figured it out.

    Thanks to everyone who helped.



  • All right, this worked. Now I’ve added _ underscores to texts that I don’t want translated and it worked. Now I will restore the texts that I have added _ underscores, so I have to delete the ones I added. Only […] underscores in these parentheses will be removed. Any other _ hyphens used in the document will not change.



  • @darkenb said in How can I change all the words in a given structure?:

    Only […] underscores in these parentheses will be removed. Any other _ hyphens used in the document will not change.

    And now this is a trivial thing to do!



  • @Alan-Kilborn said in How can I change all the words in a given structure?:

    @darkenb said in How can I change all the words in a given structure?:

    Only […] underscores in these parentheses will be removed. Any other _ hyphens used in the document will not change.

    And now this is a trivial thing to do!

    I don’t understand why something trivial?

    This code block belongs to a game. I added the “_” character so that Google Translate wouldn’t detect the text, and it worked. But it won’t work in the game if I don’t restore it. So I have to remove the “_” from inside […] so that the text stays the same as before.



  • @darkenb said in How can I change all the words in a given structure?:

    I don’t understand why something trivial?

    Because you can just reapply the technique used earlier, with different values.
    In fact, I think it is just as simple as swapping your FR and RR values.
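
To illustrate the swap, here is a rough Python sketch of the round trip back. It assumes the text was protected as `[_..._]` earlier in the thread; `replace_between` is a hypothetical helper that emulates "replace only between delimiters" with a callback, because Python's `re` lacks `\G` and `\K`.

```python
import re


def replace_between(text: str, begin: str, end: str, find: str, repl: str) -> str:
    """Replace `find` with `repl`, but only between literal `begin` and `end`."""
    pattern = re.escape(begin) + r"(.*?)" + re.escape(end)

    def fix(m: re.Match) -> str:
        # Only the text between the markers is changed.
        return begin + m.group(1).replace(find, repl) + end

    return re.sub(pattern, fix, text, flags=re.DOTALL)


protected = '#. "TEST [_Test_Test_]: 0" some_other_name'

# Same delimiters as before, with the find and replace values swapped.
restored = replace_between(protected, "[_", "_]", "_", " ")

# Undo the two Normal-mode replacements. This assumes the document had
# no "[_" or "_]" of its own before the underscores were added.
restored = restored.replace("[_", "[").replace("_]", "]")

print(restored)  # '#. "TEST [Test Test]: 0" some_other_name'
```

The underscore in `some_other_name` survives because it is not between the delimiters.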



  • Hello, @darkenb, @alan-kilborn, @peterjones, @ekopalypse and All,

    I suppose that, with Alan’s and Peter’s explanations, you succeeded in achieving what you wanted!

    However, in this whole story, there is still something unclear!

    @darkenb, I do understand that you don’t want the first line, beginning with a # character, to be translated by Google Translate, and that the way you’ve found to avoid translation is to add underscore characters ( _ ) between words! But, please, could you be a bit more precise?

    Before the replacement process, is your text of the B1, B2, B3 or B4 type?

    B1 #. "This is an [ABC DEF GHI JKL] example of text: 0"
    B2 #. "This is an [ABC DEF GHI JKL ] example of text: 0"
    B3 #. "This is an [ ABC DEF GHI JKL] example of text: 0"
    B4 #. "This is an [ ABC DEF GHI JKL ] example of text: 0"
    

    After the replacement process, do you expect the text A1, A2, A3, A4, or A5 ?

    A1 #. "This is an [ABC_DEF_GHI_JKL] example of text: 0"
    A2 #. "This is an [_ABC_DEF_GHI_JKL_] example of text: 0"
    A3 #. "This is an [_ABC_DEF_GHI_JKL _] example of text: 0"
    A4 #. "This is an [_ ABC_DEF_GHI_JKL_] example of text: 0"
    A5 #. "This is an [_ ABC_DEF_GHI_JKL _] example of text: 0"
    

    To my mind, the more logical version is :

    • You, presently, have the B1 configuration

    • You would like to get the A1 or, maybe, the A2 configuration, after the replacement process

    Just tell me about it !

    As always, once the problem is well defined, the solution is easier to find and we are halfway there ;-))

    Best regards,

    guy038
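
The difference between the A1 and A2 targets is easy to see in a small Python sketch. This is only an illustration of the two outcomes for a B1 line, not a Notepad++ recipe, and the function names `to_a1` and `to_a2` are made up here.

```python
import re


def to_a1(line: str) -> str:
    """A1 style: underscores only in place of the spaces inside [...]."""
    return re.sub(
        r"\[(.*?)\]",
        lambda m: "[" + m.group(1).replace(" ", "_") + "]",
        line,
    )


def to_a2(line: str) -> str:
    """A2 style: as A1, plus one underscore just inside each bracket."""
    return re.sub(
        r"\[(.*?)\]",
        lambda m: "[_" + m.group(1).replace(" ", "_") + "_]",
        line,
    )


b1 = '#. "This is an [ABC DEF GHI JKL] example of text: 0"'
print("A1", to_a1(b1))  # [ABC_DEF_GHI_JKL]
print("A2", to_a2(b1))  # [_ABC_DEF_GHI_JKL_]
```

Only the bracketed span differs between the two results; the rest of the line is identical to B1.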



  • @guy038
    Yes, you got it right. A2 is exactly what I want.

    However, when the process is completed, that is, when I complete the translation, I have to do the opposite of this process.

    That’s why I need to remove the _ underscores. However, there are _ underscores in other words on the page. Therefore, only the underscores inside the […] brackets should be removed, without changing the rest of the sentence. So in summary, I need to apply the following structure.

    Before: B1 #. “This is an [ABC DEF GHI JKL] example of text: 0”

    After: A2 #. “This is an [ABC_DEF_GHI_JKL] example of text: 0”

    Then back again: B1 #. “This is an [ABC DEF GHI JKL] example of text: 0”

    I’ll be glad if you can help. I do not understand much of this, so I would appreciate a practical example.



  • I meant A2; the leading underscores are being erased from my post. It will be B1 first, then A2, then B1 again.



  • @darkenb said in How can I change all the words in a given structure?:

    It will be B1 first, then A2, then B1 again.

    Which really isn’t anything different than described before.
    And which already has a successful solution.

    I don’t really know what @guy038 would additionally supply.
    Probably some over-complicated single-step way(s) to do it, which likely would totally eliminate any possible learning opportunity, for an obvious newbie?

    I mean, we understand the newbie thing, but is it that hard to follow a recipe?
    Maybe someone needs to create a script that would walk one through the process of building up a regex for the “replace only inside delimiters” scenario? OK, I’ll give that a go, and post back here.



  • Hello, @darkenb,

    OK ! So, you start with text of style B1

    B1 #. "This is an [ABC DEF GHI JKL] example of text: 0"
    

    But, the case A# is still not defined, yet. Indeed, you said :

    After: A2 #. “This is an [ABC_DEF_GHI_JKL] example of text: 0”

    but, according to my classification, this should be A1 ?


    So, sorry to repeat, but are you expecting the style A1 or A2, below ?

    A1 #. "This is an [ABC_DEF_GHI_JKL] example of text: 0"
    A2 #. "This is an [_ABC_DEF_GHI_JKL_] example of text: 0"
    

    BR

    guy038

