Delete Chinese text after comparing two XML documents

Help wanted
31 Posts 3 Posters 5.7k Views
  • M
    Meta Chuh moderator @andrecool-68
    last edited by Meta Chuh Feb 24, 2019, 11:52 AM Feb 24, 2019, 11:51 AM

    @andrecool-68
    let’s combine @guy038 's regex with mine, and it should work on your second example:

    find what: "[\x{3000}-\x{303F}\x{4E00}-\x{9FEF}](.*?)\|
    replace with: "

    note: this new regex will only work if the first character after the " is Chinese.
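    A minimal sketch of this replacement, assuming Python's `re` module as a stand-in for the Notepad++ regex engine (Python writes the code points as `\u3000` instead of `\x{3000}`); the sample strings are hypothetical. It reproduces the limitation in the note: a value is only cleaned when a Chinese character comes directly after the quote.

```python
import re

# Same ranges as in the thread: CJK Symbols and Punctuation (U+3000-U+303F)
# and CJK Unified Ideographs (U+4E00-U+9FEF), in Python's \u notation.
CJK = r"[\u3000-\u303F\u4E00-\u9FEF]"

# find what: "[CJK](.*?)\|   ->   replace with: "
FIRST_CHAR_CJK = re.compile('"' + CJK + r"(.*?)\|")


def strip_chinese_prefix(line: str) -> str:
    """Replace every match with a single quote, like Replace All in Notepad++."""
    return FIRST_CHAR_CJK.sub('"', line)


print(strip_chinese_prefix('title="儲存檔案|Сохранение"'))  # -> title="Сохранение"
print(strip_chinese_prefix('name="SHA-256 檔案|Файл"'))     # unchanged: Latin text comes first
```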

    @guy038
    your original regex leaves out the | which should also be removed.

    • A
      andrecool-68 @guy038
      last edited by Feb 24, 2019, 11:53 AM

      @guy038

      This removes all Chinese characters, but the part I need to delete begins right after =" and ends with the | character.
      Besides the Chinese characters, it may also contain other characters, including Latin ones.

      <DoSaveOrNot title="儲存檔案|Сохранение" message="您要儲存「$STR_REPLACE$」嗎?|Сохранить файл “$STR_REPLACE$”?" diff:status="modified"/>

      • M
        Meta Chuh moderator @andrecool-68
        last edited by Meta Chuh Feb 24, 2019, 11:59 AM Feb 24, 2019, 11:57 AM

        @andrecool-68

        yes, you are correct, sorry.

        please try this modified one:

        find what: "(.*?)[\x{3000}-\x{303F}\x{4E00}-\x{9FEF}](.*?)\|
        replace with: "

        with this one, the text can start with, end with, or contain Latin characters.
        text will only be removed if there is at least one Chinese character anywhere between " and |
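        A minimal Python sketch of this version, assuming Python's `re` module as a stand-in for Notepad++ (code points written as `\u3000`) and a hypothetical sample line. The only change from the previous regex is the lazy `(.*?)` before the Chinese character class, so a Latin prefix such as `SHA-256 ` is now removed as well.

```python
import re

CJK = r"[\u3000-\u303F\u4E00-\u9FEF]"

# find what: "(.*?)[CJK](.*?)\|   ->   replace with: "
ANY_PREFIX = re.compile(r'"(.*?)' + CJK + r"(.*?)\|")


def strip_any_prefix(line: str) -> str:
    """Remove text from a quote up to the first | that follows a Chinese character."""
    return ANY_PREFIX.sub('"', line)


print(strip_any_prefix('<Item name="SHA-256 檔案|Файл"/>'))  # -> <Item name="Файл"/>
```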

        • A
          andrecool-68
          last edited by Feb 24, 2019, 12:10 PM

          @Meta-Chuh

          It turned out very well here.

          			<Item id="Режим Поиска" diff:status="modified"/>
          			<Item id="Обычный" diff:status="modified"/>
          			<Item id="Регуляр. выражен." diff:status="modified"/>
          			<Item id="Расширенный (\n, \r, \t, \0, \x...)" diff:status="modified"/>
          			<Item id="и новые строки" diff:status="modified"/>
          		</FindInFinder>
          		<SHA256FromFilesDlg title="從檔案產生 SHA-256" diff:status="removed">
          			<Item id="1922" name="從檔案產生 SHA-256..." diff:status="removed"/>
          			<Item id="1924" name="複製到剪貼簿" diff:status="removed"/>
          			<Item id="2" name="關閉" diff:status="removed"/>
          		</SHA256FromFilesDlg>
          

          It didn’t work out right here.

          		<FileTooBigToOpen title="Проблема с размером файла"Файл слишком велик, чтобы открыть его в Notepad++" diff:status="modified"/>
          		<CreateNewFileOrNot title="Создание нового файла"&quot;$STR_REPLACE$&quot; не существует. Создать его?" diff:status="modified"/>
          		<CreateNewFileError title="Создание нового файла"Не удается создать файл &quot;$STR_REPLACE$&quot;." diff:status="modified"/>
          		<OpenFileError title="ОШИБКА"Невозможно открыть файл &quot;$STR_REPLACE$&quot;." diff:status="modified"/>
          
          • M
            Meta Chuh moderator @andrecool-68
            last edited by Meta Chuh Feb 24, 2019, 12:29 PM Feb 24, 2019, 12:24 PM

            @andrecool-68

            one more ;-)

            find what: ="(.*?)[\x{3000}-\x{303F}\x{4E00}-\x{9FEF}](.*?)\|
            replace with: ="

            i’ve now added = to the search, because under certain circumstances the old regex started matching at the closing " of one attribute value instead of the opening " of the next one.
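            To see why the = helps, here is a hedged Python comparison of the two versions on the DoSaveOrNot line quoted earlier in the thread (Python's `re` standing in for Notepad++). After the title value is cleaned, the old pattern resumes at the closing quote of the title and runs on into the message value, which is what produced the broken lines above; requiring =" forces each match to begin at an attribute's opening quote.

```python
import re

CJK = r"[\u3000-\u303F\u4E00-\u9FEF]"

OLD = re.compile(r'"(.*?)' + CJK + r"(.*?)\|")    # replace with: "
NEW = re.compile(r'="(.*?)' + CJK + r"(.*?)\|")   # replace with: ="

LINE = ('<DoSaveOrNot title="儲存檔案|Сохранение" '
        'message="您要儲存「$STR_REPLACE$」嗎?|Сохранить файл “$STR_REPLACE$”?" '
        'diff:status="modified"/>')

old_result = OLD.sub('"', LINE)
new_result = NEW.sub('="', LINE)

# Old: the second match starts at the closing quote of the title, so ' message="' is lost.
print(old_result)
# New: each match starts at =", so both attributes survive with only the Russian text.
print(new_result)
```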

            note: i’m not a regex guru like @guy038, and i obviously underestimated how many conditions you have to think through before writing a regex.
            i’m more of a “trial and error” dude ;-)

            • A
              andrecool-68
              last edited by Feb 24, 2019, 12:29 PM

              It is a pity that there is no utility for working on the Notepad++ localization; a simple file comparison does not always give a good result, because the strings in the XML language files are not in the same order. You can see this by comparing the Chinese and Russian files.
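              One way around the ordering problem is to compare entries by key instead of line by line. This is only an illustration under stated assumptions: it assumes each translatable element can be identified by its tag plus an `id`, `menuId` or `idName` attribute (as in the snippets in this thread), it uses Python's standard `xml.etree.ElementTree`, and the function names and sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed identifying attributes, taken from the snippets in this thread.
KEY_ATTRS = ("id", "menuId", "idName")


def index_entries(xml_text: str) -> dict:
    """Map (tag, key) -> attributes for every element that has a key attribute."""
    entries = {}
    for elem in ET.fromstring(xml_text).iter():
        for attr in KEY_ATTRS:
            if attr in elem.attrib:
                entries[(elem.tag, elem.attrib[attr])] = dict(elem.attrib)
                break
    return entries


def missing_in_translation(reference_xml: str, translation_xml: str) -> list:
    """Keys present in the reference file but absent from the translation, in any order."""
    return sorted(set(index_entries(reference_xml)) - set(index_entries(translation_xml)))


reference = ('<Menu><Item menuId="file" name="&amp;File"/>'
             '<Item menuId="edit" name="&amp;Edit"/>'
             '<Item menuId="search" name="&amp;Search"/></Menu>')
# Same entries in a different order, with "search" missing.
translation = ('<Menu><Item menuId="edit" name="&amp;Правка"/>'
               '<Item menuId="file" name="&amp;Файл"/></Menu>')

print(missing_in_translation(reference, translation))  # -> [('Item', 'search')]
```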

              • A
                andrecool-68 @Meta Chuh
                last edited by Feb 24, 2019, 12:56 PM

                @Meta-Chuh
                Here’s what I got:

                      <Item menuId="&amp;Файл"/>
                      <Item menuId="&amp;Правка"/>
                      <Item menuId="По&amp;иск"/>
                      <Item menuId="&amp;Вид"/>
                      <Item menuId="&amp;Кодировки"/>
                      <Item menuId="&amp;Синтаксисы"/>
                      <Item menuId="&amp;Опции"/>
                      <Item menuId="Инстр&amp;ументы"/>
                      <Item menuId="&amp;Макросы"/>
                      <Item menuId="&amp;Запуск"/>
                      <Item idName="Плаги&amp;ны"/>
                      <Item idName="Вкл&amp;адки"/>
                

                And it should be:

                				<Item menuId="file" name="&amp;Файл"/>
                				<Item menuId="edit" name="&amp;Правка"/>
                				<Item menuId="search" name="По&amp;иск"/>
                				<Item menuId="view" name="&amp;Вид"/>
                				<Item menuId="encoding" name="&amp;Кодировки"/>
                				<Item menuId="language" name="&amp;Синтаксисы"/>
                				<Item menuId="settings" name="&amp;Опции"/>
                				<Item menuId="tools" name="Инстр&amp;ументы"/>
                				<Item menuId="macro" name="&amp;Макросы"/>
                				<Item menuId="run" name="&amp;Запуск"/>
                				<Item idName="Plugins" name="Плаги&amp;ны"/>
                				<Item idName="Window"  name="Вкл&amp;адки"/>
                

                Regular expressions probably won’t be able to fix the mess that xmlTreeNav makes. I need to look for an alternative to xmlTreeNav (it is not a good tool for XML localization).

                • G
                  guy038
                  last edited by guy038 Feb 24, 2019, 1:06 PM Feb 24, 2019, 12:58 PM

                  Hi, @andrecool-68, @Meta-chuh and All,

                  @meta-chuh :

                  Ah, yes! Your last attempt, adding the = sign, is the right one, because your former regex deleted the message= part!

                  @Andrecool-68 :

                  Now I’ve understood the problem: in the values of the two attributes title and message, you want to delete the part after the =" string, up to and including the nearest | character, but only if this range contains at least one Chinese character ;-))

                  So, the following regex S/R :

                  • SEARCH (?-s)\x20(title|message)="\K.*?[\x{3000}-\x{303F}\x{4E00}-\x{9FEF}].*?\|

                  • REPLACE Leave EMPTY

                  • Option Regular expression

                  • Option Wrap around, if necessary

                  • Click on the Replace All button only (because of the \K syntax, successive clicks on the Replace button do not work here)

                  Et voilà !

                  I tested the result of our two regexes, Chuh, and they do produce the same replaced text ;-))
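                  Python's `re` module has no `\K`, so a rough equivalent (an assumption for illustration, not the Notepad++ syntax) puts the ` title="` / ` message="` prefix into two fixed-width lookbehinds, which also keeps it out of the match; the replacement is then empty, as above. Like `(?-s)`, Python's `.` does not match line breaks by default, so a line with no | after the Chinese text is left alone. One caveat carries over from the original pattern: `.*?` is not stopped by a closing quote, so a title without Chinese followed by a message with Chinese on the same line would lose text.

```python
import re

CJK = r"[\u3000-\u303F\u4E00-\u9FEF]"

# Python lookbehinds must be fixed-width, so each attribute name gets its own.
TITLE_OR_MESSAGE = re.compile(
    r'(?:(?<=\x20title=")|(?<=\x20message="))'  # zero-width, like the part before \K
    r".*?" + CJK + r".*?\|"                      # up to the nearest | after a Chinese char
)


def strip_title_message(line: str) -> str:
    """Delete the matched text only; the attribute name and =" stay in place."""
    return TITLE_OR_MESSAGE.sub("", line)


save = ('<DoSaveOrNot title="儲存檔案|Сохранение" '
        'message="您要儲存「$STR_REPLACE$」嗎?|Сохранить файл “$STR_REPLACE$”?" '
        'diff:status="modified"/>')
sha = '<SHA256FromFilesDlg title="從檔案產生 SHA-256" diff:status="removed">'

print(strip_title_message(save))
print(strip_title_message(sha))  # no | on this line, so it is unchanged
```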

                  Cheers,

                  guy038

                  • A
                    andrecool-68 @guy038
                    last edited by Feb 24, 2019, 1:14 PM

                    @guy038

                    You’ll laugh, but it worked for this piece of code and left the rest unchanged.

                    • M
                      Meta Chuh moderator @guy038
                      last edited by Feb 24, 2019, 1:15 PM

                      @guy038

                      that’s why you are the guru ;-)

                      it is always astonishing how many steps ahead you are able to think, and so quickly,
                      like in a game of chess, where you know the outcome even before your opponent makes the first move.

                      this example looked so easy to me at first, but i failed to think of so many things in advance, and this clearly shows the limits of my “trial and error” attempts.

                      ps: i thought you were skiing today … or are you writing from your mobile phone, while you are actually cruising down the slopes at the same time ? 😉👍

                      • A
                        andrecool-68 @Meta Chuh
                        last edited by Feb 24, 2019, 1:26 PM

                        @Meta-Chuh

                        Here in Israel it is plus 22 degrees Celsius, and it is very difficult to ski in this weather)))

                        • A
                          andrecool-68
                          last edited by Feb 24, 2019, 1:27 PM

                          Thank you all very much for your help!

                          • M
                            Meta Chuh moderator @andrecool-68
                            last edited by Feb 24, 2019, 1:31 PM

                            @andrecool-68

                            Here in Israel it is plus 22 degrees Celsius, and it is very difficult to ski in this weather)))

                            😂 i would like to have your temperatures and climate around here too, plus a cocktail on a beach, with chillout music and happy people all around 👍

                            • G
                              guy038
                              last edited by guy038 Feb 25, 2019, 10:44 AM Feb 24, 2019, 9:46 PM

                              Hi, @meta-chuh,

                              Please, not a “regex guru”. Just a guy who is always amazed by the significant text changes that these little pieces of code can produce!


                              Sorry, Chuh, I did not write on my mobile phone while skiing! Unlike young people, I still need a fairly large screen to feel comfortable while writing a post on our forum ;-))

                              Actually, though the weather was marvelous, I was a bit tired in the afternoon, probably due to my recent indisposition, so I just stopped up there with a colleague … … … for a beer, looking at the nice panorama around! But we’ve planned another ski day next Thursday, in the “Les Menuires - St Martin de Belleville” ski area!

                              Here is my modified “slopes map” picture, which shows the “Meribel - Les Menuires - Val Thorens” areas and part of Courchevel, on the left !

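                              [Image: modified slopes map of the Meribel, Les Menuires and Val Thorens areas, with part of Courchevel on the left]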

                              BR

                              guy038

                              • A
                                andrecool-68
                                last edited by Feb 25, 2019, 11:04 AM

                                I ran all these operations in a row, and only 5 lines were left to clean up manually. I recorded these regular expressions as a macro, and everything turned out well!

                                "e name="(.*?)[\x{3000}-\x{303F}\x{4E00}-\x{9FEF}](.*?)\|
                                "e name="

                                " name="(.*?)[\x{3000}-\x{303F}\x{4E00}-\x{9FEF}](.*?)\|
                                " name="

                                name="(.*?)[\x{3000}-\x{303F}\x{4E00}-\x{9FEF}](.*?)\|
                                name="

                                " name="(.*?)[\x{3000}-\x{303F}\x{4E00}-\x{9FEF}](.*?)\|
                                " name="

                                title="(.*?)[\x{3000}-\x{303F}\x{4E00}-\x{9FEF}](.*?)\|
                                title="

                                message="(.*?)[\x{3000}-\x{303F}\x{4E00}-\x{9FEF}](.*?)\|
                                message="

                                value="(.*?)[\x{3000}-\x{303F}\x{4E00}-\x{9FEF}](.*?)\|
                                value="

                                Thank you very much for your efforts!

                                • A
                                  andrecool-68
                                  last edited by Feb 25, 2019, 11:19 AM

                                  One regular expression here is mistakenly duplicated, but in the macro everything is correct:
                                  " name="(.*?)[\x{3000}-\x{303F}\x{4E00}-\x{9FEF}](.*?)\|
                                  " name="

                                  • A
                                    andrecool-68
                                    last edited by Feb 25, 2019, 11:28 AM

                                    I need to test this option and compare how the two approaches work:
                                    (?-s)\x20(value|name|title|message)="\K.*?[\x{3000}-\x{303F}\x{4E00}-\x{9FEF}].*?\|
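                                    A hedged Python sketch of such a comparison, with Python's `re` standing in for Notepad++: the macro is approximated by running the title=, message=, value= and name= replacements from the list above one after another, and the combined pattern is emulated with one fixed-width lookbehind per attribute, because Python has no `\K`. The sample text is hypothetical.

```python
import re

CJK = r"[\u3000-\u303F\u4E00-\u9FEF]"
ATTRS = ("title", "message", "value", "name")


def run_macro(text: str) -> str:
    # One Replace All per attribute: attr="(.*?)[CJK](.*?)\|  ->  attr="
    for attr in ATTRS:
        text = re.sub(attr + r'="(.*?)' + CJK + r"(.*?)\|", attr + '="', text)
    return text


# Combined: (?-s)\x20(value|name|title|message)="\K.*?[CJK].*?\|  ->  (empty)
COMBINED = re.compile(
    "(?:" + "|".join(r'(?<=\x20%s=")' % attr for attr in ATTRS) + ")"
    + r".*?" + CJK + r".*?\|"
)


def run_combined(text: str) -> str:
    return COMBINED.sub("", text)


sample = (
    '<DoSaveOrNot title="儲存檔案|Сохранение" message="您要儲存嗎?|Сохранить файл?"/>\n'
    '<Item id="1924" name="複製到剪貼簿|Копировать"/>\n'
    '<Item id="2" name="Close"/>\n'
)

print(run_macro(sample) == run_combined(sample))  # -> True
```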

                                    • G
                                      guy038
                                      last edited by guy038 Feb 25, 2019, 1:11 PM Feb 25, 2019, 12:57 PM

                                      Hi, @andrecool-68,

                                      I read your last posts with the different regexes and was about to suggest exactly such a regex, which combines all the cases :-))

                                      Another syntax, which allows step-by-step replacement with several clicks on the Replace button, would be:

                                      SEARCH (\x20(value|name|title|message)=").*?[\x{3000}-\x{303F}\x{4E00}-\x{9FEF}].*?\|

                                      REPLACE \1
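                                      A rough Python emulation of pressing Replace repeatedly with this capture-group version (the helper name is hypothetical; Python's `re` stands in for Notepad++). Each step replaces one match with group 1 and resumes searching just after the inserted text, roughly as the next click would. Resuming there, rather than restarting at the top of the line, matters: from the start, `.*?` could run from the already-cleaned title into the message value.

```python
import re

CJK = r"[\u3000-\u303F\u4E00-\u9FEF]"

# SEARCH (\x20(value|name|title|message)=").*?[CJK].*?\|   REPLACE \1
STEP = re.compile(r'(\x20(value|name|title|message)=").*?' + CJK + r".*?\|")


def replace_step_by_step(text: str) -> list:
    """Return the text after each single replacement, like successive Replace clicks."""
    states = []
    pos = 0
    while True:
        match = STEP.search(text, pos)
        if match is None:
            return states
        kept = match.expand(r"\1")  # ' title="', ' message="', ...
        text = text[:match.start()] + kept + text[match.end():]
        pos = match.start() + len(kept)  # continue after the replacement
        states.append(text)


line = ('<DoSaveOrNot title="儲存檔案|Сохранение" '
        'message="您要儲存「$STR_REPLACE$」嗎?|Сохранить файл “$STR_REPLACE$”?" '
        'diff:status="modified"/>')

for state in replace_step_by_step(line):
    print(state)
```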

                                      Cheers,

                                      guy038

                                      P.S. :

                                      BTW, don’t you have any ski resorts in Israel? I’m thinking of the Mount Hermon Ski Resort!

                                      • A
                                        andrecool-68 @guy038
                                        last edited by Feb 25, 2019, 2:53 PM

                                        @guy038
                                        We have small mountains in Israel, but snow there is very rare; my friend flies to Italy every year to ski. For me, the best way to relax is fishing with a rod.

                                        • M
                                          Meta Chuh moderator @guy038
                                          last edited by Meta Chuh Feb 25, 2019, 3:08 PM Feb 25, 2019, 2:56 PM

                                          @guy038

                                          Here is my modified “slopes map” picture, which shows the “Meribel - Les Menuires - Val Thorens” areas and part of Courchevel, on the left !

                                          this is beautiful 😃.
                                          next winter, as soon as my youngest son is old enough to have sufficient body control to enjoy it, i have to go skiing again, after a 3-year break (far too long).

                                          thanks for sharing.
                                          short notes like that make this place pleasantly human to me, with a familiar atmosphere i enjoy. 👍

                                          ps:

                                          Please, not a “regex guru”

                                          may i use “regex master”, or “sensei regex san”, or “darth regex” instead ? ;-)

                                          The Community of users of the Notepad++ text editor.