    Regex - How to keep only some parts of huge text

    • Tonda Ptáčník
      Tonda Ptáčník last edited by

      Hello,
      I have a huge amount of text and I’m trying to keep just a few parts of each line.
      I tried to use [^(?=<![A-Za-z0-9 ?,?:?/?<?>?])]
      to keep only the text between <! and >, but without success. The problem is that it ignores all letters, etc., everywhere.
      Here is the problem: Image. I need to keep only the yellow text on all lines.

      Do you guys know how to do that?

      • guy038
        guy038 last edited by guy038

        Hello, @tonda-ptáčník,

        Judging by your picture, I think you mean the part of the text shown in a kind of orange color, <![CDATA[...........]]>, don’t you?

        If so:

        • Open the Replace dialog

        • Type in ^.*(<!\[CDATA\[.*\]>) in the Find what: zone

        • Type in \1 in the Replace with: zone

        • Tick the Wrap around option

        • Select the Regular expression search mode

        • Click on the Replace All button
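For anyone who would rather script this step than use the Replace dialog, here is a minimal Python sketch of the same idea. It assumes the square brackets of the CDATA marker are escaped in the pattern, and the function name `keep_cdata` is only illustrative. As in the dialog, whatever follows the CDATA section on the line (for instance `</phrase>`) is left in place.

```python
import re

# Everything from the start of the line up to the CDATA section is matched,
# and only the CDATA section itself (group 1) is kept.
CDATA_RE = re.compile(r"^.*(<!\[CDATA\[.*\]>)")


def keep_cdata(line: str) -> str:
    """Return the line with the text before <![CDATA[...]]> removed."""
    return CDATA_RE.sub(r"\1", line)


sample = '<phrase title="1_day_ago"><![CDATA[One day ago]]></phrase>'
print(keep_cdata(sample))  # prints <![CDATA[One day ago]]></phrase>
```

Lines without a CDATA section are returned unchanged.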

        Best Regards,

        guy038

        • Tonda Ptáčník
          Tonda Ptáčník @guy038 last edited by

          @guy038 Thank you very much, it helped me a lot.

          But now I have another problem. Do you know how to replace a huge amount of text that is different for each line?
          Here is what I need:
          I need to edit this Image to look like this Image2 (I copied and pasted this by hand; the problem is that the file has over 6000 lines). The other problem is that I only have text with the translated sentences, one for each line. So the file I want to copy from looks like this: Image3

          • Scott Sumner
            Scott Sumner @Tonda Ptáčník last edited by

            @Tonda-Ptáčník

            Suggestion: Why don’t you post text here instead of images?

            If someone is willing to help, having some text to experiment with is something that might make them dive in and actually help. Nobody is going to retype text into Notepad++ after looking at some images with text.

            The best way to post text here is to indent it with 4 spaces in Notepad++, then copy and paste here. Don’t make the amount of text too large or it won’t let you post it.

            • guy038
              guy038 last edited by guy038

              Hi, @tonda-ptáčník,

              I understand what you want, but I need some additional information:

              Do the lines to be processed take only the two forms below?

              <phrase title=......NOT changed.........<![CDATA[......To be TRANSLATED.......]]></phrase>
              

              OR

              <phrase title=........NOT changed......<![CDATA[.......To be TRANSLATED till the END of line ...............
              <br />
              <strong>......To be TRANSLATED.......</strong>]]></phrase>
              

              Remarks :

              • To insert plain text without any Markdown interpretation

              Type, for instance:

              ~~~z
              this text will NOT be interpreted
              ~~~

              and it will be displayed as :

              this text will NOT be interpreted
              
              • If you want to insert text with XML highlighting, simply use:

              ~~~xml
              <!--
              Use MenuEntryName and MenuItemName to localize your commands to add.
              The values should be in English but not in translated language.
              (You can set Notepad++ language back to English from Preferences dialog via menu "Settings->Preferences...")
              -->
              <Item MenuEntryName="Edit" MenuItemName="Cut"/>
              <Item MenuEntryName="Edit" MenuItemName="Copy"/>
              <Item MenuEntryName="Edit" MenuItemName="Paste"/>
              <Item MenuEntryName="Edit" MenuItemName="Delete"/>
              <Item MenuEntryName="Edit" MenuItemName="Select all"/>
              <Item MenuEntryName="Edit" MenuItemName="Begin/End Select"/>
              ~~~

              and you get :

              		<!-- 
              		Use MenuEntryName and MenuItemName to localize your commands to add. 
              		The values should be in English but not in translated language.
              		(You can set Notepad++ language back to English from Preferences dialog via menu "Settings->Preferences...")
              		-->
                      <Item MenuEntryName="Edit" MenuItemName="Cut"/>
                      <Item MenuEntryName="Edit" MenuItemName="Copy"/>
                      <Item MenuEntryName="Edit" MenuItemName="Paste"/>
                      <Item MenuEntryName="Edit" MenuItemName="Delete"/>
                      <Item MenuEntryName="Edit" MenuItemName="Select all"/>
                      <Item MenuEntryName="Edit" MenuItemName="Begin/End Select"/>
              

              Best Regards,

              guy038

              • Tonda Ptáčník
                Tonda Ptáčník @guy038 last edited by

                @guy038 I looked into the XML, and the lines come in more forms than that.
                Here is the XML file I need to edit: https://url.cmgportal.cz/language-Cestina.xml

                • guy038
                  guy038 last edited by guy038

                  Hi, @tonda-ptáčník, and All,

                  I’m terribly sorry, @tonda-ptáčník, for answering so late, but I was quite busy these last few days! In addition to some family events, I had to fix the tower desktop computer of a dance friend of my wife’s! The operating system couldn’t start anymore.

                  Luckily, I could first save all the data (about 50 GB!) on her husband’s laptop and then restore all the data on a new computer, via the network box!

                  However, just notice that, on this new desktop computer, bought in a very well-known French supermarket (HP Pavilion Desktop PC 570-p015nf):

                  • We first had to install Windows 10 ourselves, after calling a hotline! (F11 key, at start-up)

                  • I was very surprised to realize that the majority of the hard disk space (1 TB) was NOT allocated!!?? (C: ~ 50 GB, D: = Recovery ~ 10 GB, another small partition, and 826 GB not allocated yet!). So, with the disk manager, I created and formatted a partition in this free space and moved all the user’s folders (documents, pictures, downloads…) to this new partition

                  Really amazing how the end customer is treated when he doesn’t necessarily have all the technical skills to solve such things :-(( It’s a shame, really!


                  But, let’s get back to our subject. So, I began to study your XML file …

                  • I assume that the title="......" attribute, right after <phrase, must not be translated. Only the <![CDATA[......]]> zones need to be translated into Czech. Am I right?

                  Except for the <?xml ....> line and the two lines <language ......> and </language>, your file contains 6,424 lines, divided into:

                  • 90 single-line ranges <phrase> ......</phrase> without any text, so <![CDATA[]]> (Case A)

                  • 5,750 single-line ranges <phrase> ......</phrase> with raw text only inside <![CDATA[......]]> (for instance, <![CDATA[One day ago]]>) (Case B)

                  • 161 single-line ranges <phrase> ......</phrase> with other markup inside <![CDATA[........]]> (for instance, <![CDATA[<a href="{board_url}">Visit {board_title}</a>]]>) (Case C)

                  • 68 multi-line ranges <phrase ...</phrase>, for 423 lines (Case D)
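To check such a breakdown on your own file, a rough Python sketch could classify each line into these cases. It is only an approximation under stated assumptions: "raw text" is taken to mean a CDATA body containing no `<`, `[` or `]` (the same character class as in the search regexes of this post), and every line that is not a complete single-line `<phrase ...</phrase>` is reported as "other" (header lines and the pieces of Case D blocks). The names `classify`, `SINGLE_LINE` and `RAW_TEXT` are illustrative.

```python
import re
from collections import Counter

# A complete single-line phrase; group 1 is the CDATA body.
SINGLE_LINE = re.compile(r"^\s*<phrase\s.+?<!\[CDATA\[(.*)\]\]></phrase>\s*$")
# "Raw text": no '<', '[' or ']' anywhere in the body (assumption, see above).
RAW_TEXT = re.compile(r"[^\]\[<]+")


def classify(line: str) -> str:
    """Return 'A', 'B', 'C' or 'other' for one line of the XML file."""
    m = SINGLE_LINE.match(line)
    if m is None:
        return "other"  # header line or part of a multi-line block (Case D)
    body = m.group(1)
    if body == "":
        return "A"
    if RAW_TEXT.fullmatch(body):
        return "B"
    return "C"


sample_lines = [
    '  <phrase title="1_day_ago" addon_id="XF"><![CDATA[One day ago]]></phrase>',
    '<br />',
]
print(Counter(classify(line) for line in sample_lines))
```

Running `Counter(classify(line) for line in open(path, encoding="utf-8"))` on the real file, with `path` pointing to your copy, should give counts comparable to the list above if these assumptions hold.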


                  I’m afraid that the 161 lines (Case C) and the 68 multi-line blocks (Case D) will have to be translated manually. Luckily, for the 5,750 lines of Case B, an automatic search/replacement can be considered!

                  Here is my method for processing these 5,750 translations automatically:

                  Firstly, we extract all the text which should be translated :

                  • Copy your file, in a new tab

                  • Open the Mark dialog ( Search > Mark... )

                  • Type the search regex (?-s)^\x20\x20<phrase .+?<!\[CDATA\[\K[^]\r\n<[]+(?=]]></phrase>)

                  • Tick the Bookmark line and the Purge for each search options

                  • Click on the Mark all button

                  • Remove All the unmarked lines with the menu command Search > Bookmark > Remove Unmarked Lines

                  • Click on the Clear all marks button of the Mark dialog

                  • Now, open the Replace dialog

                  • Type in the SEARCH regex (?-s)^\x20\x20<phrase .+?<!\[CDATA\[|]]></phrase>

                  • Leave the REPLACE zone EMPTY

                  • Click on the Replace all button

                  => You get your exact 5750 text lines, which can be translated
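The extraction steps above can also be sketched in Python, for readers who prefer a script. This is a minimal sketch, not part of the method itself: the function name `extract_texts` is illustrative, the `title` values "empty" and "link" in the sample are made up, and the pattern assumes the same two-space indentation and the same "no `<`, `[` or `]`" rule for Case B.

```python
import re

# Case B line: two leading spaces, <phrase ...>, then a CDATA body made of
# raw text only (no ']', '<' or '['), then the closing tags.
CASE_B = re.compile(r"^  <phrase .+?<!\[CDATA\[([^\]\r\n<\[]+)\]\]></phrase>$")


def extract_texts(xml_text: str) -> list:
    """Return the CDATA bodies of all Case B lines, in file order."""
    texts = []
    for line in xml_text.splitlines():
        m = CASE_B.match(line)
        if m:
            texts.append(m.group(1))
    return texts


sample = """  <phrase title="1_day_ago" addon_id="XF"><![CDATA[One day ago]]></phrase>
  <phrase title="empty" addon_id="XF"><![CDATA[]]></phrase>
  <phrase title="link" addon_id="XF"><![CDATA[<a href="{board_url}">Visit</a>]]></phrase>
  <phrase title="1_year" addon_id="XF"><![CDATA[1 year]]></phrase>"""

print(extract_texts(sample))  # prints ['One day ago', '1 year']
```

The empty phrase (Case A) and the phrase containing a link (Case C) are skipped, as in the Mark step.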

                  Secondly, we replace each line with the same line written twice, separated by the Black Square character (\x{25A0} = ■), which lets you easily see the part that needs translating!

                  • SEARCH (?-s).+

                  • REPLACE $0\x{25A0}$0

                  • Tick the Wrap around option and the Regular expression search mode

                  • Click on the Replace All button
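As a small illustration of what this replacement does, here is an equivalent Python sketch. The function name `duplicate_lines` is hypothetical, and `[^\r\n]+` is used instead of `.+` because, in Python, `.` would also match a carriage return in files with Windows line endings.

```python
import re

SEP = "\u25A0"  # the Black Square character used as separator


def duplicate_lines(text: str) -> str:
    """Rewrite every non-empty line as: line + separator + same line."""
    return re.sub(r"[^\r\n]+", lambda m: m.group(0) + SEP + m.group(0), text)


print(duplicate_lines("One day ago\n1 year"))
```

The right-hand copy of each line is the part to overwrite with the translation; the left-hand copy must stay untouched, since it is used later to find the matching line in the XML file.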

                  Thirdly, and this is the big task: either manually or with the help of an online translator, build the corresponding list by changing the 5,750 sentences located after the ■ sign into Czech!

                  Fourthly, we add the original XML contents and replace every zone that needs translating with its translation:

                  • Move back to the very beginning of these sentences

                  • Paste your original XML file contents, before the present first line

                  • Right after, add a separator line, with, at least, 3 equal signs, ===

                  • Open the Replace dialog

                  • SEARCH (?-s)^(\x20\x20<phrase .+?<!\[CDATA\[)([^]\r\n<[]+)(]]></phrase>)(?=(?s).+^=.*?\R(?-s)\2\x{25A0}(.+))|(?s)^===+.+

                  • REPLACE \1\4\3

                  • Tick the Wrap around option and the Regular expression search mode

                  • Click on the Replace All button

                  => After some time…, any single-line <phrase.........</phrase> should contain the translated sentences, inside the <![CDATA[......]]> areas.

                  Important: When I tested this with your real data, it took about 9 minutes on my old XP laptop to process the 5,750 lines :-((. But I suppose that this S/R can be executed in less than 3 minutes on modern laptops!!
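If the search/replace turns out to be too slow, a possible alternative (a swapped-in technique, not the method described above) is to do the fourth step with a small Python script that looks each English text up in a dictionary instead of scanning the rest of the file with a look-ahead. The sketch below assumes the English■Czech worksheet is available as a string; the names `load_pairs` and `apply_translations` are illustrative. When an English text appears several times in the worksheet, the first pair is kept.

```python
import re

SEP = "\u25A0"  # Black Square separator between English and Czech
# Groups: 1 = everything up to the CDATA body, 2 = raw text, 3 = closing tags.
CASE_B = re.compile(r"^(  <phrase .+?<!\[CDATA\[)([^\]\r\n<\[]+)(\]\]></phrase>)$")


def load_pairs(worksheet: str) -> dict:
    """Build an English -> Czech dictionary from 'English■Czech' lines."""
    pairs = {}
    for line in worksheet.splitlines():
        if SEP in line:
            english, czech = line.split(SEP, 1)
            pairs.setdefault(english, czech)  # keep the first pair found
    return pairs


def apply_translations(xml_text: str, pairs: dict) -> str:
    """Replace the CDATA body of every Case B line that has a translation."""
    out = []
    for line in xml_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        eol = line[len(body):]  # preserve the original line ending
        m = CASE_B.match(body)
        if m and m.group(2) in pairs:
            body = m.group(1) + pairs[m.group(2)] + m.group(3)
        out.append(body + eol)
    return "".join(out)


xml_sample = (
    '  <phrase title="1_day_ago" addon_id="XF"><![CDATA[One day ago]]></phrase>\n'
    '<br />\n'
)
worksheet_sample = "One day ago\u25A0Před jedním dnem\n"
print(apply_translations(xml_sample, load_pairs(worksheet_sample)))
```

Multi-line blocks and lines without a matching pair pass through unchanged, so Cases C and D still have to be handled by hand.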


                  To give you a general idea, just consider the beginning of your XML file, with 12 single-line <phrase.......</phrase> ranges and 1 multi-line <phrase.......</phrase> block, followed by the 12 English/Czech sentence pairs, after the separator line of equal signs

                  <?xml version="1.0" encoding="utf-8"?>
                  <language title="Čeština" date_format="d.m.Y" time_format="H:i" currency_format="{value} {symbol}" week_start="0" decimal_point="." thousands_separator="," label_separator=":" comma_separator=", " ellipsis="..." parenthesis_open="(" parenthesis_close=")" language_code="cs-CZ" text_direction="LTR" export_version="2">
                    <phrase title="1_day_ago" addon_id="XF" version_id="1010010" version_string="1.1.0 Alpha"><![CDATA[One day ago]]></phrase>
                    <phrase title="1_month_ago" addon_id="XF" version_id="1010010" version_string="1.1.0 Alpha"><![CDATA[One month ago]]></phrase>
                    <phrase title="1_time" addon_id="XF" version_id="2000010" version_string="2.0.0 Alpha"><![CDATA[1 time]]></phrase>
                    <phrase title="1_week_ago" addon_id="XF" version_id="1010010" version_string="1.1.0 Alpha"><![CDATA[One week ago]]></phrase>
                    <phrase title="1_year" addon_id="XF" version_id="2000010" version_string="2.0.0 Alpha"><![CDATA[1 year]]></phrase>
                    <phrase title="1_year_ago" addon_id="XF" version_id="1010010" version_string="1.1.0 Alpha"><![CDATA[One year ago]]></phrase>
                    <phrase title="2_weeks_ago" addon_id="XF" version_id="1010010" version_string="1.1.0 Alpha"><![CDATA[Two weeks ago]]></phrase>
                    <phrase title="2_years_ago" addon_id="XF" version_id="1010010" version_string="1.1.0 Alpha"><![CDATA[Two years ago]]></phrase>
                    <phrase title="2x_image_replacement_url" addon_id="XF" version_id="2000010" version_string="2.0.0 Alpha"><![CDATA[2x image replacement URL]]></phrase>
                    <phrase title="2x_image_replacement_url_explain" addon_id="XF" version_id="2000010" version_string="2.0.0 Alpha"><![CDATA[If provided, the 2x image will be automatically displayed instead of the image URL above on devices capable of displaying a higher pixel resolution.<br />
                  <br />
                  <strong>Note: This option has no effect with sprite mode enabled.</strong>]]></phrase>
                    <phrase title="3_months_ago" addon_id="XF" version_id="1010010" version_string="1.1.0 Alpha"><![CDATA[Three months ago]]></phrase>
                    <phrase title="6_months_ago" addon_id="XF" version_id="1010010" version_string="1.1.0 Alpha"><![CDATA[Six months ago]]></phrase>
                    <phrase title="9_months_ago" addon_id="XF" version_id="1010010" version_string="1.1.0 Alpha"><![CDATA[Nine months ago]]></phrase>
                  =============
                  One day ago■Před jedním dnem
                  One month ago■Před měcísem
                  1 time■1 krát
                  One week ago■Před týdnem
                  1 year■1 rok
                  One year ago■Před rokem
                  Two weeks ago■Před dvěma týdny
                  Two years ago■Před dvěma lety
                  2x image replacement URL■2x adresa URL pro výměnu obrázku
                  Three months ago■Před trěmi měcísi
                  Six months ago■Před šesti měcísi
                  Nine months ago■Před devíti měcísi
                  
                  • Open the Replace dialog

                  • SEARCH (?-s)^(\x20\x20<phrase .+?<!\[CDATA\[)([^]\r\n<[]+)(]]></phrase>)(?=(?s).+^=.*?\R(?-s)\2\x{25A0}(.+))|(?s)^===+.+

                  • REPLACE \1\4\3

                  • Tick the Wrap around option and the Regular expression search mode

                  • Click on the Replace All button

                  You should be left with the expected text :

                  <?xml version="1.0" encoding="utf-8"?>
                  <language title="Čeština" date_format="d.m.Y" time_format="H:i" currency_format="{value} {symbol}" week_start="0" decimal_point="." thousands_separator="," label_separator=":" comma_separator=", " ellipsis="..." parenthesis_open="(" parenthesis_close=")" language_code="cs-CZ" text_direction="LTR" export_version="2">
                    <phrase title="1_day_ago" addon_id="XF" version_id="1010010" version_string="1.1.0 Alpha"><![CDATA[Před jedním dnem]]></phrase>
                    <phrase title="1_month_ago" addon_id="XF" version_id="1010010" version_string="1.1.0 Alpha"><![CDATA[Před měcísem]]></phrase>
                    <phrase title="1_time" addon_id="XF" version_id="2000010" version_string="2.0.0 Alpha"><![CDATA[1 krát]]></phrase>
                    <phrase title="1_week_ago" addon_id="XF" version_id="1010010" version_string="1.1.0 Alpha"><![CDATA[Před týdnem]]></phrase>
                    <phrase title="1_year" addon_id="XF" version_id="2000010" version_string="2.0.0 Alpha"><![CDATA[1 rok]]></phrase>
                    <phrase title="1_year_ago" addon_id="XF" version_id="1010010" version_string="1.1.0 Alpha"><![CDATA[Před rokem]]></phrase>
                    <phrase title="2_weeks_ago" addon_id="XF" version_id="1010010" version_string="1.1.0 Alpha"><![CDATA[Před dvěma týdny]]></phrase>
                    <phrase title="2_years_ago" addon_id="XF" version_id="1010010" version_string="1.1.0 Alpha"><![CDATA[Před dvěma lety]]></phrase>
                    <phrase title="2x_image_replacement_url" addon_id="XF" version_id="2000010" version_string="2.0.0 Alpha"><![CDATA[2x adresa URL pro výměnu obrázku]]></phrase>
                    <phrase title="2x_image_replacement_url_explain" addon_id="XF" version_id="2000010" version_string="2.0.0 Alpha"><![CDATA[If provided, the 2x image will be automatically displayed instead of the image URL above on devices capable of displaying a higher pixel resolution.<br />
                  <br />
                  <strong>Note: This option has no effect with sprite mode enabled.</strong>]]></phrase>
                    <phrase title="3_months_ago" addon_id="XF" version_id="1010010" version_string="1.1.0 Alpha"><![CDATA[Před trěmi měcísi]]></phrase>
                    <phrase title="6_months_ago" addon_id="XF" version_id="1010010" version_string="1.1.0 Alpha"><![CDATA[Před šesti měcísi]]></phrase>
                    <phrase title="9_months_ago" addon_id="XF" version_id="1010010" version_string="1.1.0 Alpha"><![CDATA[Před devíti měcísi]]></phrase>
                  

                  Just note that, in this small example, the multi-line block <phrase.......</phrase>, containing the text " If provided, the 2x image…sprite mode enabled. ", remains unchanged (only single lines are changed!)

                  See you later,

                  Best Regards,

                  guy038

                  P.S. :

                  If necessary, you can send me some files by e-mail at :

                  • Meta Chuh
                    Meta Chuh @guy038 last edited by

                    @guy038

                    off topic:
                    welcome back. 👍
                    almost a week without you is very close to the border of worrying enough to start a search party ;-) 😉

                    sincere and best regards
                    metachuh
