    find two letters between quotes in lang tag

    • Pouemes44P
      Pouemes44
      last edited by Pouemes44

      Thanks to you both.

      guy038, it seems to work perfectly, but I have lang="fr" in all my pages.
      How can I exclude lang="fr" from the search?
      Alan, yours matched tags with 3 letters.

      • Alan KilbornA
        Alan Kilborn @Pouemes44
        last edited by Alan Kilborn

        @Pouemes44 said in find two letters between quotes in lang tag:

        Alan, yours matched tags with 3 letters.

        Hmmm. Really not sure how you could get that result, reference:

        6eca29b9-89b8-4dd7-9e80-8ce12d53eb5f-image.png

        Even though you didn't have luck with mine, here's how I'd change it to exclude fr:

        (?-i)lang="(?!fr)[[:alpha:]]{2}"
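For readers who want to try the same negative-lookahead idea outside Notepad++, here is a minimal Python sketch. It assumes Python's `re` module, which has no `[[:alpha:]]` class, so `[A-Za-z]` stands in for it. The sample HTML string is invented for illustration and is not taken from the poster's pages.

```python
import re

# Python translation of (?-i)lang="(?!fr)[[:alpha:]]{2}"
# Assumption: [A-Za-z] replaces the POSIX class [[:alpha:]].
# Python's re is case-sensitive by default, which mirrors (?-i).
PATTERN = re.compile(r'lang="(?!fr)[A-Za-z]{2}"')

# Hypothetical sample text.
sample = '<html lang="fr"><p lang="en">Hi</p><p lang="kok">x</p><p lang="de">y</p>'

matches = PATTERN.findall(sample)
print(matches)  # ['lang="en"', 'lang="de"']
```

The closing quote in the pattern is what keeps three-letter codes such as kok from matching: after two letters the next character must be `"`.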

        • Pouemes44P
          Pouemes44
          last edited by

          Thanks Alan, yes, it works like this.
          Perhaps the first time I forgot the last quote, which is why it was not correct.
          Many thanks.

          • astrosofistaA
            astrosofista @Pouemes44
            last edited by

            @Pouemes44 said in find two letters between quotes in lang tag:

            How can I exclude lang="fr" from the search?

            Try this instead:

            (?-i)(?<=\x20lang=")(?!fr)\l{2}(?=">)
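As a rough illustration of what the lookbehind and lookahead do, here is a Python sketch of the same pattern. It assumes `[a-z]` as a stand-in for `\l`, which Python's `re` does not support, and the sample text is made up.

```python
import re

# Python translation of (?-i)(?<=\x20lang=")(?!fr)\l{2}(?=">)
# Assumption: \l (lowercase letter) is written as [a-z].
PATTERN = re.compile(r'(?<=\x20lang=")(?!fr)[a-z]{2}(?=">)')

# Hypothetical sample text.
sample = (
    '<html lang="fr">\n'
    '<span lang="it">ciao</span> <span lang="es" dir="ltr">hola</span>'
)

codes = PATTERN.findall(sample)
print(codes)  # ['it']
```

Only the two letters are matched, because the surrounding text sits inside zero-width assertions. Note that the trailing `(?=">)` requires lang to be the last attribute of the tag, which is why `es` is not reported in this sample.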
            

            Take care and have fun!

            • guy038G
              guy038
              last edited by guy038

              Hi, @pouemes44, @alan-kilborn, @astrosofista and All,

              Alan, hmm, interesting! Since my last post, I have assumed that @pouemes44 was talking about the HTML lang attribute, so I did a bit of digging on the Net!

              And, from these links :

              • https://www.w3schools.com/tags/ref_language_codes.asp

              • https://www.w3schools.com/tags/ref_country_codes.asp

              • https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes

              • https://en.wikipedia.org/wiki/List_of_ISO_639-2_codes

              • https://www.w3.org/International/articles/language-tags/

              • http://www.lingoes.net/en/translator/langcode.htm

              We can deduce that :

              • Language codes always have 2 or 3 lowercase characters ( refer to ISO 639-2, from Wikipedia )

              • Country codes always have 2 uppercase characters ( refer to the ISO 3166-1 alpha-2 code list, from Wikipedia )

              • A language code stands by itself OR may be followed by a dash - and a country code OR a script code ( refer to the ISO Language Code Table )

              • Generally, language tags are lowercase, alphabetic region subtags are uppercase, and script tags begin with an initial capital ( refer to https://www.w3.org/International/articles/language-tags/#rfc )
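The deductions above can be sketched as a small Python helper that splits a tag into its parts. This is only an illustration of the shapes described in this post: the function name `split_tag` is hypothetical, the four-letter script length is taken from ISO 15924 rather than from the post, and real language tags allow more combinations than this covers.

```python
import re

# Shapes deduced above: 2-3 lowercase letters for the language, optionally
# followed by a dash and EITHER a script (initial capital, assumed to be
# 4 letters as in ISO 15924) OR a region (2 uppercase letters).
TAG = re.compile(
    r'(?P<language>[a-z]{2,3})'
    r'(?:-(?:(?P<script>[A-Z][a-z]{3})|(?P<region>[A-Z]{2})))?'
)

def split_tag(tag):
    """Return (language, script, region), or None if the shape is not recognised."""
    m = TAG.fullmatch(tag)
    if m is None:
        return None
    return m.group('language'), m.group('script'), m.group('region')

print(split_tag('kok-IN'))   # ('kok', None, 'IN')
print(split_tag('zh-Hans'))  # ('zh', 'Hans', None)
print(split_tag('en-us'))    # None
```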


              So from the main list, below ( Refer http://www.lingoes.net/en/translator/langcode.htm ), with 241 items :

              af          Afrikaans
              af-ZA       Afrikaans (South Africa)
              ar          Arabic
              ar-AE       Arabic (U.A.E.)
              ar-BH       Arabic (Bahrain)
              ar-DZ       Arabic (Algeria)
              ar-EG       Arabic (Egypt)
              ar-IQ       Arabic (Iraq)
              ar-JO       Arabic (Jordan)
              ar-KW       Arabic (Kuwait)
              ar-LB       Arabic (Lebanon)
              ar-LY       Arabic (Libya)
              ar-MA       Arabic (Morocco)
              ar-OM       Arabic (Oman)
              ar-QA       Arabic (Qatar)
              ar-SA       Arabic (Saudi Arabia)
              ar-SY       Arabic (Syria)
              ar-TN       Arabic (Tunisia)
              ar-YE       Arabic (Yemen)
              az          Azeri (Latin)
              az-AZ       Azeri (Latin) (Azerbaijan)
              az-AZ       Azeri (Cyrillic) (Azerbaijan)
              be          Belarusian
              be-BY       Belarusian (Belarus)
              bg          Bulgarian
              bg-BG       Bulgarian (Bulgaria)
              bs-BA       Bosnian (Bosnia and Herzegovina)
              ca          Catalan
              ca-ES       Catalan (Spain)
              cs          Czech
              cs-CZ       Czech (Czech Republic)
              cy          Welsh
              cy-GB       Welsh (United Kingdom)
              da          Danish
              da-DK       Danish (Denmark)
              de          German
              de-AT       German (Austria)
              de-CH       German (Switzerland)
              de-DE       German (Germany)
              de-LI       German (Liechtenstein)
              de-LU       German (Luxembourg)
              dv          Divehi
              dv-MV       Divehi (Maldives)
              el          Greek
              el-GR       Greek (Greece)
              en          English
              en-AU       English (Australia)
              en-BZ       English (Belize)
              en-CA       English (Canada)
              en-CB       English (Caribbean)
              en-GB       English (United Kingdom)
              en-IE       English (Ireland)
              en-JM       English (Jamaica)
              en-NZ       English (New Zealand)
              en-PH       English (Republic of the Philippines)
              en-TT       English (Trinidad and Tobago)
              en-US       English (United States)
              en-ZA       English (South Africa)
              en-ZW       English (Zimbabwe)
              eo          Esperanto
              es          Spanish
              es-AR       Spanish (Argentina)
              es-BO       Spanish (Bolivia)
              es-CL       Spanish (Chile)
              es-CO       Spanish (Colombia)
              es-CR       Spanish (Costa Rica)
              es-DO       Spanish (Dominican Republic)
              es-EC       Spanish (Ecuador)
              es-ES       Spanish (Castilian)
              es-ES       Spanish (Spain)
              es-GT       Spanish (Guatemala)
              es-HN       Spanish (Honduras)
              es-MX       Spanish (Mexico)
              es-NI       Spanish (Nicaragua)
              es-PA       Spanish (Panama)
              es-PE       Spanish (Peru)
              es-PR       Spanish (Puerto Rico)
              es-PY       Spanish (Paraguay)
              es-SV       Spanish (El Salvador)
              es-UY       Spanish (Uruguay)
              es-VE       Spanish (Venezuela)
              et          Estonian
              et-EE       Estonian (Estonia)
              eu          Basque
              eu-ES       Basque (Spain)
              fa          Farsi
              fa-IR       Farsi (Iran)
              fi          Finnish
              fi-FI       Finnish (Finland)
              fo          Faroese
              fo-FO       Faroese (Faroe Islands)
              fr          French
              fr-BE       French (Belgium)
              fr-CA       French (Canada)
              fr-CH       French (Switzerland)
              fr-FR       French (France)
              fr-LU       French (Luxembourg)
              fr-MC       French (Principality of Monaco)
              gl          Galician
              gl-ES       Galician (Spain)
              gu          Gujarati
              gu-IN       Gujarati (India)
              he          Hebrew
              he-IL       Hebrew (Israel)
              hi          Hindi
              hi-IN       Hindi (India)
              hr          Croatian
              hr-BA       Croatian (Bosnia and Herzegovina)
              hr-HR       Croatian (Croatia)
              hu          Hungarian
              hu-HU       Hungarian (Hungary)
              hy          Armenian
              hy-AM       Armenian (Armenia)
              id          Indonesian
              id-ID       Indonesian (Indonesia)
              is          Icelandic
              is-IS       Icelandic (Iceland)
              it          Italian
              it-CH       Italian (Switzerland)
              it-IT       Italian (Italy)
              ja          Japanese
              ja-JP       Japanese (Japan)
              ka          Georgian
              ka-GE       Georgian (Georgia)
              kk          Kazakh
              kk-KZ       Kazakh (Kazakhstan)
              kn          Kannada
              kn-IN       Kannada (India)
              ko          Korean
              ko-KR       Korean (Korea)
              kok         Konkani
              kok-IN      Konkani (India)
              ky          Kyrgyz
              ky-KG       Kyrgyz (Kyrgyzstan)
              lt          Lithuanian
              lt-LT       Lithuanian (Lithuania)
              lv          Latvian
              lv-LV       Latvian (Latvia)
              mi          Maori
              mi-NZ       Maori (New Zealand)
              mk          FYRO Macedonian
              mk-MK       FYRO Macedonian (Former Yugoslav Republic of Macedonia)
              mn          Mongolian
              mn-MN       Mongolian (Mongolia)
              mr          Marathi
              mr-IN       Marathi (India)
              ms          Malay
              ms-BN       Malay (Brunei Darussalam)
              ms-MY       Malay (Malaysia)
              mt          Maltese
              mt-MT       Maltese (Malta)
              nb          Norwegian (Bokmål)
              nb-NO       Norwegian (Bokmål) (Norway)
              nl          Dutch
              nl-BE       Dutch (Belgium)
              nl-NL       Dutch (Netherlands)
              nn-NO       Norwegian (Nynorsk) (Norway)
              ns          Northern Sotho
              ns-ZA       Northern Sotho (South Africa)
              pa          Punjabi
              pa-IN       Punjabi (India)
              pl          Polish
              pl-PL       Polish (Poland)
              ps          Pashto
              ps-AR       Pashto (Afghanistan)
              pt          Portuguese
              pt-BR       Portuguese (Brazil)
              pt-PT       Portuguese (Portugal)
              qu          Quechua
              qu-BO       Quechua (Bolivia)
              qu-EC       Quechua (Ecuador)
              qu-PE       Quechua (Peru)
              ro          Romanian
              ro-RO       Romanian (Romania)
              ru          Russian
              ru-RU       Russian (Russia)
              sa          Sanskrit
              sa-IN       Sanskrit (India)
              se          Sami (Northern)
              se-FI       Sami (Northern) (Finland)
              se-FI       Sami (Skolt) (Finland)
              se-FI       Sami (Inari) (Finland)
              se-NO       Sami (Northern) (Norway)
              se-NO       Sami (Lule) (Norway)
              se-NO       Sami (Southern) (Norway)
              se-SE       Sami (Northern) (Sweden)
              se-SE       Sami (Lule) (Sweden)
              se-SE       Sami (Southern) (Sweden)
              sk          Slovak
              sk-SK       Slovak (Slovakia)
              sl          Slovenian
              sl-SI       Slovenian (Slovenia)
              sq          Albanian
              sq-AL       Albanian (Albania)
              sr-BA       Serbian (Latin) (Bosnia and Herzegovina)
              sr-BA       Serbian (Cyrillic) (Bosnia and Herzegovina)
              sr-SP       Serbian (Latin) (Serbia and Montenegro)
              sr-SP       Serbian (Cyrillic) (Serbia and Montenegro)
              sv          Swedish
              sv-FI       Swedish (Finland)
              sv-SE       Swedish (Sweden)
              sw          Swahili
              sw-KE       Swahili (Kenya)
              syr         Syriac
              syr-SY      Syriac (Syria)
              ta          Tamil
              ta-IN       Tamil (India)
              te          Telugu
              te-IN       Telugu (India)
              th          Thai
              th-TH       Thai (Thailand)
              tl          Tagalog
              tl-PH       Tagalog (Philippines)
              tn          Tswana
              tn-ZA       Tswana (South Africa)
              tr          Turkish
              tr-TR       Turkish (Turkey)
              tt          Tatar
              tt-RU       Tatar (Russia)
              ts          Tsonga
              uk          Ukrainian
              uk-UA       Ukrainian (Ukraine)
              ur          Urdu
              ur-PK       Urdu (Islamic Republic of Pakistan)
              uz          Uzbek (Latin)
              uz-UZ       Uzbek (Latin) (Uzbekistan)
              uz-UZ       Uzbek (Cyrillic) (Uzbekistan)
              vi          Vietnamese
              vi-VN       Vietnamese (Viet Nam)
              xh          Xhosa
              xh-ZA       Xhosa (South Africa)
              zh          Chinese
              zh-CN       Chinese (State)
              zh-Hans     Chinese (Simplified Han Script)
              zh-Hant     Chinese (Traditional Han Script)
              zh-HK       Chinese (Hong Kong)
              zh-MO       Chinese (Macau)
              zh-SG       Chinese (Singapore)
              zh-TW       Chinese (Taiwan)
              zu          Zulu
              zu-ZA       Zulu (South Africa)
              

              This new regex version matches all the possible language codes :

              SEARCH / MARK (?-i)(?<=\x20lang=")(?:zh\-Han(s|t)|\l{2,3}(-\u{2})?)(?=">?)

              Now, to omit only the two codes "fr" and "fr-FR", use the regex below:

              SEARCH / MARK (?-i)(?<=\x20lang=")(?:zh\-Han(s|t)|(?!fr(-FR)?">?)\l{2,3}(-\u{2})?)(?=">?)

              You may test these two regexes against the list above !
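One way to run such a test outside Notepad++ is the Python sketch below. It is a translation under stated assumptions: `\l` and `\u` are written as `[a-z]` and `[A-Z]`, the capturing groups are made non-capturing, and only a handful of codes from the list are wrapped in a hypothetical `<p>` tag.

```python
import re

# Assumed Python equivalents: \l -> [a-z], \u -> [A-Z].
# Python's re is case-sensitive by default, which mirrors (?-i).
ALL_CODES = re.compile(
    r'(?<=\x20lang=")(?:zh-Han(?:s|t)|[a-z]{2,3}(?:-[A-Z]{2})?)(?=">?)'
)
NOT_FRENCH = re.compile(
    r'(?<=\x20lang=")(?:zh-Han(?:s|t)|(?!fr(?:-FR)?">?)[a-z]{2,3}(?:-[A-Z]{2})?)(?=">?)'
)

# A few codes from the list above, each placed in a hypothetical tag.
codes = ['af', 'fr', 'fr-FR', 'fr-CA', 'kok-IN', 'zh-Hans', 'zh-TW']
lines = [f'<p lang="{code}">' for code in codes]

def found(pattern):
    """Return the matched code for every line where the pattern finds one."""
    return [m.group(0) for m in map(pattern.search, lines) if m]

print(found(ALL_CODES))   # every code in the sample
print(found(NOT_FRENCH))  # the same codes without 'fr' and 'fr-FR'
```

In this sample the second regex still reports fr-CA, since only the exact codes fr and fr-FR are excluded.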

              Best Regards,

              guy038

              • Pouemes44P
                Pouemes44
                last edited by

                Thanks to all.
                Yes, I am trying to find in my pages the two-letter codes, which generally refer to ISO 639-1,

                because I think I have some mistakes.

                Example: Lingala
                ISO 639-1 ln
                ISO 639-2 lin
                ISO 639-3 lin
                IETF ln

                It is not easy to know which language code to use.

                http://www.language-archives.org/language/lin
                and
                https://www.ethnologue.com/language/lin

                use ISO 639-3

                • Alan KilbornA
                  Alan Kilborn @guy038
                  last edited by

                  @guy038

                  Well… You can read into a poster’s request as much as you want, and go off and research a poster’s problem, again, as much as you want. :-)
                  I’m sure there might be some interesting “finds” along such a journey.

                  I don’t mind helping with regex requests (except from the “takers”), but I’m sticking to what is asked for, and I’m not going to infer a bunch of stuff. My goal is “get them on their way” quickly. Just my take on it.

                  Also, if we solve the problem they ask for, and it isn’t the problem they have, perhaps they learn to be better askers?

                  But, I didn’t exactly solve the problem that was asked for: “find all lines…”. Really then the hit should have consisted of a full line, right? Well, we have some wiggle room here, as a “Find All…” search provides the whole line data requested.

                  • guy038G
                    guy038
                    last edited by

                    Hi, @pouemes44, @alan-kilborn, @astrosofista and All,

                    @pouemes44, the last regex of my previous post finds any language code of 2 or 3 lowercase letters, optionally followed by a dash and a country code of two uppercase letters, provided it is different from both "fr" and "fr-FR", OR finds the specific zh-Hans and zh-Hant Chinese syntaxes.


                    Now, if we assume, as a matter of principle, that the language codes in your files are all correct, the search for these language codes becomes simpler! Indeed, as no syntax check is needed, the regex below should be enough ( the two language codes "fr" and "fr-FR" are not taken into account! )

                    SEARCH / MARK (?-is)(?<=\x20lang=")(?!fr"|fr-FR").+?(?=">?)
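A Python sketch of this simplified search, for anyone who wants to check it outside Notepad++. The sample text is invented; the pattern itself needs no translation here, since `(?-is)` corresponds to the defaults of Python's `re`.

```python
import re

# (?-is) matches Python's defaults: case-sensitive, and '.' stops at '\n'.
ANY_CODE = re.compile(r'(?<=\x20lang=")(?!fr"|fr-FR").+?(?=">?)')

# Hypothetical sample text.
sample = (
    '<html lang="fr">\n'
    '<p lang="fr-FR">a</p>\n'
    '<p lang="pt-BR" class="x">b</p>\n'
    '<p lang="ln">c</p>'
)

codes = ANY_CODE.findall(sample)
print(codes)  # ['pt-BR', 'ln']
```

The lazy `.+?` stops at the first closing quote, so whatever sits between the quotes is reported as is, without any check of its syntax.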


                    And, as @alan-kilborn said, if you prefer to highlight the entire lines, with their EOL chars, containing a language code, use that regex :

                    MARK (?-is)^.*\x20lang="(?!fr"|fr-FR").+\R?

                    Which looks for entire lines, EOL included, containing at least a space char, followed by the string lang=", with this exact case, and followed by a language code, assumed valid, different from both "fr" and "fr-FR"
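The whole-line variant can be sketched in Python as follows. Two assumptions: `\R` is approximated by `\r?\n`, and `re.MULTILINE` is used so that `^` matches at the start of every line. The sample text is made up.

```python
import re

# \R (any line break) is approximated by \r?\n in this sketch.
# re.MULTILINE makes ^ match at the start of each line.
WHOLE_LINE = re.compile(
    r'^.*\x20lang="(?!fr"|fr-FR").+(?:\r?\n)?', re.MULTILINE
)

# Hypothetical sample text.
sample = (
    '<html lang="fr">\n'
    '<p lang="de">Hallo</p>\n'
    '<p>no attribute</p>\n'
    '<p lang="fr-FR">x</p> <b lang="en">y</b>\n'
)

lines = WHOLE_LINE.findall(sample)
for line in lines:
    print(repr(line))
```

In this sample the last line is selected even though it also contains fr-FR, because one non-French lang attribute anywhere on the line is enough for the greedy `.*` to find a match.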


                    Finally, if you just need to bookmark the lines containing a global HTML attribute lang="...", holding a valid language code different from both "fr" and "fr-FR", use this final regex:

                    MARK (?-i)\x20lang="(?!fr"|fr-FR")
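Bookmarking only needs to know which lines contain a hit, so a rough Python equivalent is to collect line numbers. The helper name `lines_to_bookmark` and the sample text are hypothetical.

```python
import re

BOOKMARK = re.compile(r'\x20lang="(?!fr"|fr-FR")')

def lines_to_bookmark(text):
    """Return the 1-based numbers of lines holding a non-French lang attribute."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if BOOKMARK.search(line)
    ]

# Hypothetical sample text.
sample = (
    '<html lang="fr">\n'
    '<p lang="ln">text</p>\n'
    '<p lang="fr-CA">text</p>\n'
    '<p lang="fr-FR">text</p>'
)

print(lines_to_bookmark(sample))  # [2, 3]
```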

                    BR

                    guy038

                    • Pouemes44P
                      Pouemes44
                      last edited by

                      Many thanks, Guy, for all your explanations, which are precious and could be useful for future searches.
                      Thanks to Alan too.

                      *** Is there someone here who could make a working extension like "toolbucket", able to search and replace in folders? It would be a great extension.

                      • Alan KilbornA
                        Alan Kilborn @Pouemes44
                        last edited by

                        @Pouemes44 said in find two letters between quotes in lang tag:

                        Is there someone here who could make a working extension like "toolbucket", able to search and replace in folders? It would be a great extension.

                        Have you tried Replace in Files on the Find in Files tab of the Find window?

                        • Pouemes44P
                          Pouemes44
                          last edited by Pouemes44

                          Hello Alan, yes, of course.
                          But when I must search and replace multiple lines, it is not very easy with such a small window, and I must always use regular expressions for line breaks, so an extension would be super.
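Until such an extension exists, a multi-line search and replace across a folder can be sketched with a short Python script. This is only an illustration under assumptions: the function `replace_in_folder`, its parameters and the `*.html` filter are hypothetical and are not part of Notepad++ or of any plugin. It overwrites files in place, so it should be tried on a copy of the folder first.

```python
import re
from pathlib import Path

def replace_in_folder(root, pattern, replacement, glob='*.html'):
    """Apply a regex replacement to every matching file under root.

    Returns the list of files that were changed. Hypothetical helper.
    """
    # re.DOTALL lets '.' span line breaks, so multi-line blocks can match.
    regex = re.compile(pattern, re.DOTALL)
    changed = []
    for path in sorted(Path(root).rglob(glob)):
        text = path.read_text(encoding='utf-8')
        new_text, count = regex.subn(replacement, text)
        if count:
            path.write_text(new_text, encoding='utf-8')
            changed.append(path)
    return changed
```

A call such as `replace_in_folder('site', r'<div>.*?</div>', '<div>new</div>')` would rewrite every `<div>` block, including those spread over several lines, in the HTML files under a folder named `site`.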

                          The Community of users of the Notepad++ text editor.