Community

    find two letters between quotes in lang tag

    General Discussion
    14 Posts 4 Posters 3.5k Views
    • Alan Kilborn @Pouemes44

      @Pouemes44 said in find two letters between quotes in lang tag:

      Alan, it matched tags with 3 letters

      Hmmm. Really not sure how you could get that result, reference:

      [screenshot]

      Even though you didn't have luck with mine, here's how I'd change it to exclude matching fr:

      (?-i)lang="(?!fr)[[:alpha:]]{2}"
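As a rough cross-check outside Notepad++, here is a minimal Python sketch of the regex above. It assumes ASCII language codes: Python's `re` module has no POSIX `[[:alpha:]]` class, so `[A-Za-z]` stands in for it, and `(?-i)` is dropped because `re` is case-sensitive by default. The sample HTML and the function name `find_two_letter_langs` are made up for illustration.

```python
import re

# Python translation (assumption): [[:alpha:]] -> [A-Za-z]; re is already
# case-sensitive, which is what (?-i) requests in Notepad++.
PATTERN = re.compile(r'lang="(?!fr)[A-Za-z]{2}"')


def find_two_letter_langs(text):
    """Return every lang="xx" attribute whose value is two letters and not fr."""
    return [m.group(0) for m in PATTERN.finditer(text)]


sample = '<p lang="en">a</p> <p lang="fr">b</p> <p lang="kok">c</p> <p lang="de">d</p>'
print(find_two_letter_langs(sample))  # ['lang="en"', 'lang="de"']
```

The closing `"` in the pattern is what rejects `kok`: after two letters the regex demands a quote, and a third letter is there instead.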

      • Pouemes44

        Thanks Alan, yes, it works like this.
        Perhaps the first time I forgot the last quote; that's why it was not correct.
        Many thanks!
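The guess about the missing quote is easy to confirm. In this Python sketch, `[A-Za-z]` stands in for `[[:alpha:]]` (an assumption for ASCII codes) and the sample line is invented. Dropping the final `"` lets the pattern match the first two letters of a three-letter code such as `kok`:

```python
import re

with_quote = re.compile(r'lang="(?!fr)[A-Za-z]{2}"')
without_quote = re.compile(r'lang="(?!fr)[A-Za-z]{2}')  # closing quote forgotten

line = '<html lang="kok">'
print(with_quote.search(line))              # None: "kok" has three letters
print(without_quote.search(line).group(0))  # lang="ko
```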

        • astrosofista @Pouemes44

          @Pouemes44 said in find two letters between quotes in lang tag:

          how to exclude lang="fr" from the search

          Try this instead:

          (?-i)(?<=\x20lang=")(?!fr)\l{2}(?=">)
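The lookbehind and lookahead are zero-width, so this version matches only the code itself, not the surrounding `lang="..."` text, and it requires `">` immediately after the code. Here is a minimal Python sketch. It assumes `\l` (Boost's lowercase-letter class) can be written as `[a-z]` for these codes, and the sample lines are invented:

```python
import re

# (?<=\x20lang=") : must be preceded by a space and lang="
# (?!fr)          : but the code must not start with fr
# [a-z]{2}        : two lowercase letters (\l{2} in Notepad++)
# (?=">)          : immediately followed by the closing quote and >
PATTERN = re.compile(r'(?<=\x20lang=")(?!fr)[a-z]{2}(?=">)')

html = '\n'.join([
    '<html lang="en">',
    '<html lang="fr">',
    '<html lang="kok">',
    '<p lang="de" class="x">',
])
print(PATTERN.findall(html))  # ['en']
```

`de` is skipped only because another attribute follows it. The `(?=">)` lookahead assumes `lang` is the last attribute in the tag.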
          

          Take care and have fun!

          • guy038

            Hi, @pouemes44, @alan-kilborn, @astrosofista and All,

            Alan, hmm… interesting! Since my last post, I assume that @pouemes44 was talking about the HTML lang attribute. So I dug around a bit on the Net!

            Based on these links:

            • https://www.w3schools.com/tags/ref_language_codes.asp

            • https://www.w3schools.com/tags/ref_country_codes.asp

            • https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes

            • https://en.wikipedia.org/wiki/List_of_ISO_639-2_codes

            • https://www.w3.org/International/articles/language-tags/

            • http://www.lingoes.net/en/translator/langcode.htm

            we can deduce that:

            • Language codes always have 2 or 3 lowercase characters (refer to ISO 639-2 on Wikipedia)

            • Country codes always have 2 uppercase characters (refer to the ISO 3166-1 alpha-2 code list on Wikipedia)

            • A language code stands by itself, OR may be followed by a dash (-) and either a country code OR a script code (refer to the ISO Language Code Table)

            • Generally, language subtags are lowercase, alphabetic region subtags are uppercase, and script subtags begin with an initial capital (refer to https://www.w3.org/International/articles/language-tags/#rfc)
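These rules can be turned into a rough validator. The following is a minimal Python sketch of only these simplified rules. It is not a full BCP 47 / RFC 5646 parser, which also allows variants, extensions, and numeric regions such as `es-419`. The function name `looks_like_lang_tag` is hypothetical.

```python
import re

# Language: 2 or 3 lowercase letters.
# Optional: "-" plus EITHER a 2-letter uppercase country code
#           OR a 4-letter script code with an initial capital (e.g. Hans).
TAG = re.compile(r'[a-z]{2,3}(?:-(?:[A-Z]{2}|[A-Z][a-z]{3}))?')


def looks_like_lang_tag(tag):
    return TAG.fullmatch(tag) is not None


# Prints, one per line: fr True, kok-IN True, zh-Hans True,
# en-us False, EN False, fr-FRA False
for t in ["fr", "kok-IN", "zh-Hans", "en-us", "EN", "fr-FRA"]:
    print(t, looks_like_lang_tag(t))
```

BCP 47 treats tags as case-insensitive, but this check is strict about case on purpose. It flags `en-us` so that codes breaking the usual conventions stand out.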


            So, from the main list below (refer to http://www.lingoes.net/en/translator/langcode.htm), with 241 items:

            af          Afrikaans
            af-ZA       Afrikaans (South Africa)
            ar          Arabic
            ar-AE       Arabic (U.A.E.)
            ar-BH       Arabic (Bahrain)
            ar-DZ       Arabic (Algeria)
            ar-EG       Arabic (Egypt)
            ar-IQ       Arabic (Iraq)
            ar-JO       Arabic (Jordan)
            ar-KW       Arabic (Kuwait)
            ar-LB       Arabic (Lebanon)
            ar-LY       Arabic (Libya)
            ar-MA       Arabic (Morocco)
            ar-OM       Arabic (Oman)
            ar-QA       Arabic (Qatar)
            ar-SA       Arabic (Saudi Arabia)
            ar-SY       Arabic (Syria)
            ar-TN       Arabic (Tunisia)
            ar-YE       Arabic (Yemen)
            az          Azeri (Latin)
            az-AZ       Azeri (Latin) (Azerbaijan)
            az-AZ       Azeri (Cyrillic) (Azerbaijan)
            be          Belarusian
            be-BY       Belarusian (Belarus)
            bg          Bulgarian
            bg-BG       Bulgarian (Bulgaria)
            bs-BA       Bosnian (Bosnia and Herzegovina)
            ca          Catalan
            ca-ES       Catalan (Spain)
            cs          Czech
            cs-CZ       Czech (Czech Republic)
            cy          Welsh
            cy-GB       Welsh (United Kingdom)
            da          Danish
            da-DK       Danish (Denmark)
            de          German
            de-AT       German (Austria)
            de-CH       German (Switzerland)
            de-DE       German (Germany)
            de-LI       German (Liechtenstein)
            de-LU       German (Luxembourg)
            dv          Divehi
            dv-MV       Divehi (Maldives)
            el          Greek
            el-GR       Greek (Greece)
            en          English
            en-AU       English (Australia)
            en-BZ       English (Belize)
            en-CA       English (Canada)
            en-CB       English (Caribbean)
            en-GB       English (United Kingdom)
            en-IE       English (Ireland)
            en-JM       English (Jamaica)
            en-NZ       English (New Zealand)
            en-PH       English (Republic of the Philippines)
            en-TT       English (Trinidad and Tobago)
            en-US       English (United States)
            en-ZA       English (South Africa)
            en-ZW       English (Zimbabwe)
            eo          Esperanto
            es          Spanish
            es-AR       Spanish (Argentina)
            es-BO       Spanish (Bolivia)
            es-CL       Spanish (Chile)
            es-CO       Spanish (Colombia)
            es-CR       Spanish (Costa Rica)
            es-DO       Spanish (Dominican Republic)
            es-EC       Spanish (Ecuador)
            es-ES       Spanish (Castilian)
            es-ES       Spanish (Spain)
            es-GT       Spanish (Guatemala)
            es-HN       Spanish (Honduras)
            es-MX       Spanish (Mexico)
            es-NI       Spanish (Nicaragua)
            es-PA       Spanish (Panama)
            es-PE       Spanish (Peru)
            es-PR       Spanish (Puerto Rico)
            es-PY       Spanish (Paraguay)
            es-SV       Spanish (El Salvador)
            es-UY       Spanish (Uruguay)
            es-VE       Spanish (Venezuela)
            et          Estonian
            et-EE       Estonian (Estonia)
            eu          Basque
            eu-ES       Basque (Spain)
            fa          Farsi
            fa-IR       Farsi (Iran)
            fi          Finnish
            fi-FI       Finnish (Finland)
            fo          Faroese
            fo-FO       Faroese (Faroe Islands)
            fr          French
            fr-BE       French (Belgium)
            fr-CA       French (Canada)
            fr-CH       French (Switzerland)
            fr-FR       French (France)
            fr-LU       French (Luxembourg)
            fr-MC       French (Principality of Monaco)
            gl          Galician
            gl-ES       Galician (Spain)
            gu          Gujarati
            gu-IN       Gujarati (India)
            he          Hebrew
            he-IL       Hebrew (Israel)
            hi          Hindi
            hi-IN       Hindi (India)
            hr          Croatian
            hr-BA       Croatian (Bosnia and Herzegovina)
            hr-HR       Croatian (Croatia)
            hu          Hungarian
            hu-HU       Hungarian (Hungary)
            hy          Armenian
            hy-AM       Armenian (Armenia)
            id          Indonesian
            id-ID       Indonesian (Indonesia)
            is          Icelandic
            is-IS       Icelandic (Iceland)
            it          Italian
            it-CH       Italian (Switzerland)
            it-IT       Italian (Italy)
            ja          Japanese
            ja-JP       Japanese (Japan)
            ka          Georgian
            ka-GE       Georgian (Georgia)
            kk          Kazakh
            kk-KZ       Kazakh (Kazakhstan)
            kn          Kannada
            kn-IN       Kannada (India)
            ko          Korean
            ko-KR       Korean (Korea)
            kok         Konkani
            kok-IN      Konkani (India)
            ky          Kyrgyz
            ky-KG       Kyrgyz (Kyrgyzstan)
            lt          Lithuanian
            lt-LT       Lithuanian (Lithuania)
            lv          Latvian
            lv-LV       Latvian (Latvia)
            mi          Maori
            mi-NZ       Maori (New Zealand)
            mk          FYRO Macedonian
            mk-MK       FYRO Macedonian (Former Yugoslav Republic of Macedonia)
            mn          Mongolian
            mn-MN       Mongolian (Mongolia)
            mr          Marathi
            mr-IN       Marathi (India)
            ms          Malay
            ms-BN       Malay (Brunei Darussalam)
            ms-MY       Malay (Malaysia)
            mt          Maltese
            mt-MT       Maltese (Malta)
            nb          Norwegian (Bokmål)
            nb-NO       Norwegian (Bokmål) (Norway)
            nl          Dutch
            nl-BE       Dutch (Belgium)
            nl-NL       Dutch (Netherlands)
            nn-NO       Norwegian (Nynorsk) (Norway)
            ns          Northern Sotho
            ns-ZA       Northern Sotho (South Africa)
            pa          Punjabi
            pa-IN       Punjabi (India)
            pl          Polish
            pl-PL       Polish (Poland)
            ps          Pashto
            ps-AR       Pashto (Afghanistan)
            pt          Portuguese
            pt-BR       Portuguese (Brazil)
            pt-PT       Portuguese (Portugal)
            qu          Quechua
            qu-BO       Quechua (Bolivia)
            qu-EC       Quechua (Ecuador)
            qu-PE       Quechua (Peru)
            ro          Romanian
            ro-RO       Romanian (Romania)
            ru          Russian
            ru-RU       Russian (Russia)
            sa          Sanskrit
            sa-IN       Sanskrit (India)
            se          Sami (Northern)
            se-FI       Sami (Northern) (Finland)
            se-FI       Sami (Skolt) (Finland)
            se-FI       Sami (Inari) (Finland)
            se-NO       Sami (Northern) (Norway)
            se-NO       Sami (Lule) (Norway)
            se-NO       Sami (Southern) (Norway)
            se-SE       Sami (Northern) (Sweden)
            se-SE       Sami (Lule) (Sweden)
            se-SE       Sami (Southern) (Sweden)
            sk          Slovak
            sk-SK       Slovak (Slovakia)
            sl          Slovenian
            sl-SI       Slovenian (Slovenia)
            sq          Albanian
            sq-AL       Albanian (Albania)
            sr-BA       Serbian (Latin) (Bosnia and Herzegovina)
            sr-BA       Serbian (Cyrillic) (Bosnia and Herzegovina)
            sr-SP       Serbian (Latin) (Serbia and Montenegro)
            sr-SP       Serbian (Cyrillic) (Serbia and Montenegro)
            sv          Swedish
            sv-FI       Swedish (Finland)
            sv-SE       Swedish (Sweden)
            sw          Swahili
            sw-KE       Swahili (Kenya)
            syr         Syriac
            syr-SY      Syriac (Syria)
            ta          Tamil
            ta-IN       Tamil (India)
            te          Telugu
            te-IN       Telugu (India)
            th          Thai
            th-TH       Thai (Thailand)
            tl          Tagalog
            tl-PH       Tagalog (Philippines)
            tn          Tswana
            tn-ZA       Tswana (South Africa)
            tr          Turkish
            tr-TR       Turkish (Turkey)
            tt          Tatar
            tt-RU       Tatar (Russia)
            ts          Tsonga
            uk          Ukrainian
            uk-UA       Ukrainian (Ukraine)
            ur          Urdu
            ur-PK       Urdu (Islamic Republic of Pakistan)
            uz          Uzbek (Latin)
            uz-UZ       Uzbek (Latin) (Uzbekistan)
            uz-UZ       Uzbek (Cyrillic) (Uzbekistan)
            vi          Vietnamese
            vi-VN       Vietnamese (Viet Nam)
            xh          Xhosa
            xh-ZA       Xhosa (South Africa)
            zh          Chinese
            zh-CN       Chinese (State)
            zh-Hans     Chinese (Simplified Han Script)
            zh-Hant     Chinese (Traditional Han Script)
            zh-HK       Chinese (Hong Kong)
            zh-MO       Chinese (Macau)
            zh-SG       Chinese (Singapore)
            zh-TW       Chinese (Taiwan)
            zu          Zulu
            zu-ZA       Zulu (South Africa)
            

            This new regex version matches all the language codes in this list:

            SEARCH / MARK (?-i)(?<=\x20lang=")(?:zh\-Han(s|t)|\l{2,3}(-\u{2})?)(?=">?)

            Now, to omit only the two codes "fr" and "fr-FR", prefer the regex below:

            SEARCH / MARK (?-i)(?<=\x20lang=")(?:zh\-Han(s|t)|(?!fr(-FR)?">?)\l{2,3}(-\u{2})?)(?=">?)

            You may test these two regexes against the list above!
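One way to run that test outside Notepad++ is a short Python sketch like this one. It assumes `\l` and `\u` can be written as `[a-z]` and `[A-Z]`. It also turns the capturing groups into non-capturing ones so that `findall` returns whole matches. The `lang` attributes are hypothetical, built from entries of the list:

```python
import re

# First regex: any code from the list
ALL_CODES = re.compile(
    r'(?<=\x20lang=")(?:zh-Han(?:s|t)|[a-z]{2,3}(?:-[A-Z]{2})?)(?=">?)')
# Second regex: same, but a negative lookahead skips exactly fr and fr-FR
NOT_FR = re.compile(
    r'(?<=\x20lang=")(?:zh-Han(?:s|t)|(?!fr(?:-FR)?">?)[a-z]{2,3}(?:-[A-Z]{2})?)(?=">?)')

codes = ["fr", "fr-FR", "fr-CA", "kok-IN", "zh-Hant", "en"]
html = "\n".join(f'<p lang="{c}">' for c in codes)

print(ALL_CODES.findall(html))  # ['fr', 'fr-FR', 'fr-CA', 'kok-IN', 'zh-Hant', 'en']
print(NOT_FR.findall(html))     # ['fr-CA', 'kok-IN', 'zh-Hant', 'en']
```

`fr-CA` survives the second regex because the lookahead excludes only the exact values `fr` and `fr-FR`, each followed by the closing quote.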

            Best Regards,

            guy038

            • Pouemes44

              Thanks to all.
              Yes, I am trying to find, in my pages, the two-letter ISO codes, which generally refer to ISO 639-1,

              because I think I have some mistakes.

              Example: Lingala
              ISO 639-1 ln
              ISO 639-2 lin
              ISO 639-3 lin
              IETF ln

              It's not easy to know which language code to use:

              http://www.language-archives.org/language/lin
              and
              https://www.ethnologue.com/language/lin

              use ISO 639-3.

              • Alan Kilborn @guy038

                @guy038

                Well… You can read into a poster’s request as much as you want, and go off and research a poster’s problem, again, as much as you want. :-)
                I’m sure there might be some interesting “finds” along such a journey.

                I don’t mind helping with regex requests (except from the “takers”), but I’m sticking to what is asked for, and I’m not going to infer a bunch of stuff. My goal is “get them on their way” quickly. Just my take on it.

                Also, if we solve the problem they ask for, and it isn’t the problem they have, perhaps they learn to be better askers?

                But, I didn’t exactly solve the problem that was asked for: “find all lines…”. Really then the hit should have consisted of a full line, right? Well, we have some wiggle room here, as a “Find All…” search provides the whole line data requested.

                • guy038

                  Hi, @pouemes44, @alan-kilborn, @astrosofista and All,

                  @pouemes44, the last regex of my previous post finds any language code of 2 or 3 lowercase letters, optionally followed by a dash and a two-uppercase-letter country code, other than "fr" and "fr-FR", OR finds the specific Chinese codes zh-Hans and zh-Hant.


                  Now, if we assume, as a matter of principle, that all the language codes in your files are correct, searching for them is simpler! Indeed, as no syntax check is needed, the regex below should be enough (the two language codes "fr" and "fr-FR" are excluded):

                  SEARCH / MARK (?-is)(?<=\x20lang=")(?!fr"|fr-FR").+?(?=">?)
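Since no syntax check is done, this regex accepts whatever sits between the quotes. Here is a minimal Python sketch with invented sample lines. `(?-is)` is dropped because case-sensitive matching and a `.` that excludes newlines are already Python's defaults. The sketch shows that the regex also picks up a value that is not a language code at all:

```python
import re

# (?!fr"|fr-FR") : skip exactly fr and fr-FR
# .+?            : then take anything, lazily, up to the closing quote
PATTERN = re.compile(r'(?<=\x20lang=")(?!fr"|fr-FR").+?(?=">?)')

html = '\n'.join([
    '<html lang="fr">',
    '<p lang="en-GB">',
    '<p lang="fr-FR">',
    '<p lang="not a code">',
])
print(PATTERN.findall(html))  # ['en-GB', 'not a code']
```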


                  And, as @alan-kilborn said, if you prefer to highlight entire lines, with their EOL characters, that contain a language code, use this regex:

                  MARK (?-is)^.*\x20lang="(?!fr"|fr-FR").+\R?

                  It looks for entire lines, EOL included, containing at least a space character followed by the string lang=" (with this exact case) and then a language code other than "fr" and "fr-FR".
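Python's `re` has no `\R`, so this sketch substitutes `(?:\r\n|\r|\n)`. That is an approximation, because Boost's `\R` also covers a few rarer line separators. The sketch also adds `(?m)` so that `^` matches at each line start, as it does in Notepad++. The sample text is invented:

```python
import re

# (?m)              : ^ matches at the start of every line
# ^.*\x20lang="     : anything on the line, then a space and lang="
# (?!fr"|fr-FR").+  : a value other than fr / fr-FR, then the rest of the line
# (?:\r\n|\r|\n)?   : the line break, if any (stand-in for \R)
PATTERN = re.compile(r'(?m)^.*\x20lang="(?!fr"|fr-FR").+(?:\r\n|\r|\n)?')

text = '<html lang="fr">\n<p lang="en">Hello</p>\n<p>no attribute</p>\n'
print(PATTERN.findall(text))  # ['<p lang="en">Hello</p>\n']
```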


                  Finally, if you just need to bookmark the lines containing the global HTML attribute lang="...", with a language code other than "fr" and "fr-FR", use this final regex:

                  MARK (?-i)\x20lang="(?!fr"|fr-FR")
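Bookmarking only needs to know which lines contain a match, so the regex can stop right after the opening quote. A rough Python equivalent of the Mark dialog's "Bookmark line" option just collects the line numbers. The function name `lines_to_bookmark` is hypothetical:

```python
import re

PATTERN = re.compile(r'\x20lang="(?!fr"|fr-FR")')


def lines_to_bookmark(text):
    """Return the 1-based numbers of lines with a lang value other than fr / fr-FR."""
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if PATTERN.search(line)]


text = '<html lang="fr">\n<p lang="de">\n<p lang="fr-FR">\n<p lang="fr-CA">\n'
print(lines_to_bookmark(text))  # [2, 4]
```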

                  BR

                  guy038

                  • Pouemes44

                    Many thanks, Guy, for all your explanations, which are precious and could be useful for future searches.
                    Thanks, Alan, too.

                    Is there someone here who could make a working extension like “toolbucket” that can search and replace in folders? It would be a great extension.

                    • Alan Kilborn @Pouemes44

                      @Pouemes44 said in find two letters between quotes in lang tag:

                      Is there someone here who could make a working extension like “toolbucket” that can search and replace in folders? It would be a great extension.

                      Have you tried Replace in Files on the Find in Files tab of the Find window?

                      • Pouemes44

                        Hello Alan, yes, of course.
                        But when I must search and replace across multiple lines, it's not very easy in such a small window, and I must always use regular expressions for line breaks, so an extension would be super.

                        The Community of users of the Notepad++ text editor.
                        Powered by NodeBB | Contributors