    find two letters between quotes in lang tag

    • Alan KilbornA
      Alan Kilborn @guy038
      last edited by

      @guy038 said in find two letters between quotes in lang tag:

      SEARCH / MARK (?-i)(?<=\x20lang=")\l{2}(?=">)

      It seems overly restrictive to me.
      OP mentions nothing about uppercase versus lowercase.
      OP mentions “tags” but how are we to know what this really means for them?
      Based upon OP’s specification, I would try:

      (?-i)lang="[[:alpha:]]{2}"

      Of course, still vague is the type of double-quotes we are talking about.
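For readers who want to try the idea outside Notepad++, here is a minimal Python sketch. It is an approximation, not the Boost regex engine that Notepad++ uses: Python's re module has no [[:alpha:]] class, so [A-Za-z] stands in for it, and the (?-i) prefix is dropped because Python matches case-sensitively by default. The sample string is invented for illustration.

```python
import re

# Approximation of (?-i)lang="[[:alpha:]]{2}" for Python's re module.
# [A-Za-z] replaces the POSIX class; case-sensitive matching is the default.
LANG_2_LETTERS = re.compile(r'lang="[A-Za-z]{2}"')

# Invented sample: two-letter, uppercase, three-letter and region-tagged values.
sample = '<html lang="fr"> <p lang="EN"> <p lang="kok"> <p lang="pt-BR">'

matches = LANG_2_LETTERS.findall(sample)
print(matches)
```

The closing quote in the pattern is what rejects the three-letter and region-tagged values.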

      • Pouemes44P
        Pouemes44
        last edited by Pouemes44

        thanks to you

        guy038, it seems to work perfectly, but I have lang=“fr” in all my pages.
        How can I exclude lang=“fr” from the search?
        Alan, yours matched tags with 3 letters.

        • Alan KilbornA
          Alan Kilborn @Pouemes44
          last edited by Alan Kilborn

          @Pouemes44 said in find two letters between quotes in lang tag:

          Alan, yours matched tags with 3 letters.

          Hmmm. Really not sure how you could get that result, reference:

          6eca29b9-89b8-4dd7-9e80-8ce12d53eb5f-image.png

          Even though you didn’t have luck with mine, here’s how I’d change mine to exclude matching fr :

          (?-i)lang="(?!fr)[[:alpha:]]{2}"
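The negative lookahead (?!fr) can be tried in Python as a rough sketch. The same caveats apply: [A-Za-z] is a stand-in for [[:alpha:]], (?-i) is omitted because Python is case-sensitive by default, and the sample string is made up.

```python
import re

# (?!fr) refuses a match when the two letters after the quote are exactly "fr".
EXCLUDE_FR = re.compile(r'lang="(?!fr)[A-Za-z]{2}"')

# Invented sample text.
sample = '<html lang="fr"> <span lang="en"> <span lang="FR"> <span lang="fi">'

found = EXCLUDE_FR.findall(sample)
print(found)
```

Because matching is case-sensitive, an uppercase "FR" is not excluded by this pattern.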

          • Pouemes44P
            Pouemes44
            last edited by

            Thanks Alan, yes, it works like this.
            Perhaps the first time I forgot the last quote, which is why it was not correct.
            Many thanks!

            • astrosofistaA
              astrosofista @Pouemes44
              last edited by

              @Pouemes44 said in find two letters between quotes in lang tag:

              How can I exclude lang=“fr” from the search?

              Try this instead:

              (?-i)(?<=\x20lang=")(?!fr)\l{2}(?=">)
              

              Take care and have fun!
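As a hedged illustration of how the look-behind and look-ahead in this regex restrict the match to the two letters only, here is a small Python sketch. It assumes [a-z] as the equivalent of Boost's \l and drops (?-i), which Python does not need. The HTML snippet is invented.

```python
import re

# Only the two lowercase letters are returned: the text before and after
# is checked by the look-behind and look-ahead but is not part of the match.
CODE_ONLY = re.compile(r'(?<=\x20lang=")(?!fr)[a-z]{2}(?=">)')

# Invented sample page.
sample = (
    '<html lang="fr">\n'
    '<p lang="en">Hello</p>\n'
    '<p lang="de" class="x">Hallo</p>\n'
    '<p xml:lang="es">Hola</p>'
)

codes = CODE_ONLY.findall(sample)
print(codes)
```

Note that the trailing (?=">) requires lang to be the last attribute in the tag, and the leading space excludes xml:lang.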

              • guy038G
                guy038
                last edited by guy038

                Hi, @pouemes44, @alan-kilborn, @astrosofista and All,

                Alan, hmm… interesting ! Since my last post, I have been assuming that @pouemes44 was talking about the HTML lang attribute. So, I dug around a bit on the Net !

                And, from these links :

                • https://www.w3schools.com/tags/ref_language_codes.asp

                • https://www.w3schools.com/tags/ref_country_codes.asp

                • https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes

                • https://en.wikipedia.org/wiki/List_of_ISO_639-2_codes

                • https://www.w3.org/International/articles/language-tags/

                • http://www.lingoes.net/en/translator/langcode.htm

                We can deduce that :

                • Language codes always have 2 or 3 lowercase characters ( Refer to ISO 639-2 from Wikipedia )

                • Country codes always have 2 uppercase characters ( Refer to the ISO 3166-1 alpha-2 code list, from Wikipedia )

                • A language code stands by itself OR may be followed by a dash - and a country code OR a script code ( Refer to the ISO Language Code Table )

                • Generally, language tags are lowercase, alphabetic region subtags are uppercase, and script tags begin with an initial capital ( Refer to https://www.w3.org/International/articles/language-tags/#rfc )
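The deductions above can be sketched as a simple shape check in Python. This is only an illustration under stated assumptions: the function name looks_like_lang_tag is hypothetical, a script code is assumed to be one capital followed by three lowercase letters (as in Hans or Hant), and the check looks at the shape only, not at whether the code is actually registered. Real language tags allow more forms than this.

```python
import re

# 2 or 3 lowercase letters, optionally followed by a dash and either
# a 2-letter uppercase country code or a script code such as "Hans".
TAG_SHAPE = re.compile(r'[a-z]{2,3}(?:-(?:[A-Z]{2}|[A-Z][a-z]{3}))?')

def looks_like_lang_tag(tag: str) -> bool:
    """Return True when the whole tag has the shape described above (hypothetical helper)."""
    return TAG_SHAPE.fullmatch(tag) is not None

for tag in ["fr", "kok-IN", "zh-Hans", "FR", "fr-fr", "english"]:
    print(tag, looks_like_lang_tag(tag))
```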


                So from the main list, below ( Refer http://www.lingoes.net/en/translator/langcode.htm ), with 241 items :

                af          Afrikaans
                af-ZA       Afrikaans (South Africa)
                ar          Arabic
                ar-AE       Arabic (U.A.E.)
                ar-BH       Arabic (Bahrain)
                ar-DZ       Arabic (Algeria)
                ar-EG       Arabic (Egypt)
                ar-IQ       Arabic (Iraq)
                ar-JO       Arabic (Jordan)
                ar-KW       Arabic (Kuwait)
                ar-LB       Arabic (Lebanon)
                ar-LY       Arabic (Libya)
                ar-MA       Arabic (Morocco)
                ar-OM       Arabic (Oman)
                ar-QA       Arabic (Qatar)
                ar-SA       Arabic (Saudi Arabia)
                ar-SY       Arabic (Syria)
                ar-TN       Arabic (Tunisia)
                ar-YE       Arabic (Yemen)
                az          Azeri (Latin)
                az-AZ       Azeri (Latin) (Azerbaijan)
                az-AZ       Azeri (Cyrillic) (Azerbaijan)
                be          Belarusian
                be-BY       Belarusian (Belarus)
                bg          Bulgarian
                bg-BG       Bulgarian (Bulgaria)
                bs-BA       Bosnian (Bosnia and Herzegovina)
                ca          Catalan
                ca-ES       Catalan (Spain)
                cs          Czech
                cs-CZ       Czech (Czech Republic)
                cy          Welsh
                cy-GB       Welsh (United Kingdom)
                da          Danish
                da-DK       Danish (Denmark)
                de          German
                de-AT       German (Austria)
                de-CH       German (Switzerland)
                de-DE       German (Germany)
                de-LI       German (Liechtenstein)
                de-LU       German (Luxembourg)
                dv          Divehi
                dv-MV       Divehi (Maldives)
                el          Greek
                el-GR       Greek (Greece)
                en          English
                en-AU       English (Australia)
                en-BZ       English (Belize)
                en-CA       English (Canada)
                en-CB       English (Caribbean)
                en-GB       English (United Kingdom)
                en-IE       English (Ireland)
                en-JM       English (Jamaica)
                en-NZ       English (New Zealand)
                en-PH       English (Republic of the Philippines)
                en-TT       English (Trinidad and Tobago)
                en-US       English (United States)
                en-ZA       English (South Africa)
                en-ZW       English (Zimbabwe)
                eo          Esperanto
                es          Spanish
                es-AR       Spanish (Argentina)
                es-BO       Spanish (Bolivia)
                es-CL       Spanish (Chile)
                es-CO       Spanish (Colombia)
                es-CR       Spanish (Costa Rica)
                es-DO       Spanish (Dominican Republic)
                es-EC       Spanish (Ecuador)
                es-ES       Spanish (Castilian)
                es-ES       Spanish (Spain)
                es-GT       Spanish (Guatemala)
                es-HN       Spanish (Honduras)
                es-MX       Spanish (Mexico)
                es-NI       Spanish (Nicaragua)
                es-PA       Spanish (Panama)
                es-PE       Spanish (Peru)
                es-PR       Spanish (Puerto Rico)
                es-PY       Spanish (Paraguay)
                es-SV       Spanish (El Salvador)
                es-UY       Spanish (Uruguay)
                es-VE       Spanish (Venezuela)
                et          Estonian
                et-EE       Estonian (Estonia)
                eu          Basque
                eu-ES       Basque (Spain)
                fa          Farsi
                fa-IR       Farsi (Iran)
                fi          Finnish
                fi-FI       Finnish (Finland)
                fo          Faroese
                fo-FO       Faroese (Faroe Islands)
                fr          French
                fr-BE       French (Belgium)
                fr-CA       French (Canada)
                fr-CH       French (Switzerland)
                fr-FR       French (France)
                fr-LU       French (Luxembourg)
                fr-MC       French (Principality of Monaco)
                gl          Galician
                gl-ES       Galician (Spain)
                gu          Gujarati
                gu-IN       Gujarati (India)
                he          Hebrew
                he-IL       Hebrew (Israel)
                hi          Hindi
                hi-IN       Hindi (India)
                hr          Croatian
                hr-BA       Croatian (Bosnia and Herzegovina)
                hr-HR       Croatian (Croatia)
                hu          Hungarian
                hu-HU       Hungarian (Hungary)
                hy          Armenian
                hy-AM       Armenian (Armenia)
                id          Indonesian
                id-ID       Indonesian (Indonesia)
                is          Icelandic
                is-IS       Icelandic (Iceland)
                it          Italian
                it-CH       Italian (Switzerland)
                it-IT       Italian (Italy)
                ja          Japanese
                ja-JP       Japanese (Japan)
                ka          Georgian
                ka-GE       Georgian (Georgia)
                kk          Kazakh
                kk-KZ       Kazakh (Kazakhstan)
                kn          Kannada
                kn-IN       Kannada (India)
                ko          Korean
                ko-KR       Korean (Korea)
                kok         Konkani
                kok-IN      Konkani (India)
                ky          Kyrgyz
                ky-KG       Kyrgyz (Kyrgyzstan)
                lt          Lithuanian
                lt-LT       Lithuanian (Lithuania)
                lv          Latvian
                lv-LV       Latvian (Latvia)
                mi          Maori
                mi-NZ       Maori (New Zealand)
                mk          FYRO Macedonian
                mk-MK       FYRO Macedonian (Former Yugoslav Republic of Macedonia)
                mn          Mongolian
                mn-MN       Mongolian (Mongolia)
                mr          Marathi
                mr-IN       Marathi (India)
                ms          Malay
                ms-BN       Malay (Brunei Darussalam)
                ms-MY       Malay (Malaysia)
                mt          Maltese
                mt-MT       Maltese (Malta)
                nb          Norwegian (Bokmål)
                nb-NO       Norwegian (Bokmål) (Norway)
                nl          Dutch
                nl-BE       Dutch (Belgium)
                nl-NL       Dutch (Netherlands)
                nn-NO       Norwegian (Nynorsk) (Norway)
                ns          Northern Sotho
                ns-ZA       Northern Sotho (South Africa)
                pa          Punjabi
                pa-IN       Punjabi (India)
                pl          Polish
                pl-PL       Polish (Poland)
                ps          Pashto
                ps-AR       Pashto (Afghanistan)
                pt          Portuguese
                pt-BR       Portuguese (Brazil)
                pt-PT       Portuguese (Portugal)
                qu          Quechua
                qu-BO       Quechua (Bolivia)
                qu-EC       Quechua (Ecuador)
                qu-PE       Quechua (Peru)
                ro          Romanian
                ro-RO       Romanian (Romania)
                ru          Russian
                ru-RU       Russian (Russia)
                sa          Sanskrit
                sa-IN       Sanskrit (India)
                se          Sami (Northern)
                se-FI       Sami (Northern) (Finland)
                se-FI       Sami (Skolt) (Finland)
                se-FI       Sami (Inari) (Finland)
                se-NO       Sami (Northern) (Norway)
                se-NO       Sami (Lule) (Norway)
                se-NO       Sami (Southern) (Norway)
                se-SE       Sami (Northern) (Sweden)
                se-SE       Sami (Lule) (Sweden)
                se-SE       Sami (Southern) (Sweden)
                sk          Slovak
                sk-SK       Slovak (Slovakia)
                sl          Slovenian
                sl-SI       Slovenian (Slovenia)
                sq          Albanian
                sq-AL       Albanian (Albania)
                sr-BA       Serbian (Latin) (Bosnia and Herzegovina)
                sr-BA       Serbian (Cyrillic) (Bosnia and Herzegovina)
                sr-SP       Serbian (Latin) (Serbia and Montenegro)
                sr-SP       Serbian (Cyrillic) (Serbia and Montenegro)
                sv          Swedish
                sv-FI       Swedish (Finland)
                sv-SE       Swedish (Sweden)
                sw          Swahili
                sw-KE       Swahili (Kenya)
                syr         Syriac
                syr-SY      Syriac (Syria)
                ta          Tamil
                ta-IN       Tamil (India)
                te          Telugu
                te-IN       Telugu (India)
                th          Thai
                th-TH       Thai (Thailand)
                tl          Tagalog
                tl-PH       Tagalog (Philippines)
                tn          Tswana
                tn-ZA       Tswana (South Africa)
                tr          Turkish
                tr-TR       Turkish (Turkey)
                tt          Tatar
                tt-RU       Tatar (Russia)
                ts          Tsonga
                uk          Ukrainian
                uk-UA       Ukrainian (Ukraine)
                ur          Urdu
                ur-PK       Urdu (Islamic Republic of Pakistan)
                uz          Uzbek (Latin)
                uz-UZ       Uzbek (Latin) (Uzbekistan)
                uz-UZ       Uzbek (Cyrillic) (Uzbekistan)
                vi          Vietnamese
                vi-VN       Vietnamese (Viet Nam)
                xh          Xhosa
                xh-ZA       Xhosa (South Africa)
                zh          Chinese
                zh-CN       Chinese (State)
                zh-Hans     Chinese (Simplified Han Script)
                zh-Hant     Chinese (Traditional Han Script)
                zh-HK       Chinese (Hong Kong)
                zh-MO       Chinese (Macau)
                zh-SG       Chinese (Singapore)
                zh-TW       Chinese (Taiwan)
                zu          Zulu
                zu-ZA       Zulu (South Africa)
                

                This new regex version matches all the possible language codes :

                SEARCH / MARK (?-i)(?<=\x20lang=")(?:zh\-Han(s|t)|\l{2,3}(-\u{2})?)(?=">?)

                Now, in order to omit only the two "fr" and "fr-FR" languages, prefer the regex below :

                SEARCH / MARK (?-i)(?<=\x20lang=")(?:zh\-Han(s|t)|(?!fr(-FR)?">?)\l{2,3}(-\u{2})?)(?=">?)

                You may test these two regexes against the list above !
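One possible way to run such a test outside Notepad++ is the Python sketch below. It is an approximation: Boost's \l and \u are written as [a-z] and [A-Z], the (?-i) prefix is dropped because Python is case-sensitive by default, and the capturing groups are made non-capturing. Only a handful of tags from the list are used, each wrapped in an invented <p lang="..."> element.

```python
import re

# Approximate Python versions of the two regexes above.
ALL_CODES = re.compile(
    r'(?<=\x20lang=")(?:zh-Han(?:s|t)|[a-z]{2,3}(?:-[A-Z]{2})?)(?=">?)'
)
NOT_FRENCH = re.compile(
    r'(?<=\x20lang=")(?:zh-Han(?:s|t)|(?!fr(?:-FR)?">?)[a-z]{2,3}(?:-[A-Z]{2})?)(?=">?)'
)

# A few entries taken from the list above.
TAGS = ["af", "ar-AE", "fr", "fr-CA", "fr-FR", "kok-IN", "zh-Hans", "zh-TW"]

def matched(pattern, tag):
    """Return the text matched inside a sample element, or None."""
    m = pattern.search(f'<p lang="{tag}">')
    return m.group(0) if m else None

all_hits = [matched(ALL_CODES, t) for t in TAGS]
kept = [t for t in TAGS if matched(NOT_FRENCH, t)]

print(all_hits)
print(kept)
```

With this sample, the first regex returns every tag in full, and the second skips only "fr" and "fr-FR" while keeping "fr-CA".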

                Best Regards,

                guy038

                • Pouemes44P
                  Pouemes44
                  last edited by

                  thanks to all
                  yes, I am trying to find in my pages the two-letter ISO codes, which generally refer to ISO 639-1

                  because I think I have some mistakes

                  example lingala
                  ISO 639-1 ln
                  ISO 639-2 lin
                  ISO 639-3 lin
                  IETF ln

                  not easy to know which language code

                  http://www.language-archives.org/language/lin
                  and
                  https://www.ethnologue.com/language/lin

                  use ISO 639-3
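If the goal is to spot pages where a three-letter code such as "lin" was used instead of the two-letter "ln", a rough Python sketch could list every lang value whose first part has three lowercase letters. The pattern and the sample markup are assumptions made for illustration, and a hit is not necessarily a mistake: some languages in the list above, such as Konkani (kok), only appear with a three-letter code.

```python
import re

# Three lowercase letters right after lang=", followed by a quote or a dash.
THREE_LETTERS = re.compile(r'(?<=\x20lang=")[a-z]{3}(?=["-])')

# Invented sample page.
sample = (
    '<p lang="ln">Lingala</p>\n'
    '<p lang="lin">Lingala</p>\n'
    '<p lang="kok-IN">Konkani</p>'
)

suspects = THREE_LETTERS.findall(sample)
print(suspects)
```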

                  • Alan KilbornA
                    Alan Kilborn @guy038
                    last edited by

                    @guy038

                    Well… You can read into a poster’s request as much as you want, and go off and research a poster’s problem, again, as much as you want. :-)
                    I’m sure there might be some interesting “finds” along such a journey.

                    I don’t mind helping with regex requests (except from the “takers”), but I’m sticking to what is asked for, and I’m not going to infer a bunch of stuff. My goal is “get them on their way” quickly. Just my take on it.

                    Also, if we solve the problem they ask for, and it isn’t the problem they have, perhaps they learn to be better askers?

                    But, I didn’t exactly solve the problem that was asked for: “find all lines…”. Really then the hit should have consisted of a full line, right? Well, we have some wiggle room here, as a “Find All…” search provides the whole line data requested.

                    • guy038G
                      guy038
                      last edited by

                      Hi, @pouemes44, @alan-kilborn, @astrosofista and All,

                      @pouemes44, the last regex of my previous post finds any language code of 2 or 3 lowercase letters, optionally followed by a dash and a country code of two uppercase letters, different from both "fr" and "fr-FR", OR finds the specific zh-Hans and zh-Hant Chinese syntaxes


                      Now, if we assume, as a matter of principle, that the language codes are all correct in your files, the search for these language codes is simpler ! Indeed, as no syntax check is needed, the regex below should be enough ( The two language codes "fr" and "fr-FR" are not taken into account ! )

                      SEARCH / MARK (?-is)(?<=\x20lang=")(?!fr"|fr-FR").+?(?=">?)
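A hedged Python sketch of this simplified search follows. Python's dot already stops at line breaks and matching is case-sensitive by default, so the (?-is) prefix is simply left out. The sample markup is invented.

```python
import re

# Lazy .+? takes everything up to the closing quote of the lang value,
# unless the value is exactly "fr" or "fr-FR".
ANY_CODE = re.compile(r'(?<=\x20lang=")(?!fr"|fr-FR").+?(?=">?)')

# Invented sample page.
sample = (
    '<html lang="fr">\n'
    '<p lang="pt-BR" dir="ltr">Bom dia</p>\n'
    '<p lang="fr-CA">Salut</p>\n'
    '<p lang="fr-FR">Salut</p>'
)

values = ANY_CODE.findall(sample)
print(values)
```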


                      And, as @alan-kilborn said, if you prefer to highlight the entire lines, with their EOL chars, containing a language code, use that regex :

                      MARK (?-is)^.*\x20lang="(?!fr"|fr-FR").+\R?

                      Which looks for entire lines, EOL included, containing at least a space char followed by the string lang=", with this exact case, and followed by a language code different from both "fr" and "fr-FR"
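The whole-line variant can be approximated in Python as below. Two substitutions are assumptions forced by the engine: Python has no \R, so an explicit (?:\r\n|\n|\r) group is used, and re.MULTILINE is needed for ^ to match at each line start. The sample text is invented and uses \n line endings.

```python
import re

# Whole line, optional EOL included, when it holds a lang value
# other than "fr" or "fr-FR".
WHOLE_LINE = re.compile(
    r'^.*\x20lang="(?!fr"|fr-FR").+(?:\r\n|\n|\r)?',
    re.MULTILINE,
)

# Invented sample page.
sample = (
    '<html lang="fr">\n'
    '<p lang="en">Hello</p>\n'
    '<p>No attribute</p>\n'
    '<p lang="fr-FR">Salut</p>\n'
    '<p lang="de">Hallo</p>'
)

lines = WHOLE_LINE.findall(sample)
print(lines)
```

The last line of the sample has no line break, so it is returned without one, which is what the optional EOL allows.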


                      Finally, if you just need to bookmark the lines containing a global HTML attribute lang="..........", with a language code different from both "fr" and "fr-FR", use this final regex :

                      MARK (?-i)\x20lang="(?!fr"|fr-FR")
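Outside Notepad++ there are no bookmarks, but the same regex can report which line numbers would be marked. This is a sketch only: the helper name lines_to_bookmark is hypothetical and the sample text is invented.

```python
import re

# Same test as the regex above; (?-i) is dropped because Python is
# case-sensitive by default.
BOOKMARK = re.compile(r'\x20lang="(?!fr"|fr-FR")')

def lines_to_bookmark(text):
    """Return 1-based numbers of the lines the regex would mark (hypothetical helper)."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if BOOKMARK.search(line)
    ]

# Invented sample page.
sample = (
    '<html lang="fr">\n'
    '<p lang="en">Hello</p>\n'
    '<p>No attribute</p>\n'
    '<p lang="fr-FR">Salut</p>\n'
    '<p lang="de">Hallo</p>'
)

numbers = lines_to_bookmark(sample)
print(numbers)
```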

                      BR

                      guy038

                      • Pouemes44P
                        Pouemes44
                        last edited by

                        Many thanks, Guy, for all your explanations, which are precious and could be precious for future searches
                        Thanks Alan too

                        ***Is there someone here who could make a working extension like “toolbucket”, able to search and replace in folders… it would be a great extension

                        • Alan KilbornA
                          Alan Kilborn @Pouemes44
                          last edited by

                          @Pouemes44 said in find two letters between quotes in lang tag:

                          Is there someone here who could make a working extension like “toolbucket”, able to search and replace in folders… it would be a great extension

                          Have you tried Replace in Files on the Find in Files tab of the Find window?

                          • Pouemes44P
                            Pouemes44
                            last edited by Pouemes44

                            Hello Alan yes of course
                            but when I must search and replace multiple lines, it's not very easy with such a small window, and I must always use regular expressions for line breaks, so an extension would be super
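Until such an extension exists, one workaround is a small script run outside Notepad++. The Python sketch below is not a Notepad++ feature and is only a starting point: the function name replace_in_folder, its parameters and the default *.html filter are all assumptions, and Python's regex syntax differs in places from the Boost syntax used in the Find dialog. A multi-line pattern can be written comfortably in a script file, with \s* or \n covering the line breaks.

```python
import re
from pathlib import Path

def replace_in_folder(root, pattern, replacement, glob="*.html"):
    """Apply a regex replacement to every matching file under root.

    Hypothetical helper: returns the list of files that were changed.
    """
    regex = re.compile(pattern)
    changed = []
    for path in sorted(Path(root).rglob(glob)):
        # newline="" keeps each file's own line endings untouched.
        with open(path, encoding="utf-8", newline="") as handle:
            text = handle.read()
        new_text, count = regex.subn(replacement, text)
        if count:
            with open(path, "w", encoding="utf-8", newline="") as handle:
                handle.write(new_text)
            changed.append(path)
    return changed
```

The files are rewritten in place and assumed to be UTF-8, so it is safer to try the script on a copy of the folder first.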

                            The Community of users of the Notepad++ text editor.