    regex: Match everything up to linebreak but not linebreak

    • Hellena Crainicu

      I work only in Notepad++; I just run the code with Python.

      • Neil Schipper @Hellena Crainicu

        @Hellena-Crainicu But you’re asking about a regex to feed into a call to re.findall(), correct? Or are you asking how to convert lines of text that look like your <title>..</title> example and that sit in a text file loaded in the np++ editor?

        If it’s the latter, I have a solution but I’m confused.

        • Hellena Crainicu

          @Neil-Schipper I am using \w+, as you can see. But I need to stop selecting at the | character (the “linebreak” in my title), otherwise I will get my-name-is-peter-prince-justin.html instead of my-name-is-peter.html

          • Neil Schipper @Hellena Crainicu

            @Hellena-Crainicu I’m not getting the clarity I’m hoping for. Here are two very different things people do on computers:

            1. running a python program that processes an input file, and maybe changes it or produces an output file, etc.

            2. having a file loaded in an editor, and running a search and replace operation on it

            Which of these are you trying to do (that requires regex assistance as you described)?

            • Hellena Crainicu

              It is just about the regex… maybe @guy038 can help me. He is the master of regex.

              • Neil Schipper

                For my own amusement, I solved the problem in the editor.

                I broke the problem into:

                1. consume from the start of the line to the first ‘>’
                2. capture everything up to and excluding (space followed by literal ‘|’) into group 1
                3. consume everything else up to and including EOL

                The search phrase ^.*?>(.+?)(?= \|).*?$ does this. Then replace with \1.html. Then a separate S&R can convert all spaces to ‘-’.
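The three steps above map directly onto the parts of the search phrase. As a rough sketch in Python (not the editor's own search dialog), the same pattern can be written with re.VERBOSE so that each step carries its own comment. The names TITLE_LINE, line and name are made up for illustration, and the sample title is an assumed example of the kind of line being discussed.

```python
import re

# Same pattern as the search phrase, split into its three steps.
# re.VERBOSE ignores unescaped whitespace in the pattern, so the
# literal space in the lookahead is written as [ ].
TITLE_LINE = re.compile(r"""
    ^.*?>        # 1. consume from the start of the line to the first '>'
    (.+?)        # 2. capture lazily into group 1 ...
    (?=[ ]\|)    #    ... stopping before a space followed by a literal '|'
    .*?$         # 3. consume everything else up to the end of the line
""", re.VERBOSE)

line = "<title>My name is Peter | Prince Justin (en)</title>"  # assumed sample
name = TITLE_LINE.sub(r"\1.html", line)
print(name)
```

Converting the spaces to ‘-’ would still be a separate second replacement, as described above.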

                But I still don’t know what you’re asking for, because you refuse to tell me!

                • Neil Schipper

                  Again, for my own amusement (since I’ve never used re.sub() before, only match & split):

                  >>> import re
                  >>> t1 = re.sub(r"^.*?>(.+?)(?= \|).*?$", r"\1.html", "<title>My name is Peter | Prince Justin (en)</title>")
                  >>> t2 = re.sub(r"\s", r"-", t1)
                  >>> t2
                  'My-name-is-Peter.html'
                  
                  • Hellena Crainicu

                    I must split all html files, not just one. I don’t think I can use the replacement…

                        import re

                        # `title` comes from the surrounding script (the parsed <title> element)
                        new_filename = title.get_text().lower()
                        # \w+ picks up every word, including the ones after the "|"
                        words = re.findall(r'\w+', new_filename)
                        new_filename = '-'.join(words) + '.html'
                        print(new_filename)
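One way to keep only the words before the | while leaving the \w+ regex untouched is to cut the string at the first | before calling re.findall(). This is a sketch of an alternative, not the approach settled on later in this thread; the function name title_to_filename and the sample title are assumptions for illustration.

```python
import re

def title_to_filename(title_text):
    """Build a dashed, lowercase .html filename from the part before the first '|'."""
    # str.partition returns the whole string as the head when there is no "|".
    head = title_text.partition("|")[0]
    words = re.findall(r"\w+", head.lower())
    return "-".join(words) + ".html"

print(title_to_filename("My name is Peter | Prince Justin (en)"))
```

Because the function takes plain text, it could be called once per file in a loop over all the HTML files.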
                    
                    • Hellena Crainicu

                      I am now trying this regex: \w+.*(?= \|)

                      words = re.findall(r"\w+.*(?= \|)", new_filename)

                      It almost works, but I get: my name is peter.html (without the dashes)
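The dashes are most likely missing because this pattern matches the whole phrase in one go, so re.findall() returns a list with a single string and '-'.join() has nothing to join. A small check, assuming a lowercased title of the form discussed in this thread:

```python
import re

new_filename = "my name is peter | prince justin (en)"  # assumed sample input

# \w+.* swallows everything up to the last " |", so there is only one match.
words = re.findall(r"\w+.*(?= \|)", new_filename)
print(words)       # ['my name is peter']
print(len(words))  # 1

# Joining a one-element list inserts no separator at all.
print("-".join(words) + ".html")  # my name is peter.html
```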

                      • Alan Kilborn

                        You guys are OFF-TOPIC.
                        This is not an appropriate place to discuss Python’s regular expression engine.
                        Please find a more appropriate forum for that and confine discussions here to Notepad++ related topics.
                        Just because you write Python code in Notepad++ doesn’t make discussion of that code a Notepad++ topic.

                        • Hellena Crainicu

                          I found the regex I needed: \b\w+\b(?=[\w\s]+\|)

                          and in Python it should be:

                          words = re.findall(r'\b\w+\b(?=[\w\s]+\|)', new_filename)
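For anyone reusing this pattern, here is a short sketch of how it behaves. The helper name words_before_bar and both sample strings are assumptions for illustration. The lookahead only succeeds when nothing but word characters and whitespace sits between a word and a following |, so punctuation before the | (or a title without any |) changes the result.

```python
import re

PATTERN = r'\b\w+\b(?=[\w\s]+\|)'

def words_before_bar(text):
    # A word is kept only if word characters and whitespace, and nothing
    # else, lie between it and a following "|".
    return re.findall(PATTERN, text)

words = words_before_bar("my name is peter | prince justin (en)")
print('-'.join(words) + '.html')  # my-name-is-peter.html

# Caveat: a comma before the "|" blocks the lookahead for earlier words.
print(words_before_bar("hello, world | site"))  # ['world']

# Caveat: with no "|" in the text, nothing matches.
print(words_before_bar("no bar here"))  # []
```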

                          Thanks @Neil-Schipper, you gave me a good idea ;)

                          The Community of users of the Notepad++ text editor.