    regex: Match everything up to linebreak but not linebreak

      Hellena Crainicu

      I work only in Notepad++; I just run the code with Python.

        Neil Schipper @Hellena Crainicu

        @Hellena-Crainicu But you’re asking about a regex to feed into a call to re.findall(), correct? Or are you asking how to convert lines of text that look like your <title>..</title> example in a text file loaded in the Notepad++ editor?

        If it’s the latter, I have a solution but I’m confused.

          Hellena Crainicu

          @Neil-Schipper I am using \w+, as you can see. But I need to stop matching at the | separator (what I called the linebreak); otherwise I get my-name-is-peter-prince-justin.html instead of my-name-is-peter.html

            Neil Schipper @Hellena Crainicu

            @Hellena-Crainicu I’m not getting the clarity I’m hoping for. Here are two very different things people do on computers:

            1. running a Python program that processes an input file, and maybe changes it or produces an output file, etc.

            2. having a file loaded in an editor, and running a search and replace operation on it

            Which of these are you trying to do (that requires regex assistance as you described)?

              Hellena Crainicu

              It is just about the regex… maybe @guy038 can help me. He is the master of regex.

                Neil Schipper

                For my own amusement, I solved the problem in the editor.

                I broke the problem into:

                1. consume from the start of the line to the first ‘>’
                2. capture everything up to and excluding (space followed by literal ‘|’) into group 1
                3. consume everything else up to and including EOL

                The search phrase ^.*?>(.+?)(?= \|).*?$ does this. Then replace with \1.html. Then a separate S&R can convert all spaces to ‘-’.

                But I still don’t know what you’re asking for, because you refuse to tell me!

                  Neil Schipper

                  Again, for my own amusement (since I’ve never used re.sub() before, only match & split):

                  >>> import re
                  >>> t1 = re.sub(r"^.*?>(.+?)(?= \|).*?$", r"\1.html", "<title>My name is Peter | Prince Justin (en)</title>")
                  >>> t2 = re.sub(r"\s", r"-", t1)
                  >>> t2
                  'My-name-is-Peter.html'
                  >>>
                  
                    Hellena Crainicu

                    I must split all html files, not just one. I don’t think I can use the replacement…

                        new_filename = title.get_text() 
                        new_filename = new_filename.lower()
                        words = re.findall(r'\w+', new_filename)
                        new_filename = '-'.join(words)
                        new_filename = new_filename + '.html'
                        print(new_filename)
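
For what it's worth, the replacement approach is not limited to a single file: it can be wrapped in a function and called once per title. A minimal sketch, assuming each title string comes from `title.get_text()` as in the snippet above; the helper name `title_to_filename` and the sample titles are made up for illustration:

```python
import re

def title_to_filename(title_text):
    """Hypothetical helper: build a file name from the part of a title before the pipe."""
    # Keep only the text before the first " |" (unchanged if there is no pipe).
    head = re.sub(r"^(.+?) \|.*$", r"\1", title_text.strip())
    # Lowercase, then join the runs of word characters with dashes.
    words = re.findall(r"\w+", head.lower())
    return "-".join(words) + ".html"

# Made-up sample titles standing in for title.get_text() from several files.
titles = [
    "My name is Peter | Prince Justin (en)",
    "Another page | Prince Justin (en)",
]
filenames = [title_to_filename(t) for t in titles]
print(filenames)
```

Cutting the title at the pipe first means the existing `\w+` pattern can stay as it is.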
                    
                      Hellena Crainicu

                      I am now trying this regex: \w+.*(?= \|)

                      words = re.findall(r"\w+.*(?= \|)", new_filename)

                      It almost works, but I get: my name is peter.html (without the dashes between the words)
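
The dashes are most likely missing because `.*` also matches spaces, so `re.findall()` returns one long match instead of a list of words, and `'-'.join()` on a one-element list has nothing to join. A quick check, assuming `new_filename` holds the lowercased title from the earlier snippet (the sample value below is an assumption):

```python
import re

# Assumed sample value; in the real script it comes from title.get_text().lower().
new_filename = "my name is peter | prince justin (en)"

words = re.findall(r"\w+.*(?= \|)", new_filename)
print(words)            # one element that still contains the spaces
print("-".join(words))  # a single element, so no dash is inserted
```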

                        Alan Kilborn

                        You guys are OFF-TOPIC.
                        This is not an appropriate place to discuss Python’s regular expression engine.
                        Please find a more appropriate forum for that and confine discussions here to Notepad++ related topics.
                        Just because you write Python code in Notepad++ doesn’t make discussion of that code a Notepad++ topic.

                          Hellena Crainicu

                          I found the regex I needed: \b\w+\b(?=[\w\s]+\|)

                          and in Python it should be:

                          words = re.findall(r'\b\w+\b(?=[\w\s]+\|)', new_filename)
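
A small self-contained check of that pattern, as a sketch only; the sample value of `new_filename` is an assumption standing in for the lowercased title from the earlier snippet:

```python
import re

# Assumed sample value, lowercased as in the earlier snippet.
new_filename = "my name is peter | prince justin (en)"

# A word is kept only if nothing but word characters and whitespace
# lies between it and a following "|".
words = re.findall(r"\b\w+\b(?=[\w\s]+\|)", new_filename)
print("-".join(words) + ".html")
```

One thing to keep in mind: because the lookahead allows only word characters and whitespace before the pipe, any word that has punctuation (an apostrophe or a comma, say) between it and the pipe would be dropped.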

                          Thanks @Neil-Schipper, you gave me a good idea ;)

                          The Community of users of the Notepad++ text editor.