Help with Regex replace in one step



  • Hi all, I need help writing a single-step regex replace that produces the output below from the given input. The URL is just a random example.
    Input:
    https://www.gnu.org/software/softwaremanual/html_node/emacs/Regexp-Replace.html

    Required Output:
    https://www.gnu.org/software:softwaremanual.html_node.emacs:softwaremanual.html_node.emacs:Regexp-Replace.html*;

    The segment (:softwaremanual.html_node.emacs) appears twice in the output, with each / replaced by a .

    At the moment, I am doing this in three steps:
    Step 1:
    Regex:
    ((.*)/software/)|[\w]+\K(/)
    Replace with: \1.

    Step 2:
    Regex:
    software/.
    Replace with: software:

    Step 3:
    Regex:
    :(\w.+emacs).(.)
    Replace with: :\1:\1:\2
    ;
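
For readers who want to trace the three steps outside the editor, here is a minimal Python sketch of the same pipeline. It makes two assumptions: Python's built-in `re` module has no `\K`, so step 1 swaps in the lookbehind `(?<=\w)` for `[\w]+\K`, and the function name `three_step` is purely illustrative. It also requires Python 3.5 or later, where an unmatched group in the replacement expands to an empty string. The sketch does not add the trailing characters shown after `.html` in the required output, since none of the three steps produces them.

```python
import re

def three_step(url: str) -> str:
    # Step 1: keep everything up to "/software/" and append a ".",
    # and turn every later "/" that follows a word character into ".".
    # (?<=\w) stands in for the original [\w]+\K, which `re` lacks.
    s = re.sub(r'((.*)/software/)|(?<=\w)(/)', r'\1.', url)
    # Step 2: collapse "software/" plus the following character into "software:".
    s = re.sub(r'software/.', 'software:', s)
    # Step 3: duplicate the ":...emacs" segment and swap the next "." for ":".
    s = re.sub(r':(\w.+emacs).(.)', r':\1:\1:\2', s)
    return s

sample = ("https://www.gnu.org/software/softwaremanual/"
          "html_node/emacs/Regexp-Replace.html")
print(three_step(sample))
```

Note that step 3 hard-codes `emacs`, so this only works for URLs whose last directory has that name.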



  • @Ramesh-G-0 said in Help with Regex replace in one step:

    Hi All, Please I need help with regex replace in one step to get the below output from the given input

    Hi. Is this homework? Or a job interview question?

    Because if you really just want to translate your data from one form to another, doing it in two or three steps can often be easier to understand than creating a super-complicated regular expression.

    If you just want to learn, that’s great; but you won’t actually learn if we solve the problem for you.

    If I were trying to figure out how to solve that problem, and had the knowledge that you apparently have (knowing how to put text into capture groups and knowing that .* would match 0 or more characters), I would probably experiment with joining the individual pieces together with the .* between them, and then playing around until I got it right. So instead of my doing that iterative process to figure it out, and then giving you the “answer”, which you then wouldn’t understand, I will just give that hint, and assure you that you’ll understand the final answer a lot better if you muddle through trying to join the pieces rather than just taking the answer that one of us hands you.



  • Hi Peter, thanks for the response. This was a one-off, work-related task. I managed to solve it in three steps and got the work done, but my curiosity wouldn’t let me sleep, and I wanted to do it more efficiently in one step for my own learning.



  • @Ramesh-G-0

    For the exact data you showed,

    • FIND = (https://[\w\.]+/software)/(\w+)/(\w+)/(\w+)/([^\s/]*)
    • REPLACE = $1:$2.$3.$4:$2.$3.$4:$5;
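
The FIND/REPLACE pair above can be checked outside the editor with a short Python sketch. The only changes are assumptions of the translation: Python writes group references as `\g<1>` rather than `$1`, and the function name `one_step` is illustrative only.

```python
import re

# Same expression as the FIND above: a fixed three directory levels
# after "/software", followed by the file name.
FIND = re.compile(r'(https://[\w\.]+/software)/(\w+)/(\w+)/(\w+)/([^\s/]*)')
# Python spelling of "$1:$2.$3.$4:$2.$3.$4:$5;"
REPLACE = r'\g<1>:\g<2>.\g<3>.\g<4>:\g<2>.\g<3>.\g<4>:\g<5>;'

def one_step(url: str) -> str:
    return FIND.sub(REPLACE, url)

sample = ("https://www.gnu.org/software/softwaremanual/"
          "html_node/emacs/Regexp-Replace.html")
print(one_step(sample))
```

A URL with a different number of directory levels does not match this pattern and is returned unchanged.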

    If you are uncertain about the number of directory levels in the original URL, there are ways to do it, but it gets so complicated (lots of conditional replacements) that I wouldn’t recommend it. A completely generic solution probably isn’t possible, so I would just handle N levels, choosing N so that I was confident my data would never need more than N levels.
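
If leaving the editor's search-and-replace is an option, a varying number of levels becomes easier to handle with a replacement function. The sketch below is a different technique from the conditional replacements mentioned above, not a one-step regex: it captures all directories as one group and rewrites them in Python code. The pattern and the names `PATTERN`, `_rewrite`, and `convert` are assumptions made for illustration, and it presumes every directory name consists of word characters only.

```python
import re

# Group 1: scheme, host and "/software"
# Group 2: one or more "dir/" segments (any depth)
# Group 3: the final file name
PATTERN = re.compile(r'(https://[\w.]+/software)/((?:\w+/)+)([^\s/]*)')

def _rewrite(m: re.Match) -> str:
    # "a/b/c/" -> "a.b.c"
    middle = m.group(2).rstrip('/').replace('/', '.')
    # Emit the joined directories twice, as in the required output.
    return f"{m.group(1)}:{middle}:{middle}:{m.group(3)};"

def convert(text: str) -> str:
    return PATTERN.sub(_rewrite, text)

print(convert("https://www.gnu.org/software/softwaremanual/"
              "html_node/emacs/Regexp-Replace.html"))
print(convert("https://www.gnu.org/software/a/b/file.txt"))
```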



  • The directory level does vary for every URL. I will follow your suggestion and see if I can solve it. I appreciate your help.

