Complex regex substitution
-
I’m trying to create a complex regex substitution, to find stuff like:
Improved Grapple KEY:Bloodline Feat ~ Improved Grapple CATEGORY:Internal TYPE:BloodLineAdditional PREABILITY:1,CATEGORY=FEAT,Improved Unarmed Strike PREMULT:1,[PREVARGTEQ:PreStatScore_DEX,13],[PREVARGTEQ:FeatDexRequirement,13] ABILITY:Feat|AUTOMATIC|Improved Grapple
and make it become similar to:
Blind-Fight KEY:Sorcerer Bloodline Feat ~ Blind-Fight CATEGORY:Special Ability TYPE:SorcererBloodlineFeat.DraconicBloodline.InfernalBloodline PREVARGTEQ:SorcererBloodlineFeatBlindFight,1 ABILITY:FEAT|AUTOMATIC|Blind-Fight
I’m trying to use regex for this because I have to search a large group of files (24 files).
The differences are:
- “Bloodline Feat” has to become “Sorcerer Bloodline Feat”.
- CATEGORY:Internal has to become CATEGORY:Special Ability.
- TYPE changes completely.
- PREABILITY and prestat should probably be kept, but I’m not sure.
- PREVARGTEQ:SorcererBloodlineFeatXXX must be added.
- ABILITY has to be kept.
The difficulties come from the following:
- The [TAB]s seem to be random and not the same number each time, which is puzzling. (The coder told me they are only there so that things line up in the same column when he looks at it, since he does not wrap text. That is not quite right either, as I see many instances of text not being in the same column.) Quoting here removes nearly all the [TAB]s, so I’m not sure how to show you guys. [EDIT: OK, it seems adding a [TAB] at the beginning lets this forum understand that it needs to be code, and thus the tabs are shown again.]
- I’m still trying to figure out how to replace specific pieces of the text. Regex in Notepad++ is very powerful but seems inconsistent in what it considers a variable piece of text; in theory “$1” is variable 1, “$2” is variable 2, and so on.
- Notepad++ regex variable names sometimes overlap with the substitution text. It takes “$1,1” (meaning “add variable 1, then add a comma and a 1”) as “add variable 1 and then variable 1 again”, but only sometimes; elsewhere it does add the “comma 1”. And if I put “/,/1”, which I thought in regex means “treat the comma and the 1 as actual characters”, it sometimes ignores the escape characters.
I’m trying to tell it that if I get
Improved Bull Rush KEY:Bloodline Feat ~ Improved Bull Rush
it should become
Improved Bull Rush KEY:Sorcerer Bloodline Feat ~ Improved Bull Rush
Just to make a simple example. So i should just put that “sorcerer” in there and when i try and get regex to do it… it would either come out as
$1 KEY:Sorcerer Bloodline Feat ~ $1
which is wrong on so many levels… or it comes out as
Improved Bull Rush KEY:Sorcerer Bloodline Feat ~ $1
which is outright weird, and that’s just an example, of what’s going wrong. I’m setting full regex but i seem to be an idiot.
-
What did you do to get the cool markdown in the first 2 code blocks (nice scroll bars)? Is it just a matter of having really long text in a standard, 4-space-indent code block?
-
uhhh. i just copied the code and put a [TAB] before it… but do you have any insight on how to help me?
-
Well, I would offer some advice, but your posting is so disjointed and full of so many ideas/comments/other-stuff that I wouldn’t know where to start. First things first, though, and that is, this forum isn’t here to help you with content which seems to be the bulk of your needs. If you have a very specific question about a very specific regular expression and there is a tie-in with the flavor of the regex engine that Notepad++ uses, then that’s ideal. Anything less than that and we are moving away from the point of this forum. There is some general tolerance for generic regex questions here, however.
Can you break it down and ask it one piece at a time? Maybe one piece per posting? Get an answer (maybe not from me), to your most important issue (honestly I can’t tell what that is from your mega posting) then absorb the results into your situation and if you are still stuck then ask your next question, etc…
@Alan-Kilborn : My guess is that that is how really long lines are represented in markdown. Not sure what the threshold length is…
-
@Scott-Sumner hi there.
I have to find a regular expression that matches the first example and, while keeping the result as similar as possible to the first, adds/modifies everything that is present in the second.
- The files I have to search contain a lot of very similar situations, so I cannot (for example) search for every “category:internal” to turn it into “category:special ability”, nor search for “bloodlineadditional” to turn it into “sorcererbloodlinefeat”, to name two of the intended changes seen in the examples provided.
- It seems my grasp of what regex does in Notepad++ is too loose. When I tell regex to substitute a piece with $1 to put “improved grapple” there as the “first thing that (should) get matched”, the resulting substitution contains a literal “$1” instead, or just “improved”, or some other weird result, and this seems to vary from spot to spot across the files.
- The [TAB]s have to be kept, and no matter what I write in the regex it never matches the same number of [TAB]s.
- If the initial string is “bull-rush” and I want a result that says “bull-rush,1”, I thought I should write “$1,1”. But then I get either “bull rushbull rush” or “$1,1”, and at least once I even got something akin to “$1bull rush”. If I write the substitution as “$1/,/1” to make regex understand that the comma and the 1 are characters, not variables, it ignores the escaping of the comma and the 1 and acts as above, without telling me what I’m doing wrong.
Basically, I have no clue why it is not working or what I’m doing wrong, and I’m asking for help.
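For reference, the two substitutions described above can be tried outside the editor. This is a minimal sketch in Python's `re` module, not Notepad++'s own engine, so the replacement syntax differs (`\1` or `\g<1>` here instead of `$1`). The sample line and its two tabs are assumptions, because the forum collapsed the real tabs. Note that the regex escape character is the backslash `\`, not the forward slash `/`.

```python
import re

# Assumed sample: feat name, a run of tabs, then the KEY field.
line = "Improved Bull Rush\t\tKEY:Bloodline Feat ~ Improved Bull Rush"

# Group 1 = feat name, group 2 = the tab run (kept as-is, whatever its length).
key_pattern = re.compile(r"^([^\t]+)(\t+)KEY:Bloodline Feat ~ ")
with_sorcerer = key_pattern.sub(r"\1\2KEY:Sorcerer Bloodline Feat ~ ", line)

# A literal comma and digit right after a group: \g<1> makes the
# end of the group reference explicit, so ",1" is plain text.
with_suffix = re.sub(r"^(.+)$", r"\g<1>,1", "bull-rush")

print(repr(with_sorcerer))
print(repr(with_suffix))
```

Capturing the tab run in its own group is one way to keep however many tabs the original line had, instead of trying to match a fixed number of them.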
-
soo… apparently this:
(.*?\t*?KEY:)(Bloodline Feat.*?\t*?CATEGORY:)Internal(\t*?TYPE:).*?(\t*?PRE.*)
should give me the right results in the initial search, but when I check whether $1, $2, $3 are correct in the substitution… they don’t seem to be.
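One way to see what each group actually captures is to run the same pattern against a sample line and print the groups. The sketch below does that in Python; Python's `re` and the Notepad++ engine are not identical, so treat it as a sanity check rather than proof of what the editor does. The sample line (with guessed tab positions) and the replacement text `SorcererBloodlineFeat` are assumptions for illustration.

```python
import re

# The pattern from the post above, unchanged.
pattern = re.compile(
    r"(.*?\t*?KEY:)(Bloodline Feat.*?\t*?CATEGORY:)Internal(\t*?TYPE:).*?(\t*?PRE.*)"
)

# Assumed sample line; the real tab counts are unknown.
sample = (
    "Improved Grapple\t\tKEY:Bloodline Feat ~ Improved Grapple\tCATEGORY:Internal"
    "\t\tTYPE:BloodLineAdditional\tPREABILITY:1,CATEGORY=FEAT,Improved Unarmed Strike"
    "\tABILITY:Feat|AUTOMATIC|Improved Grapple"
)

match = pattern.match(sample)
groups = match.groups() if match else ()
for number, text in enumerate(groups, start=1):
    print(number, repr(text))  # repr() makes the tabs visible as \t

# Rebuild the line from the four groups plus the new literal text.
converted = pattern.sub(
    r"\1Sorcerer \2Special Ability\3SorcererBloodlineFeat\4", sample
)
print(repr(converted))
```

In this run the pattern yields four groups, and each group carries its own leading tabs, so the replacement needs all of `\1` to `\4` (not just three) to put the line back together.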
-
I hate to be a Debbie-downer, but I don’t think a regex is the right tool for this job. The examples you’re giving look like some kind of JSON object. I would guess the number of tabs and spaces define where each property lives in the object. For example the “Improved Grapple” object might look something like this in JSON:
"Improved Grapple":{ KEY:"Bloodline Feat ~ Improved Grapple" CATEGORY:Internal TYPE:BloodLineAdditional PREABILITY:[{"CATEGORY":{FEAT:"Improved Unarmed Strike"}}] PREMULT:[{PREVARGTEQ:{PreStatScore_DEX:13}}, {PREVARGTEQ:{FeatDexRequirement:13}}] ABILITY:"Feat|AUTOMATIC|Improved Grapple" }
You can install the JSTool plugin and highlight those lines and open them in the JSON viewer to get an idea of the structure of the object.
Since you’re changing properties, adding new ones, removing other ones, and just generally writing a new object, I would suggest you figure out how to read the files into an object and then manipulate the object in Python or JavaScript. Then when you have the object defined the way you want, you can export it back out into the original horrible format.
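As a rough illustration of the "read it, change it, write it back" idea in Python, here is a minimal sketch. It assumes each line is a feat name followed by `TAG:value` fields separated by runs of tabs, and that the line is passed without its trailing newline. The function name `convert_line` and the new TYPE value are hypothetical placeholders, not the real target data.

```python
import re

def convert_line(line: str) -> str:
    """Rewrite one bloodline-feat line, leaving every tab run untouched."""
    # Splitting on a captured group keeps the separators:
    # even indexes are fields, odd indexes are the original tab runs.
    parts = re.split(r"(\t+)", line)
    fields = parts[0::2]

    # Only touch lines that really are bloodline feats.
    if not any(f.startswith("KEY:Bloodline Feat ~ ") for f in fields):
        return line

    for i in range(0, len(parts), 2):
        field = parts[i]
        if field.startswith("KEY:Bloodline Feat ~ "):
            parts[i] = "KEY:Sorcerer " + field[len("KEY:"):]
        elif field == "CATEGORY:Internal":
            parts[i] = "CATEGORY:Special Ability"
        elif field.startswith("TYPE:"):
            parts[i] = "TYPE:SorcererBloodlineFeat"  # placeholder value
    return "".join(parts)

example = (
    "Improved Grapple\t\tKEY:Bloodline Feat ~ Improved Grapple\tCATEGORY:Internal"
    "\t\t\tTYPE:BloodLineAdditional\tABILITY:Feat|AUTOMATIC|Improved Grapple"
)
print(repr(convert_line(example)))
```

Because the tab runs are carried through unchanged, the column layout of the file stays exactly as it was, and lines without the `KEY:Bloodline Feat ~ ` field are returned as-is.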
-
- it’s not json and the tabs are just there to allow spacing so that they “look good” on a single line when a bunch of them are all there at the same time. (basically they line up stuff vertically… and yes… that sounds idiotic… but i did not write the original “code” 😅)
- It cannot be on multiple lines, as the interpreter of the code sees every new line as a different “item”.
- I did not write the interpreter, i’m just trying to help modify a bunch of items at the same time. Thus i cannot change the writing “language”.
- The task was to learn how to change multiple instances of similar things all at once so that in the future these things can be changed easily.
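For the "many files at once" part, a small script is one possible alternative to the editor's Find in Files. The sketch below is only an outline under assumptions: the folder layout, the `*.txt` glob, and the function names are hypothetical, and the single substitution shown (inserting "Sorcerer " into the KEY field) stands in for whatever full set of changes is needed. Work on a copy of the files.

```python
import re
from pathlib import Path

# Matches the tab before KEY: plus the old key prefix, so only that field changes.
KEY_PATTERN = re.compile(r"(\tKEY:)(Bloodline Feat ~ )")

def convert_text_line(line: str) -> str:
    """Insert 'Sorcerer ' into the KEY field; other lines pass through."""
    return KEY_PATTERN.sub(r"\1Sorcerer \2", line)

def convert_folder(folder: Path, glob: str = "*.txt") -> int:
    """Rewrite matching files in place and return how many were changed."""
    changed = 0
    for path in sorted(folder.glob(glob)):
        # newline="" keeps the original line endings untouched.
        with path.open("r", encoding="utf-8", newline="") as handle:
            original = handle.read()
        updated = "".join(
            convert_text_line(line) for line in original.splitlines(keepends=True)
        )
        if updated != original:
            with path.open("w", encoding="utf-8", newline="") as handle:
                handle.write(updated)
            changed += 1
    return changed
```

A call such as `convert_folder(Path("copy_of_data"))` would then process every matching file in that (hypothetical) folder and report how many files it modified.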