Help with the construction of a regular expression
-
Hello. I can’t work out how to replace the part of the text I need when searching with a regular expression.
The file is large, and I need to replace a part that repeats several times with another one while preserving the rest (the text between repetitions differs).
Moreover, the replacement should be made only when a preceding condition is met.
#This is the beginning of the fragment of interest in the file.
<component class="ship_m" macro="ship_arg_m_miner_solid_01_a_macro" connection="space" code="DGB-092" owner="player" knownto="player" variation="0" spawntime="16329.777" thruster="thruster_gen_m_combat_01_mk3_macro" id="[0x2141c69]">
<listeners>
<listener listener="[0x2141c75]" event="boostdisabled"/>
<listener listener="[0x2141c6e]" event="killed"/>
<listener listener="[0x1cf2a2]" event="killed tempcomponentremoved"/>
<listener listener="[0x2141c71]" event="killed"/>
<listener listener="[0x2141c72]" event="killed"/>
#Next, a bunch of lines.
.
.
#The beginning of the interesting part of the text.
<people>
<person macro="character_argon_female_afr_crew_01_macro" role="service">
<npcseed seed="12947660160385706767"/>
<skill type="engineering" value="6"/>
<skill type="management" value="2"/>
<skill type="morale" value="4"/>
<skill type="piloting" value="2"/>
</person>
#there are different numbers of such parts in different ships.
.
.
#this is how the section ends
<person macro="character_argon_female_asi_pilot_01_macro" role="marine">
<npcseed seed="14903130213032596755"/>
<skill type="boarding" value="2"/>
<skill type="morale" value="3"/>
<skill type="piloting" value="1"/>
</person>
</people>
#after that, there are still a bunch of lines until the ship section ends
I need to replace the skill lines with my own, keeping everything else intact, and only in the ships owned by the player.
So far, I have only come up with this:
(?'1'class="ship.+owner="player".+\r\n)(?'2'(.+\r\n)*?(<people>\r\n))\K((?'3'<person.+\r\n<npcsee.+\r\n)(?'4'<skill.+\r\n)*(?'5'</perso.+\r\n))*
This selects all the person blocks. But I can’t figure out how to replace only the lines with the skills, or how to highlight only them.
-
@Graf said in Help with the construction of a regular expression:
This selects all the person blocks. But I can’t figure out how to replace only the lines with the skills, or how to highlight only them.
Your problem does seem complex, possibly too complex to solve easily with a regular expression replacement alone.
But you did allude to being happy with just highlighting the relevant sections.
Whilst this isn’t the solution, I suggest you look at a post by our resident regular expression guru, @guy038. The post is located here.
As it stands it possibly won’t work for you, but maybe you can take some inspiration/guidance from it.
Good luck
Terry
-
You may wish to use an XML parser rather than regular expressions, for reasons explained here.
If this is some ungodly document format that resembles XML but doesn’t comply with the specification, I don’t know what to say except that I pity you.
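A minimal sketch of the parser route in Python, using the standard `xml.etree.ElementTree` module. It assumes the file is well-formed XML and that `<people>` is a direct child of the ship's `<component>`. The `<savegame>` wrapper, the `NEW_SKILLS` values, and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical replacement skills; adjust types and values as needed.
NEW_SKILLS = {"engineering": "15", "morale": "15", "piloting": "15"}


def replace_player_crew_skills(root, new_skills):
    """Replace the <skill> children of each <person> on player-owned ships.

    Returns the number of <person> elements that were rewritten.
    """
    changed = 0
    for comp in root.iter("component"):
        is_ship = comp.get("class", "").startswith("ship")
        if not (is_ship and comp.get("owner") == "player"):
            continue
        # Assumption: <people> sits directly under the ship component.
        for person in comp.findall("people/person"):
            for old in person.findall("skill"):
                person.remove(old)
            for skill_type, value in new_skills.items():
                ET.SubElement(person, "skill", {"type": skill_type, "value": value})
            changed += 1
    return changed


# Tiny made-up sample: one player ship, one ship owned by someone else.
SAMPLE = """<savegame>
<component class="ship_m" owner="player">
<people>
<person macro="a" role="service">
<npcseed seed="1"/>
<skill type="engineering" value="6"/>
<skill type="morale" value="4"/>
</person>
</people>
</component>
<component class="ship_m" owner="argon">
<people>
<person macro="b" role="marine">
<npcseed seed="2"/>
<skill type="boarding" value="2"/>
</person>
</people>
</component>
</savegame>"""

root = ET.fromstring(SAMPLE)
count = replace_player_crew_skills(root, NEW_SKILLS)
```

For a real file, `ET.parse(path)` followed by `tree.write(...)` would be the equivalent, but building a full tree for a very large document can take considerably more memory than the file size, and the output formatting may differ from the original.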
-
Thanks for the tips, especially for the link with examples of searching text within a separate zone. I racked my brain, but I found my own solution.
(?-si:<connection connection=".+\n.+class="ship.+owner="player".+\n|(?!\A)\G)(?s-i:(?!<\/people>\n).)*?\K(?-si:(<skill.+\n)+)
But another problem has emerged. As I understand it, due to the large file size (~500 MB and ~13.5 million lines), I get an error if there are about 1 million lines between the areas with the data I need. For example, I first search for
class="ship.+owner="player"
throughout the file and get a normal result: 32 matches in the entire file. But if I try to run my regular expression from the beginning of the file, I get a regular expression error message (invalid regular expression).
If I place the cursor inside any of the zones and use Mark, then all the matches in all the zones are marked, provided that, as I found out, there are fewer than 1 million lines between them.
Or, if I place the cursor inside a zone with the desired text (where the next zone of interest starts at least a million lines later) and repeatedly use “Find Next”, the search proceeds normally until it reaches the </people> tag. After that, as I understand it, the search should move on to the next zone, but instead I again get a regular expression error message.
I tried to find out what the error is using various online regular expression testers, but they can’t cope with such a volume of lines. I also did not find any offline tools for such a task; maybe I did not search well enough.
And as I understand it at the moment, the problem is not in the regular expression itself, but in a limitation of the search engine. Maybe someone has come across something like this or knows a suitable tool?
At the moment, I’m thinking about trying to write something suitable for this task in Python.
P.S. How can I hide anything under the spoiler?
P.P.S. X4 Foundations, let me go; I’ve been racking my brain for the second week))
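A rough sketch of what such a Python script could look like, reading the file line by line so that neither the file size nor the distance between zones matters. It assumes each tag sits on its own line, as in the fragment above, and that a zone runs from a player-owned ship line to the next `</people>`, mirroring the regular expression. The replacement lines in `NEW_SKILL_LINES` and the function name are made up for illustration.

```python
import io
import re

# A line that opens a ship component, e.g. <component class="ship_m" ...>
SHIP_LINE = re.compile(r'<component\b[^>]*\bclass="ship')
PLAYER_OWNED = re.compile(r'\bowner="player"')

# Hypothetical replacement lines; put your own skill lines here.
NEW_SKILL_LINES = [
    '<skill type="engineering" value="15"/>',
    '<skill type="morale" value="15"/>',
    '<skill type="piloting" value="15"/>',
]


def rewrite_skills(src, dst, new_skill_lines):
    """Copy src to dst, swapping skill lines inside player-ship <people> zones.

    Returns the number of <person> blocks whose skills were replaced.
    """
    in_player_ship = False  # last ship line seen was owned by the player
    in_people = False       # inside <people> of a player-owned ship
    emitted = False         # replacement already written for this <person>
    persons = 0
    for line in src:
        stripped = line.strip()
        if SHIP_LINE.search(line):
            in_player_ship = PLAYER_OWNED.search(line) is not None
        elif stripped == "<people>":
            in_people = in_player_ship
        elif stripped == "</people>":
            in_people = False
            in_player_ship = False
        elif in_people and stripped.startswith("<person"):
            emitted = False
        elif in_people and stripped.startswith("<skill"):
            if not emitted:
                # Reuse the indentation and line ending of the replaced line.
                indent = line[: len(line) - len(line.lstrip())]
                eol = line[len(line.rstrip("\r\n")):]
                for new in new_skill_lines:
                    dst.write(indent + new + eol)
                emitted = True
                persons += 1
            continue  # drop the original skill line
        dst.write(line)
    return persons


# Small in-memory demo with a made-up player ship and a non-player ship.
sample = (
    '<component class="ship_m" owner="player" id="[0x1]">\r\n'
    '<people>\r\n'
    '<person macro="a" role="service">\r\n'
    '<npcseed seed="1"/>\r\n'
    '<skill type="engineering" value="6"/>\r\n'
    '<skill type="morale" value="4"/>\r\n'
    '</person>\r\n'
    '</people>\r\n'
    '<component class="ship_s" owner="argon" id="[0x2]">\r\n'
    '<people>\r\n'
    '<person macro="b" role="marine">\r\n'
    '<skill type="boarding" value="2"/>\r\n'
    '</person>\r\n'
    '</people>\r\n'
)
out = io.StringIO()
count = rewrite_skills(io.StringIO(sample), out, NEW_SKILL_LINES)
result = out.getvalue()
```

For the real file, both the input and the output could be opened with `open(..., encoding="utf-8", newline="")` so that the `\r\n` line endings pass through untouched. The encoding is an assumption, and writing to a new file rather than over the original keeps a backup.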