XML Tools Plugin - Prettify & Check xml inside CDATA?

Jorge VT

I have a format of XML file that essentially contains XML data inside CDATA tags.

Example:

(but tens of thousands of lines)

Are there any settings I could set in the XML Tools plugin that would Check (and preferably prettify) the XML data inside the CDATA[] tags as well as the actual wrapping XML?

Jorge VT

Not that it needs to be more complicated, but not all the text in the CDATA[] fields is XML. Some is just plain text and variables.

PeterJones

My guess is that the answer is no, though I’m not an XML Tools Plugin guru (in fact, I rarely use it).

However, as an outsider’s suggestion, I would try the following three-step process:

1. Search and Replace (non-regular-expression) <![CDATA[ to ∠∠, and ]]> to ⟂⟂ (where ∠∠ and ⟂⟂ were chosen as character sequences not likely to be in your document; if they are, pick a different delimiter).
2. Since the CDATA markers are out of the way, XML Tools should validate and/or prettify the internal contents along with the surrounding structure.
3. Search and Replace ∠∠ back to <![CDATA[, and ⟂⟂ back to ]]>.
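For reference, the marker swap in steps 1 and 3 can be sketched outside the editor as plain string replacement. This is only a minimal illustration: the function names are made up for the example, and step 2 (running XML Tools on the result) still happens in Notepad++.

```python
# Hypothetical helper names; the delimiters are the ones suggested above.
CDATA_OPEN, CDATA_CLOSE = "<![CDATA[", "]]>"
PLACEHOLDER_OPEN, PLACEHOLDER_CLOSE = "∠∠", "⟂⟂"


def hide_cdata_markers(text):
    """Step 1: swap the CDATA markers for placeholder sequences."""
    # Refuse to continue if a placeholder already occurs in the text,
    # because the swap could not be reversed cleanly afterwards.
    if PLACEHOLDER_OPEN in text or PLACEHOLDER_CLOSE in text:
        raise ValueError("placeholder already present; pick a different delimiter")
    return text.replace(CDATA_OPEN, PLACEHOLDER_OPEN).replace(CDATA_CLOSE, PLACEHOLDER_CLOSE)


def restore_cdata_markers(text):
    """Step 3: put the original CDATA markers back."""
    return text.replace(PLACEHOLDER_OPEN, CDATA_OPEN).replace(PLACEHOLDER_CLOSE, CDATA_CLOSE)
```

One thing to expect: once the markers are hidden, any section whose plain text contains a bare < or & is no longer well-formed XML, so a checker will likely flag those sections.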

You might be able to get away with <was:CDATA> and </was:CDATA>, though I don’t know whether that would validate properly; I am not an XML expert.

Failing that, I would use a real XML tool to extract the CDATA; then run that extracted text through the prettify/validate; then repopulate the CDATA contents with the results. (less automatic, less fully contained in Notepad++/XML-Tools, but possibly easier to implement, if my three-step doesn’t work).

Personally, if the search/replace wasn’t sufficient, I would do that whole process in a single Perl script, using one of the XML modules in CPAN, like XML::LibXML or XML::Twig for the CDATA extraction and insertion, and piping through some external executable to get the prettification. Others here could probably recommend equivalent Python modules/packages/whatever-they’re-called. One benefit of a Python solution would be that you might be able to do it fully within the PythonScript plugin for Notepad++, so keeping it more contained.
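A rough Python equivalent of that extract/prettify/repopulate idea, using only the standard library, might look like the sketch below. Several things here are assumptions rather than anything confirmed in this thread: the function names are invented, a regular expression stands in for a real XML module when locating the CDATA sections, and `xml.dom.minidom` stands in for the external prettifier. Sections that do not parse as XML (plain text, variables) are left untouched.

```python
import re
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Lazily capture everything between a CDATA opener and the first "]]>".
CDATA_RE = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)


def prettify_fragment(fragment):
    """Return an indented copy if the fragment parses as XML, else the fragment unchanged."""
    try:
        dom = minidom.parseString(fragment.strip())
    except ExpatError:
        # Plain text, variables, or multi-root fragments end up here.
        return fragment
    # Note: minidom adds extra blank lines if the input was already indented.
    return dom.documentElement.toprettyxml(indent="  ").strip()


def prettify_cdata(text):
    """Prettify the contents of every CDATA section that holds well-formed XML."""
    return CDATA_RE.sub(
        lambda m: "<![CDATA[" + prettify_fragment(m.group(1)) + "]]>",
        text,
    )
```

The wrapping document is not reformatted here, only the CDATA contents; the outer XML could still be handled by XML Tools as usual.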

Ooh, I just thought of something: Can XML Tools be told to just prettify or validate a particular selection, rather than the whole document? If so,

1. search for something like this regular expression: (?s)<!\[CDATA\[\K.*?(?=\]\]>) (that appears to highlight the contents of an individual CDATA section)
2. apply the XML Tools prettify or validate on just that selection
3. find-next and repeat on the next selection, ad infinitum
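To see what that expression selects, here is a small Python sketch. Python's built-in `re` module does not support `\K`, so a fixed-width lookbehind is swapped in to get the same effect; the function name is made up for the example.

```python
import re

# Same idea as the Notepad++ expression: match only the text between
# "<![CDATA[" and the first following "]]>", excluding both markers.
CDATA_BODY = re.compile(r"(?s)(?<=<!\[CDATA\[).*?(?=\]\]>)")


def cdata_spans(text):
    """Return (start, end, contents) for each CDATA section body in text."""
    return [(m.start(), m.end(), m.group(0)) for m in CDATA_BODY.finditer(text)]
```

Each returned span corresponds to one selection that find-next would land on in the editor.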

-----
update: by “regular expression”, I mean enable the ☑ Regular Expression option in the find or replace dialog window; by “non-regular-expression”, I mean enable the ☑ Normal option.

PeterJones

As a follow-up, I found a couple of minutes to play a bit with the pretty-print and validate commands: the pretty-print does seem to work on a selection, but the validate seemed to use the whole document, even when I had just a small segment selected. Though since I’m not good at XML, I could be wrong in my interpretation.

However, while the XML Tools pretty-print does work with individual selections, I had forgotten you said there may be thousands, which means it won’t be very efficient. I think the three-step search+replace/xml tools/search+replace is probably the best in-Notepad++ option.