Editing a 600 MB XML file
-
@Maor-Bachar said:
Seems like it doesn’t have an XML style?
I think they unfortunately restrict extra coloring schemes to their paid version: http://www.editpadpro.com/cscs.html
-
What size of XML file can you load in N++?
-
I don’t think file type matters, but…
If I have no plugins and no other file opened, I can open files up to around 362 MB before it displays the “File is too big to be opened” error.
But with my plugins enabled, the limit drops to about 150 MB.
-
Hello, for people needing an x64 build of Notepad++, I made one here.
Notes:
- This is an easy unofficial build – not tested by devs.
- You need 64-bit OS.
- It is not compatible with 32-bit plugins (i.e. all available plugins). You need to rename your /plugins folder or move it somewhere else.
- It can open very large files. Important: make sure “Word Wrap” is disabled before opening such files.
- The 7z package above contains only the binaries you need to replace. Please back up your current 32-bit files (rename or move them).
-
I really think that anyone who needs to edit a 600 MB XML file should be thinking of alternatives! I mean, those files contain structures, and you really need to use something that will respect that structure - which isn’t a text editor.
I don’t know what this file contains, but it might be useful to arrange for it to be stored as a number of much smaller files. Alternatively, I guess you could load its structure into Mathematica, manipulate it, and write it out again!
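The idea of storing the data as a number of much smaller files can be sketched with a streaming XML parser. This is only a rough illustration in Python (not Mathematica), and it rests on assumptions that are not in the thread: the file is a flat list of repeated record elements directly under the root, and the names split_xml, record_tag, part_0001.xml and the <records> wrapper are all made up for the example.

```python
import os
import xml.etree.ElementTree as ET


def split_xml(src_path, record_tag, records_per_file, out_dir):
    """Stream src_path and write its <record_tag> elements into smaller files.

    Assumes (hypothetically) that the records are direct children of the root.
    Only the current batch of records is held in memory, never the whole file.
    Returns the list of files written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    batch = []

    def flush():
        # Write the pending records to the next numbered part file.
        if not batch:
            return
        path = os.path.join(out_dir, "part_%04d.xml" % (len(written) + 1))
        with open(path, "w", encoding="utf-8") as out:
            out.write("<records>\n")          # hypothetical wrapper element
            out.writelines(batch)
            out.write("</records>\n")
        written.append(path)
        batch.clear()

    context = ET.iterparse(src_path, events=("start", "end"))
    _, root = next(context)                   # first event: start of the root
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            batch.append(ET.tostring(elem, encoding="unicode") + "\n")
            root.clear()                      # drop records already serialized
            if len(batch) >= records_per_file:
                flush()
    flush()                                   # write the last, partial batch
    return written
```

A call such as split_xml("big.xml", "item", 100000, "parts") would then leave a set of part files that an ordinary editor can open one at a time; the file and tag names here are placeholders.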
I don’t like to see a tool like NP++ being pushed to perform basically ridiculous tasks.
David
-
Hi David,
Excuse me.
Be happy that you never had to edit a 10 GB XML file.
For that, Notepad++ is NOT your friend ;-)
But I want to give Ricardo’s 64-bit edition a try! It’s new to me ;-)
Yours, Klaus
-
Klaus,
“Be happy that you never had to edit a 10 GB XML file.”
I am indeed - but I am also happy that I have never tried to solder electronic components with a blow torch, or fry an egg on a smoothing iron! Even if I had, I wouldn’t try to make suggestions on a blow torch forum regarding improvements to blow torches that might make that more feasible!
I think a lot of software gradually bloats out - both in bytes and in terms of complexity. Ultimately valuable software can die that way! I think NP++ has remained consistently focussed on providing solutions for those who need to edit normally sized text files - particularly program source - and trying to add on outlandish capabilities would be a severe distraction.
Switching to 64-bits would probably only be the first step in a project to edit 10 GB data files, because I am sure there must be many processes inside NP++ that depend on being able to scan across an entire file in a sensible amount of time.
It might be more constructive if you described how you got into this mess, and then someone might be able to offer some useful suggestions!
As a preliminary suggestion, you could write a C program that reads the file and recognises the data you want to change. You could then modify the program to actually perform the change (but I would back up your file before you start :) ).
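To make the two-step approach concrete, here is a minimal sketch, written in Python rather than C for brevity. It assumes the data to change can be recognised one line at a time with a regular expression and that the file is UTF-8; the function name replace_in_large_file and the ".bak" suffix are invented for the example. The first pass only recognises matches, the second performs the change after making a backup.

```python
import re
import shutil


def replace_in_large_file(path, pattern, replacement, dry_run=True):
    """Scan `path` line by line; count (and optionally rewrite) matching lines.

    With dry_run=True nothing is modified. Otherwise the original is first
    copied to path + ".bak" and the edited text is then written back to `path`.
    Returns the number of lines that matched.
    """
    regex = re.compile(pattern)
    matches = 0

    if dry_run:
        # Step 1: only recognise the data, one line in memory at a time.
        with open(path, "r", encoding="utf-8", newline="") as src:
            for line in src:
                if regex.search(line):
                    matches += 1
        return matches

    # Step 2: back up first, then stream from the backup into the original.
    backup = path + ".bak"
    shutil.copy2(path, backup)
    with open(backup, "r", encoding="utf-8", newline="") as src, \
         open(path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            new_line, n = regex.subn(replacement, line)
            if n:
                matches += 1
            dst.write(new_line)
    return matches
```

Running it once with dry_run=True and checking the count before running it with dry_run=False keeps the risky step separate. Because each line still carries its line ending, a pattern that can match the trailing "\r" (for example ".*" on a file with Windows line endings) would need extra care.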
David
-
woot
For editing a 10 GB file, I guess the best would be an editor that doesn’t load the entire file into memory, but reads it directly from disk. Otherwise, you would need a lot of RAM.
-
Hello Maor Bachar and All,
Why don’t you try the old but excellent scripting program GAWK.exe, which can be used on both Unix and Windows machines (with the appropriate executable file, of course)?
AFAIK, you can get the latest Windows version of gawk.exe (v4.1.0) at the address below:
https://code.google.com/p/gnu-on-windows/downloads/detail?name=gawk-4.1.0-bin.zip&can=2&q=
To get an overview of the main features of gawk and what’s new in version 4.1.0, follow this link:
http://www.drdobbs.com/open-source/gnu-awk-this-is-not-your-fathers-awk/240158351
And here is the link to download the latest reference manual for GAWK v4.1.x, in various formats:
http://www.gnu.org/software/gawk/manual/
The PDF version is quite recent: April 2015!
To test it, I created a 1 GB file. (I didn’t create a 10 GB file, as my old Win XP laptop, with only two 40 GB partitions, would not accept it!) But when I go back to work (I’m presently on holiday), I’ll be able to build bigger files!
Briefly, I first created, in N++, a file of about 200 MB named xxx.txt. Then I copied this file five times into the All.txt file, in a DOS window, with the commands:
type xxx.txt>All.txt
type xxx.txt>>All.txt
type xxx.txt>>All.txt
type xxx.txt>>All.txt
type xxx.txt>>All.txt
I ended the All.txt file with a last line:
echo END of the FILE>>All.txt
Line lengths range from 0 to 150 characters. In the end, the All.txt file contains about 18 498 000 lines, for 1026 MB!
I used the simple gawk script below, named Script.txt. It’s IMPORTANT to note that this file is ANSI encoded:
#----------------------------------------------------------------------------------------------#
#  SYNTAX to RUN, in a CMD window, from the FOLDER containing the 3 files
#  GAWK.exe, Script.txt and All.txt
#
#  SYNTAX :
#
#  echo [Regex|String]|gawk -f Script.txt - All.txt[ File2.txt[ File3[ ...]]][ >[>]See.txt]
#----------------------------------------------------------------------------------------------#

{
    if ( pattern == "" ) {
        #---------------------------------------------------------------------------#
        #  The STANDARD INPUT - is read, BEFORE the USER FILE(S), so ALL the TEXT
        #  of the DOS COMMAND 'echo' will be STORED in the VARIABLE 'pattern'
        #
        #  Note : No BLANK, at END of "echo" command, unless part of the REGEX
        #
        #  IF the PATTERN is NOT initialized :
        #      SET the VARIABLE 'total' to the value 0
        #      SET the VARIABLE 'pattern' to the value of the LINE field $0
        #          = ALL the TEXT of the DOS COMMAND 'echo'
        #      SKIP to the NEXT line to READ
        #---------------------------------------------------------------------------#
        total = 0 ; pattern = $0
        next
    }
}

#------------------------------------------------------------------------------#
#  IF the CURRENT line MATCHES the PATTERN of the VARIABLE 'pattern' :
#      INCREMENT, by ONE, the VARIABLE 'total' [ and PRINT the CURRENT line ]
#  SKIP to the NEXT line to READ, in ALL cases
#------------------------------------------------------------------------------#
{
    if ($0 ~ pattern) { ++total }    # OR if ($0 ~ pattern) { ++total ; print }
    next
}

#------------------------------------------------------------------------------#
#  AFTER ALL the USER files are READ, DISPLAYS the NUMBER of lines
#  MATCHING the PATTERN of the VARIABLE 'pattern'
#------------------------------------------------------------------------------#
END {
    print "\n Number of lines MATCHING \x22" pattern "\x22 : " , total
}
The command
echo .*|gawk -f Script.txt - all.txt
gives me the number 18 498 231, which is the number of lines matching the regex .*
The command
echo END of the FILE|gawk -f Script.txt - all.txt
gives me the number 1, showing that it correctly read the last (18 498 231st) line of the 1 GB file All.txt!
On my old Win XP laptop, with only 1 GB of RAM (2 × 512 MB), I got the result from each command in about 1 min 30 s! Not too bad, is it?
Of course, my script just counts the matches but, with GAWK, you can manipulate files in multiple ways, do calculations or searches/replacements or … It’s a very powerful tool, although tiny in size: 223 246 bytes for the v4.1 version.
You’ll just have to skim the gawk documentation a bit!
And, as Ricardo suggested, it seems to read files directly from disk :-) => No limit in size, seemingly
Best Regards,
guy038
-
@Ricardo
I think Crisp (64-bit) can do it!
But it costs 100-250 per year!
And for my biggest files (10 GB files full of rubbish XML gore), I have to use VEDIT Pro (64-bit).
Attention: high price! $240 (no joke!)
Yours, Klaus