Community
    • Login

    Filter using columns?

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    8 Posts 6 Posters 2.9k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • ZNC GatilhoZ
      ZNC Gatilho
      last edited by

      Good morning. I marked in red in the image the columns that interest me in this document. This is the list of all chess players registered with FIDE, which is freely available to the public at this address (I can’t post the link because human beings love to complicate simple things!.) I would like to know if notepad++ can filter only what interests me using the columns? For example: I would like it to show me only players with standard ratings from 2100 to 2900, the column responsible for ratings is (SRtng) marked in red, this would be my first player filter, then I would like to filter only players titled, the column responsible for this is (Tit) marked in red in the document. fide provides the lists in xml and txt, I downloaded the txt, I downloaded the full version which is over 200mb. I don’t know if xml would be better for what I want to do, I don’t understand any of these things. Trying to filter by arm would be unfeasible, the list has almost a million names. Thank you for your attention. PS: I want to inform you that I don’t work with this, the list will be for my personal use, later I will create a database to study the matches of the players that interest me. I don’t understand anything about regular expressions, so any help, if it’s possible to do what I want, will have to be for someone who is completely new to the subject. alt text

      Alan KilbornA PeterJonesP 2 Replies Last reply Reply Quote 0
      • Alan KilbornA
        Alan Kilborn @ZNC Gatilho
        last edited by

        @ZNC-Gatilho said in Filter using columns?:

        I can’t post the link because human beings love to complicate simple things!

        No, you can’t post the link because human beings do DUMB things. The fact that you can’t post a link is a countermeasure to these dumb things.

        1 Reply Last reply Reply Quote 0
        • Alan KilbornA
          Alan Kilborn
          last edited by Alan Kilborn

          What you’re asking for is a rather ambitious ask for Notepad++ and/or the people here. What you have is a “searching” question, and thus you should read and follow the instructions HERE to give people interested in helping you a better starting point.

          1 Reply Last reply Reply Quote 1
          • PeterJonesP
            PeterJones @ZNC Gatilho
            last edited by PeterJones

            @ZNC-Gatilho said in Filter using columns?:

            This is the list of all chess players registered with FIDE, which is freely available to the public at this address (I can’t post the link because human beings love to complicate simple things!.)

            And the “human beings” who complicate things are the spammers who have inflicted non-stop AI/bot-based posts on our small community. Preventing accounts with 0 reputation from posting links has dramatically reduced the amount of spam we have to deal with. The best way to overcome that is to post one well-asked question, with textual data that’s easy to copy/paste, following the FAQ that @Alan-Kilborn linked you to before I finished writing this post: one such good post is usually enough to garner the single upvote necessary.

            And since the link is irrelevant to the question (unless someone wanted to download a 200mb file, just to help you), it’s not overly important to the discussion.

            However, you could have pasted about about a screen’s worth of text from the textfile – ie, the one screenshot worth of data from your image, but as text that we could copy/paste – using the </> button found in the create-a-post toolbar, as described in our formatting posts FAQ

            … which would have made it much easier for people to help you, without having to download a 200mb file.

            I would like to know if notepad++ can filter only what interests me using the columns?

            Any filter done in Notepad++ would be destructive – ie, removing data from the file. Make sure you have a backup before you try anything suggested in response to your query.

            For example: I would like it to show me only players with standard ratings from 2100 to 2900,

            Yeah, well, text editors aren’t great at numerical comparisons. That’s not what they’re designed for. The file you have is a text report from some database application, which isn’t the best for filtering after being generated.

            What would be best for you (though it wouldn’t involve Notepad++) is for FIDE to give you access to an API for the database itself, rather than just letting you look at reports. Second best (again, not using Notepad++) would be to parse the text file with some tool that would turn it back into a database on your end, and then use the tools with that database to do your extraction to create a new report with the filtered data that you want. Third best would be to import it into a spreadsheet – I know that Excel can take a textfile input, and then prompt you to manually insert the column divisions (and probably LibreOffice’s Calc can as well); once it’s in the spreadsheet, you can use the spreadsheet “filter” tool to do your filtering quite easily.

            Destructive search-and-replace in Notepad++ would be fourth in line (at best… there might be other better options between spreadsheet and Notepad++ search-and-replace that I cannot think of)… but it’s the only one that we can help you with here.

            I don’t know if xml would be better for what I want to do,

            In pure Notepad++, I’m doubtful that it would make it any easier – definitely not if you end up having to use regex, because parsing XML with regex is no easier than parsing fixed-width text table with regex. With the XML Tools plugin or other XML processing plugin, it might be better, if you knew the XPath syntax for searching (assuming XML Tools has an XPath searcher… I don’t honestly remember right now).

            I would think that a JSON file might be easier to search/replace than an XML file, but I cannot guarantee that. And one of the JSON plugins (JSON Viewer or JSON Tools) has a builtin filter/searching, which probably would be helpful. So if FIDE has a JSON version of the data, I might suggest that

            PS: … I don’t understand anything about regular expressions, so any help, if it’s possible to do what I want, will have to be for someone who is completely new to the subject

            If you are unwilling to learn, you are unlikely to find help here. If you are willing to learn, someone might be able to come up with a regex that will help. The way you’ve phrased this PS, I cannot tell if you are willing to learn or not.

            In Notepad++ proper, the only way to come close to your goal would be to use regular expressions. Though since one of your filters is mathematical in nature, I will direct you to our Math replacement FAQ for an explanation of why that won’t be simple,

            I’m betting that @Coises’s Columns++ plugin will be able to help you – unfortunately, it’s still in beta, so you’ll have to manually install it, since you cannot use Notepad++'s Plugins Admin to install the plugin. But that plugin has its own regex engine which might handle your columnar data better than a vanilla Notepad++ regex; and it has an added feature of mathematical notation (though I’m not sure if that only works in the replacement, or if it can also work for comparing a range of numbers in the search… @Coises will likely be around and will hopefully chime in).

            Since I’m not a Columns++ expert, if I were interested in doing this, I’d pull out the PythonScript plugin (if I was forced to do it in Notepad++) or my favorite scripting language (if I were doing it solely for my own use, not for this Forum) and I would write the script in Notepad++, but wouldn’t have to have the 200mb file open in Notepad++, since that’s pretty slow for N++.

            ----

            Useful References

            • Please Read Before Posting
            • Template for Search/Replace Questions
            • Formatting Forum Posts
            • Notepad++ Online User Manual: Searching/Regex
            • FAQ: Where to find other regular expressions (regex) documentation

            ----

            Please note: This Community Forum is not a data transformation service; you should not expect to be able to always say “I have data like X and want it to look like Y” and have us do all the work for you. If you are new to the Forum, and new to regular expressions, we will often give help on the first one or two data-transformation questions, especially if they are well-asked and you show a willingness to learn; and we will point you to the documentation where you can learn how to do the data transformations for yourself in the future. But if you repeatedly ask us to do your work for you, you will find that the patience of usually-helpful Community members wears thin. The best way to learn regular expressions is by experimenting with them yourself, and getting a feel for how they work; having us spoon-feed you the answers without you putting in the effort doesn’t help you in the long term and is uninteresting and annoying for us.

            ZNC GatilhoZ CoisesC 2 Replies Last reply Reply Quote 1
            • Mark OlsonM
              Mark Olson
              last edited by

              I could talk about ways you could use various plugins (including plugins I’ve made) to approach this problem, but IMO Notepad++ is not a good tool for this. Sometimes I’ve tried using NPP plugins to analyze or manipulate really huge tabular datasets like this, and in the end I found it easier to use the most popular and well-known tools.

              I personally would use Python and Jupyter Notebooks for this, and if you want any more details it has to be in chat because the details of a non-NPP solution are not pertinent to this forum.

              1 Reply Last reply Reply Quote 1
              • ZNC GatilhoZ
                ZNC Gatilho @PeterJones
                last edited by

                I thank Peter and the others who responded. I was going to detail a few more things, but I give up! There’s so much blocking that it’s impossible, congratulations to Akismet. The internet has become a paradise where the good pay for the bad. Since I lack computer knowledge, but I don’t lack money, I will resolve it in other ways.

                1 Reply Last reply Reply Quote 0
                • CoisesC
                  Coises @PeterJones
                  last edited by

                  @PeterJones said in Filter using columns?:

                  Coises will likely be around and will hopefully chime in

                  Perhaps I’m missing something… but from the screenshots @ZNC-Gatilho included, this isn’t very hard. The text file appears to be in fixed-width columns. No need for any plugin; just make a rectangular selection around the column of interest, sort, and then delete everything except the range of interest. A numeric sort will work on the SRtng column and a lexicographic sort on the others.

                  Obviously, make a copy of the file first.

                  1 Reply Last reply Reply Quote 3
                  • guy038G
                    guy038
                    last edited by guy038

                    Hello, @znc-gatilho, @alan-kilborn, @peterjones, @mark-olson, @coises and All,

                    @znc-gatilho, could you confirm me that your three important zones are, by priority order :

                    Zone 'SRtng' : Between columns 114 and 118  ( 5 chars )
                    Zone 'Tit'   : Between columns  77 and  79  ( 3 chars )
                    Zone 'WTit'  : Between columns  82 and  85  ( 4 chars )
                    

                    If so, let’s start with this INPUT text, containing the headers line 1 and 3 random records ( lines 626897, 626912 and 626917 )

                    ID Number      Name                                                 Fed Sex Tit  WTit OTit                   FOA SRtng SGm SK RRtng RGm Rk BRtng BGM BK B-day Flag
                    4109228        Kryukova, Irina                                      RUS F   WGN  WNM                             2232  0   20              2240  0   20 1968  wi
                    36016672       Kryvinski, Jenny                                     FRA F                                                                               2001  w
                    14102897       Kryvolapov, Kostiantyn                               UJR M   IM                                   2356  1   10              2327  0   20 1962
                    
                    • Open these four lines text in a new tab

                    • Move to the very beginning of your file ( Ctrl + Home )

                    • Open the Replace dialog ( Ctrl + H )

                    • Un-tick all box options

                    • Type in (?-s)^.{76}(...)..(....).{28}(.....).+ in the Find what : zone

                    • Type in \3 \1 \2 $0 in the Replace with : zone

                    • Select the Regular expression search mode

                    • Click on the Replace All button

                    => At once, you should get this temporary text :

                    SRtng Tit WTit ID Number      Name                                                 Fed Sex Tit  WTit OTit                   FOA SRtng SGm SK RRtng RGm Rk BRtng BGM BK B-day Flag
                    2232  WGN WNM  4109228        Kryukova, Irina                                      RUS F   WGN  WNM                             2232  0   20              2240  0   20 1968  wi
                                   36016672       Kryvinski, Jenny                                     FRA F                                                                               2001  w
                    2356  IM       14102897       Kryvolapov, Kostiantyn                               UJR M   IM                                   2356  1   10              2327  0   20 1962
                    
                    • Now, select the three lines, after the headers line

                    • Run the Edit > Line Operations > Sort Lines Lexicographically Ascending option

                    => You should get your expected OUTPUT text :

                    SRtng Tit WTit ID Number      Name                                                 Fed Sex Tit  WTit OTit                   FOA SRtng SGm SK RRtng RGm Rk BRtng BGM BK B-day Flag
                                   36016672       Kryvinski, Jenny                                     FRA F                                                                               2001  w
                    2232  WGN WNM  4109228        Kryukova, Irina                                      RUS F   WGN  WNM                             2232  0   20              2240  0   20 1968  wi
                    2356  IM       14102897       Kryvolapov, Kostiantyn                               UJR M   IM                                   2356  1   10              2327  0   20 1962
                    

                    As you can see your file is, automatically sorted by the three zones SRtng then by Tit then by WTit ascending

                    BTW, @Peterjones, I suppose that the SRtng zone is the ELO rating of chess players, which is ALWAYS a 4 number string for experienced players and grand masters ! . So, no trouble for the sort


                    Now, just repeat the steps above, with your players_list_FOA.txt file

                    Of course, do a backup of your players_list_FOA.txt, before any action

                    Beware that your file is really an huge file ( 216,785,860 bytes for 1,321,866 lines !! )

                    Thus, I cannot tell you how longer the search/replacement and the save operations will take till completion, when applied to THIS file !

                    However, you can give it a try !


                    Of course, in case that the zones do not belong to the mentioned ranges, just tell me the exact values in order to modify the search regex, accordingly :-)

                    Best Regards,

                    guy038

                    1 Reply Last reply Reply Quote 1
                    • First post
                      Last post
                    The Community of users of the Notepad++ text editor.
                    Powered by NodeBB | Contributors