    Working on a Chinese Spam Guard Bot

    General Discussion
    12 Posts 4 Posters 970 Views
    • michaellee8M
      michaellee8
      last edited by

      I would bet $100 that those Chinese spammers/bots will come back when the 24-hour cooldown period expires, so I am developing a bot to fight them here: https://github.com/michaellee8/chinese-spam-guard

      Maybe you guys can give me some input on this one. Currently it runs a stateless detection engine based on character detection: it closes issues that contain Chinese characters immediately as they are opened.

      • dinkumoilD
        dinkumoil @michaellee8
        last edited by

        @michaellee8

        This is something @donho should read as he is the owner of the GitHub repo.

        • michaellee8M
          michaellee8
          last edited by

          I have sent him an email as well; maybe it got buried under all that spam. I will send another one after I have prepared a usable version.

          • michaellee8M
            michaellee8
            last edited by

            I got a spam guard bot working here: https://github.com/michaellee8/chinese-spam-guard. Please have a look at it and see whether you find it useful.

            • Palash BansalP
              Palash Bansal
              last edited by

              And the incident got listed on the NPP Wikipedia page too, in the controversies section.
              Your spam detection bot seems like a good idea, but NPP supports the Chinese language too. So if there is any problem with the translation, an issue containing Chinese characters will need to be raised, and it will unnecessarily get detected as spam. Better to require a minimum of ~10 Chinese characters before detecting an issue as spam.
              Also, I saw some spam written only in English. It would be great if issues without the “Debug info.” string got detected as spam.
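
A minimal sketch of the two heuristics suggested above, under stated assumptions rather than as a description of the actual bot. "Chinese characters" is taken to mean the CJK Unified Ideographs block plus Extension A, the marker is matched case-insensitively as "debug info", and the names `count_cjk`, `spam_reasons`, `MIN_CJK_CHARS` and `REQUIRED_MARKER` are hypothetical.

```python
import re
from typing import List

# Assumption: "Chinese characters" = CJK Unified Ideographs + Extension A.
CJK_PATTERN = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")

MIN_CJK_CHARS = 10              # the "~10 characters" threshold from the post
REQUIRED_MARKER = "debug info"  # assumption: matched case-insensitively


def count_cjk(text: str) -> int:
    """Return the number of CJK ideographs in text."""
    return len(CJK_PATTERN.findall(text))


def spam_reasons(title: str, body: str) -> List[str]:
    """Return the heuristics an issue trips; an empty list means none."""
    reasons = []
    # Heuristic 1: a couple of quoted characters (e.g. a mistranslated
    # menu label) stay under the threshold; a wall of text does not.
    if count_cjk(title) + count_cjk(body) >= MIN_CJK_CHARS:
        reasons.append("many_cjk_chars")
    # Heuristic 2: the issue body does not contain the expected marker.
    if REQUIRED_MARKER not in body.lower():
        reasons.append("missing_debug_info")
    return reasons
```

Returning the list of reasons instead of a single yes/no leaves it to the caller to decide whether one tripped heuristic is enough to close an issue or whether it should only be labelled for manual review.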

              • michaellee8M
                michaellee8 @Palash Bansal
                last edited by

                @Palash-Bansal Thank you. Currently the bot closes an issue automatically if either the title or the body consists of more than 40% Chinese characters. I guess validating the issue structure could be a good idea as well. I should implement it later.
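
The 40% rule described above could be sketched roughly as follows. This is not the bot's actual code: the character range used for "Chinese characters", the choice to ignore whitespace when computing the ratio, and the names `cjk_ratio` and `should_close` are all assumptions made for illustration.

```python
import re

# Assumption: "Chinese characters" = CJK Unified Ideographs + Extension A.
CJK_PATTERN = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")

CJK_RATIO_THRESHOLD = 0.40  # "more than 40%" from the post


def cjk_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are CJK ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0  # an empty field never counts as Chinese
    cjk = sum(1 for c in chars if CJK_PATTERN.match(c))
    return cjk / len(chars)


def should_close(title: str, body: str) -> bool:
    """True if either the title or the body exceeds the threshold."""
    return (cjk_ratio(title) > CJK_RATIO_THRESHOLD
            or cjk_ratio(body) > CJK_RATIO_THRESHOLD)
```

Because the title and the body are checked separately, a short all-Chinese title is enough to trip the rule even when the body is in English, which is exactly the false-positive case raised earlier in the thread for translation bug reports.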

                • dinkumoilD
                  dinkumoil @michaellee8
                  last edited by dinkumoil

                  @michaellee8

                  Personally I think a spam guard bot based on the presence of certain characters in an issue’s title or text is an inappropriate measure. As pointed out by @Palash-Bansal, it could produce too many false positives.

                  The problem is not the real human beings posting comments, regardless of whether those comments are rude, offensive, or threaten violence (if someone decides to criticize China, he has to be prepared for that). The real problem is the GitHub API, which allows spam bots to open a new issue every second (I saw that when the spam campaign was at its peak).

                  This problem can only be solved by GitHub, by improving their backend to prevent high-frequency commenting and issue creation. Spam comments produced by real humans have to be cleaned up manually. If someone doesn’t have the resources for that, he should think twice before starting to criticize China or other powerful parties.

                  That said, I guess that @donho might have allowed that escalation on purpose, because he let it go on for nearly 24 hours. If you google notepad++ china now, you will find a lot of websites publishing articles about that incident. In this way he could cause more people to be informed about the Uyghur Human Rights Project. Exactly that was the goal of the whole thing.

                  • Alan KilbornA
                    Alan Kilborn
                    last edited by

                    @dinkumoil said in Working on a Chinese Spam Guard Bot:

                    I guess that @donho might have allowed that escalation on purpose, because he let it go on for nearly 24 hours. If you google notepad++ china now, you will find a lot of websites publishing articles about that incident. In this way he could cause more people to be informed about the Uyghur Human Rights Project. Exactly that was the goal of the whole thing.

                    Which probably means that we’ll have to endure more of this crap the next time @donho wants to leverage Notepad++ for political gain…

                    • dinkumoilD
                      dinkumoil @Alan Kilborn
                      last edited by dinkumoil

                      @Alan-Kilborn said in Working on a Chinese Spam Guard Bot:

                      Which probably means that we’ll have to endure more of this crap the next time @donho wants to leverage Notepad++ for political gain…

                      Yes, that’s what I’m afraid of too. Turning Notepad++ into a political weapon affects the whole community:

                      • Currently there is no progress in feature development/bug fixing, and reporting bugs is impossible.
                      • Moderators of this forum have to audit and unlock postings of unknown members.
                      • I ask myself who did that “great” job of cleaning up the issue tracker.
                      • It was nearly impossible to download v7.8.1 for two days.
                      • When users manually check for an available update, they only get a weird error message (It’s not a valid GUP xml). Thus I guess that even after the download of v7.8.1 is unlocked for the masses, people will not be able to get it, because the autoupdate feature cannot detect that an update is available.
                      • Alan KilbornA
                        Alan Kilborn
                        last edited by

                        I keep thinking that celebrities champion causes of their own liking, so why shouldn’t @donho exploit Notepad++ in such a manner, to draw attention to things he feels important. The difference here is that it is not @donho that is “famous”, but it’s his child. How many people even know that Don Ho is the author of something? I liken it to child-exploitation, and that’s wrong. The whole thing smells bad…and feels bad…and it would be nice if it were behind us. But it isn’t, because I’m sure more political causes/events are on their way…

                        • michaellee8M
                          michaellee8 @Alan Kilborn
                          last edited by

                          @Alan-Kilborn I don’t really agree with that metaphor. I believe child exploitation is unethical because a child is a living thing, a human being, while this project is not a living thing. It’s a free product, with most of the effort done by Don, so I think he has the right to use it however he sees fit.

                          • michaellee8M
                            michaellee8 @dinkumoil
                            last edited by

                            @dinkumoil Actually that is not just an individual issue here; it is a general one. Every time pro-CCP nationalists find something they hate, or are told to hate something, they attack it rudely, both physically and on the internet. I wouldn’t say it is good if we just sit there and don’t express our opinions when we are attacked and afraid; rather, I would find ways to beat it.

                            I suggest the following mechanism for an anti-spam bot:

                            1. When the same user opens a second issue within a given period of time (e.g. 24 hours), check his/her activity log. If the activity log is empty or nearly empty (e.g. no contributions), immediately issue a permanent ban on our repo and on all repos that use this bot, report the account to GitHub, and then close/lock/delete all issues this user has ever made on our repo.
                            2. If 5 issues have been opened by the same user within 24 hours, that user should receive a 24-hour cooldown; the repo owner will also be able to issue a permanent ban for that user on his repo.
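
The two rules above could be sketched as a small in-memory decision function. This is only an illustration under assumptions: the class name `IssueRateGuard`, the action strings, and the `MIN_CONTRIBUTIONS` cutoff for a "nearly empty" activity log are hypothetical, and the contribution count is passed in by the caller because how it would be obtained from GitHub is not specified in this thread. Actually banning, reporting, or closing issues is left out.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Deque, Dict

WINDOW = timedelta(hours=24)   # "given period of time (e.g. 24 hours)"
COOLDOWN_ISSUE_COUNT = 5       # rule 2: 5 issues within the window
MIN_CONTRIBUTIONS = 1          # hypothetical cutoff for "nearly empty"


class IssueRateGuard:
    """Tracks when each user opened issues and suggests an action."""

    def __init__(self) -> None:
        self._opened: Dict[str, Deque[datetime]] = defaultdict(deque)

    def decide(self, user: str, now: datetime, contributions: int) -> str:
        """Record one new issue; return "allow", "ban" or "cooldown"."""
        times = self._opened[user]
        # Forget issues that fell out of the 24-hour window.
        while times and now - times[0] >= WINDOW:
            times.popleft()
        times.append(now)
        # Rule 1: second issue in the window from an (almost) empty account.
        if len(times) >= 2 and contributions < MIN_CONTRIBUTIONS:
            return "ban"
        # Rule 2: fifth issue in the window from any account.
        if len(times) >= COOLDOWN_ISSUE_COUNT:
            return "cooldown"
        return "allow"
```

For example, a brand-new account is allowed one issue and flagged for a ban on the second, while an account with contribution history is only put on cooldown at the fifth issue within 24 hours. A real bot would need persistent storage instead of the in-memory dictionary used here.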

                            Could this method work? It doesn’t issue bans based on character recognition. Actually, I know that a lot of forums/websites are facing Chinese spam, and I have had thoughts of building an anti-spam mechanism for those.
