Working on a Chinese Spam Guard Bot

michaellee8

I would bet $100 that those Chinese spammers/bots will come back when the 24-hour cooldown period expires, so I am developing a bot to fight them here: https://github.com/michaellee8/chinese-spam-guard

Maybe you guys can give me some input on this one. Currently it runs a stateless detection engine based on character detection: it closes issues that consist of Chinese characters immediately as they are opened.

dinkumoil

@michaellee8

This is something @donho should read as he is the owner of the GitHub repo.

michaellee8

I have sent an email to him as well, but maybe it got buried in the spam. I will send another one after I have prepared a usable version.

michaellee8

I got a spam guard bot working here https://github.com/michaellee8/chinese-spam-guard, please have a look at it and see if you guys find it useful.

Palash Bansal

And the incident got listed in the controversies section of the NPP Wikipedia page too.
Your spam detection bot seems like a good idea, but NPP supports the Chinese language too. So if there is any issue with a translation, an issue containing Chinese characters will need to be raised, and it will unnecessarily get detected as spam. Better to set a minimum of ~10 Chinese characters before an issue is detected as spam.
Also, I saw some spam in English only. It would be great if issues without the “Debug info.” string got detected as spam.
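
To make the two suggestions above concrete, here is a minimal sketch in Python. It is not taken from the bot's repository; the function name `spam_reasons`, the threshold of 10, and the choice to count only the basic CJK Unified Ideographs block are all assumptions for illustration.

```python
import re
from typing import List

# Assumption: only the basic CJK Unified Ideographs block (U+4E00..U+9FFF) is counted.
CJK_RE = re.compile(r"[\u4e00-\u9fff]")


def spam_reasons(title: str, body: str,
                 min_chinese: int = 10,
                 marker: str = "Debug info") -> List[str]:
    """Return the list of rules an issue trips; an empty list means it looks fine.

    Hypothetical helper: the names and thresholds are illustrative only.
    """
    reasons = []

    # Rule 1: a handful of Chinese characters (e.g. a translation report) is fine,
    # only flag the issue once the count reaches the minimum.
    chinese_count = len(CJK_RE.findall(title)) + len(CJK_RE.findall(body))
    if chinese_count >= min_chinese:
        reasons.append("too many Chinese characters")

    # Rule 2: the issue body is expected to contain the debug info marker.
    if marker.lower() not in body.lower():
        reasons.append("missing debug info")

    return reasons
```

A translation report such as `spam_reasons("Wrong translation", "Menu shows 文件 twice.\nDebug info: Notepad++ v7.8.1")` trips neither rule, because it has only two Chinese characters and contains the marker.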

michaellee8

@Palash-Bansal Thank you. Currently the bot closes an issue automatically if either the title or the body consists of more than 40% Chinese characters. I guess validating the issue structure could be a good idea as well. I should implement it later.
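
For readers who want to see what a 40% rule like this could look like, here is a rough sketch in Python. It is not the bot's actual code; the helper names, the choice to ignore whitespace, and the exact Unicode ranges are assumptions. Note that these ranges also cover ideographs used in Japanese text, so a check like this cannot tell the two apart.

```python
def is_cjk(ch: str) -> bool:
    """Return True if ch lies in one of a few common CJK ideograph ranges."""
    code = ord(ch)
    return (
        0x4E00 <= code <= 0x9FFF        # CJK Unified Ideographs
        or 0x3400 <= code <= 0x4DBF     # CJK Unified Ideographs Extension A
        or 0xF900 <= code <= 0xFAFF     # CJK Compatibility Ideographs
        or 0x20000 <= code <= 0x2A6DF   # CJK Unified Ideographs Extension B
    )


def chinese_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are CJK ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if is_cjk(c)) / len(chars)


def should_close(title: str, body: str, threshold: float = 0.4) -> bool:
    """Close if either the title or the body exceeds the threshold (assumed 40%)."""
    return chinese_ratio(title) > threshold or chinese_ratio(body) > threshold
```

Because the check only looks at the text of the issue, it stays stateless: no history about the user or earlier issues is needed.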

dinkumoil

@michaellee8

Personally I think a spam guard bot based on the presence of certain characters in the content of an issue’s title or its text is an inappropriate measure. As pointed out by @Palash-Bansal, it could produce too many false positives.

The problem is not the real human beings posting comments, regardless of whether those comments are rude, offensive, or threaten violence (if someone decides to criticize China he has to be prepared for that). The real problem is the GitHub API that allows spam bots to open new issues every second (I saw that when the spam campaign was at its maximum).

This problem can only be solved by GitHub, by improving their backend to prevent high-frequency commenting and issue creation. Spam comments produced by real humans have to be cleaned up manually. If someone doesn’t have the resources for that, he should think twice before starting to criticize China or other powerful parties.

That said, I guess that @donho might have allowed that escalation on purpose because he let it go for nearly 24 hours. When you google now for notepad++ china you will find a lot of websites publishing articles regarding that incident. In this way he could cause more people to be informed about the Uyghur Human Rights Project. Exactly that was the goal of the whole thing.

Alan Kilborn

@dinkumoil said in Working on a Chinese Spam Guard Bot:

I guess that @donho might have allowed that escalation on purpose because he let it go for nearly 24 hours. When you google now for notepad++ china you will find a lot of websites publishing articles regarding that incident. In this way he could cause more people to be informed about the Uyghur Human Rights Project. Exactly that was the goal of the whole thing.

Which probably means that we’ll have to endure more of this crap the next time @donho wants to leverage Notepad++ for political gain…

dinkumoil

@Alan-Kilborn said in Working on a Chinese Spam Guard Bot:

Which probably means that we’ll have to endure more of this crap the next time @donho wants to leverage Notepad++ for political gain…

Yes, that’s what I’m afraid of too. Turning Notepad++ into a political weapon affects the whole community:

Currently there is no progress in feature development/bug fixing, reporting bugs is impossible.
Moderators of this forum have to audit and unlock postings of unknown members.
I ask myself who did that “great” job of cleaning up the issue tracker.
It was nearly impossible to download v7.8.1 for two days.
When users manually check for availability of an update they only get a weird error message (It’s not a valid GUP xml). Thus I guess even after unlocking the download of v7.8.1 for the masses, people will not be able to get it because the autoupdate feature can not detect that there is an update available.

Alan Kilborn

I keep thinking that celebrities champion causes of their own liking, so why shouldn’t @donho exploit Notepad++ in such a manner, to draw attention to things he feels important. The difference here is that it is not @donho that is “famous”, but it’s his child. How many people even know that Don Ho is the author of something? I liken it to child-exploitation, and that’s wrong. The whole thing smells bad…and feels bad…and it would be nice if it were behind us. But it isn’t, because I’m sure more political causes/events are on their way…

michaellee8

@Alan-Kilborn I don’t really agree with that metaphor. I believe child-exploitation is unethical because a child is a living thing, a human being, while this project is not a living thing. It’s a free product with most of the effort done by Don, so I think he has the right to use it however he thinks good.

michaellee8

@dinkumoil Actually this is not just an individual issue, it is a general one. Every time pro-CCP nationalists find something they hate, or are told to hate something, they attack it rudely, both physically and on the internet. I wouldn’t say it is good if we just sit there and don’t express our opinions because we are attacked and afraid; rather, I would find ways to beat it.

I suggest the following mechanism for an anti-spam bot:

For any second issue from the same user within a given period of time (e.g. 24 hrs), check his/her activity log. If the activity log is empty or nearly empty (e.g. no contributions), immediately issue a permanent ban for that user on our repo and on all repos that use this bot, report the user to GitHub, and then close/lock/delete all issues this user has ever made on our repo.
If 5 issues have been opened by the same user within 24 hours, that user should receive a 24-hr cooldown; the repo owner will also be able to issue a permanent ban for him on his repo.
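
The two rules above could be sketched roughly as follows in Python. This is only an illustration of the decision logic under some assumptions: `IssueEvent`, `Action` and `decide` are hypothetical names, the history is assumed to already include the issue that was just opened, and the user's contribution count is passed in as a plain number because how to fetch it from GitHub is left open here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import List


class Action(Enum):
    ALLOW = "allow"
    COOLDOWN = "cooldown"   # e.g. a 24-hr block on opening issues
    BAN = "ban"             # permanent ban, report, clean up the user's issues


@dataclass
class IssueEvent:
    author: str
    opened_at: datetime


def decide(author: str, now: datetime, history: List[IssueEvent],
           contribution_count: int,
           window: timedelta = timedelta(hours=24),
           max_issues: int = 5,
           min_contributions: int = 1) -> Action:
    """Pick an action for the issue `author` has just opened.

    `history` must include the new issue. `min_contributions` is an assumed
    cut-off for what counts as an "empty/nearly empty" activity log.
    """
    recent = [e for e in history
              if e.author == author and timedelta(0) <= now - e.opened_at <= window]

    # Rule 1: second (or later) issue in the window from an account with no activity.
    if len(recent) >= 2 and contribution_count < min_contributions:
        return Action.BAN

    # Rule 2: too many issues in the window, regardless of activity.
    if len(recent) >= max_issues:
        return Action.COOLDOWN

    return Action.ALLOW
```

Keeping the decision in one pure function like this makes it easy to test without touching GitHub; the part that actually bans, reports or closes would sit around it.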

Could this method work? It doesn’t ban based on character recognition. Actually, I know a lot of forums/websites are facing Chinese spam, and I have had thoughts of building an anti-spam mechanism for those.