I was playing around with Lemmy statistics the other day, and I decided to look at the number of comments per post. Essentially a measure of engagement – the higher the number, the more engaging the post. Or in other words, how many people were pissed off enough to comment, or had something they felt like sharing. The average across every single Lemmy instance was 8.208262964 comments per post.
So I modelled that with a Poisson distribution – in stats terms, X ~ Po(8.20826) – then found the critical regions, taking anything that had a less than 5% chance of happening to be important. In other words, 5% is the significance level. The critical regions are the regions on either side of the distribution where the probability of ending up is less than 5%. Here the lower-tail critical region is fewer than 4 comments, and the upper-tail critical region is more than 13 comments; if your post lands in either region, that's a meaningful result. So I chose to interpret those results as meaning that if your post gets fewer than 4 comments it is "a bad post", and if it gets more than 13 comments it is "a good post". A good post here literally just means "got more comments than expected of a typical post", and vice versa for a bad post.
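For anyone who wants to check the arithmetic, here's a minimal sketch of that calculation using only the standard library (`scipy.stats.poisson` would give the same numbers), with 5% as the cut-off on each tail, as above:

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    return math.exp(-lam) * sum(lam**i / math.factorial(i) for i in range(k + 1))

lam, alpha = 8.208, 0.05  # sitewide average comments per post

# lower critical region: largest k with P(X <= k) < 5%
lo = 0
while poisson_cdf(lo + 1, lam) < alpha:
    lo += 1

# upper critical region: smallest k with P(X >= k) < 5%
hi = lo
while 1 - poisson_cdf(hi - 1, lam) >= alpha:
    hi += 1

print(lo, hi)  # 3 and 14: fewer than 4 or more than 13 comments is "meaningful"
```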
You will notice that this is quite rudimentary – for example, what about when the Americans are asleep? Most posts do worse then. That's not accounted for here, because it increases the complexity beyond what I can really handle in a post.
To give you an idea of a more sweeping internet trend, there's the 1% / 9% / 90% adage: 1% do the posting, 9% do the commenting, and 90% are lurkers. Assuming each person does an average of one thing a day, that suggests c/p should be about 9 for all sites regardless of size.
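A back-of-the-envelope sketch of that arithmetic (the community size is made up, and it cancels out anyway):

```python
# Hypothetical community where everyone does one thing per day,
# split 1% posters / 9% commenters / 90% lurkers.
users = 1000
posts = users * 1 // 100      # the 1% who post
comments = users * 9 // 100   # the 9% who comment

print(comments / posts)  # 9.0 comments per post, whatever `users` is
```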
Now what is more interesting is that comments per post varies by instance. lemmy.world, for example, has an engagement of 9.5 c/p, and lemmy.ml has 4.8 c/p. This means a "good post" on .ml is one that gets more than 9 comments, whilst a "good post" on .world has to get more than 15 comments. On hexbear.net, you need more than 20 comments to be a "good post". I got the numbers for instance-level comments and posts from here
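The per-instance thresholds fall out of the same one-tailed 5% cut-off, just with each instance's own c/p as the Poisson mean. A sketch, using the λ values quoted above:

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * sum(lam**i / math.factorial(i) for i in range(k + 1))

def good_post_threshold(lam, alpha=0.05):
    """Smallest t with P(X > t) < alpha: more than t comments is a 'good post'."""
    t = 0
    while 1 - poisson_cdf(t, lam) >= alpha:
        t += 1
    return t

for name, lam in [("lemmy.ml", 4.8), ("all instances", 8.208), ("lemmy.world", 9.5)]:
    print(f"{name}: more than {good_post_threshold(lam)} comments")
# lemmy.ml: more than 9 comments
# all instances: more than 13 comments
# lemmy.world: more than 15 comments
```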
This is a little bit silly, since a “good post”, by this metric, is really just a post that baits lots and lots of engagement, specifically in the form of comments – so if you are reading this you should comment, otherwise you are an awful person. No matter how meaningless the comment.
Anyway I thought that was cool.
EDIT: I’ve cleared up a lot of the wording and tried to make it clearer as to what I am actually doing.
Look, I survived statistics class. I will strive to defend some of my post.
Namely that much of the aim of it was to show that a metric like comment count doesn't tell you whether a post was actually good or bad – hence the bizarre engagement bait at the end, and also why all of the "good posts" were in quotes.
I’m under the impression that whilst you can do a hypothesis test by calculating the probability of the test statistic occurring, you can also do it by showing that the result lies in the critical regions. That can be useful if you want to know whether a result is meaningful from the number itself, rather than having to calculate probabilities. For a post of this nature, it makes no sense to find a p-value for a specific post, since I want comment counts that anyone, for any post, can compare against – a p-value for one observed comment count is meaningless to basically everyone on this platform.
Truthfully, I wasn’t doing a hypothesis test – and I don’t say I am in the post – although your original reply confused me into thinking I was. I was finding critical regions and interpreting them. That said, I’m also under the impression that you can do two-tailed tests, although I did make a mistake by not splitting the significance level in half for each tail. :( I should have been clearer that I wasn’t doing a hypothesis test, but rather calculating critical regions.
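For what it's worth, the corrected version – splitting the 5% as 2.5% per tail – only shifts the boundaries slightly. A sketch, reusing the same stdlib Poisson CDF as before:

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * sum(lam**i / math.factorial(i) for i in range(k + 1))

lam, alpha = 8.208, 0.05

# proper two-tailed split: alpha/2 = 2.5% in each tail
lo = 0
while poisson_cdf(lo + 1, lam) < alpha / 2:
    lo += 1
hi = lo
while 1 - poisson_cdf(hi - 1, lam) >= alpha / 2:
    hi += 1

print(lo, hi)  # 2 and 15: fewer than 3 or more than 14 comments
```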
It doesn’t seem like you’re saying I’m wrong, rather that my model sucks – which is true – and that my workings are weird – it’s a Lemmy post, not a science paper. That said, I didn’t quite expect this post to do so well, so I’ve edited the middle section to be clearer about what I was trying to do.
Well I appreciate the effort regardless. If you want any support in getting towards a more “proper” network analysis, I’ve dm’d you a link you can use to get started. If nothing else it might allow you to expand your scope or take your investigations into different directions. The script gets more into sentiment analysis for individual users, but since Lemmy lacks a basic API, the components could be retooled for anything.
Also, you might consider that all a scientific paper is, at the end of the day, is a series of things like what you’ve started here, with perhaps a little more narrative glue and the repeated cycle of scientific critique. All scientific investigations start with exactly the kind of work you are presenting here. Then your PI comes in and says “no, you’ve done this wrong and that wrong, and can’t say this or that – but this bit or that bit is interesting”, and you revise and repeat.