Small-Scale Question Sunday for February 18, 2024

Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?

This is your opportunity to ask questions. No question too simple or too silly.

Culture war topics are accepted, and proposals for a better intro post are appreciated.

So reddit is selling data for AI training now, according to Bloomberg. [Archive]. Well, I for one am not surprised! A huge bunch of mediocre data, nicely cleaned up by unpaid mods, with community sentiment encoded in the karma. When I left reddit I removed all my posts and comments. I lost my trust way before the API debacle, and seeing what is happening now just validates me. Anyone else?

While it's not a common sentiment, I agree with you. While it's likely ineffective at really removing your data from the database, mass overwriting and deleting your content is at the very least a middle finger to reddit that makes it worse to use, reducing its value to users and therefore to reddit itself. Trying to poison the dataset is even better.

The best defense, of course, is not to use the site at all. I heartily recommend that to everyone.

When I left reddit I removed all my posts and comments

I think some people have taken this even further, intentionally poisoning the AI training data with false facts or doing SEO-type shenanigans to get it to promote their products.

I don't understand the rationale behind this line of thought. If you are concerned (paranoid) about what data is used to train LLMs, you should also know that the PushShift database exists, and that practically every production database out in the wild has multiple replicas of itself at various timestamps.

Why do you not want your comments used to train LLMs anyways?

Why do you not want your comments used to train LLMs anyways?

Well, my concern isn't about LLMs in themselves; it is the slightly more abstract possible abuse of this data for behavior modification of crowds. I'm not an AGI doomer, but I already see the outlines in how various apps like YouTube, TikTok and Instagram incite compulsive use, and it might be possible to use data like reddit's to create similar compulsive loops for text as we have for video. I don't know if it is possible, but it might be.

But my comment is more that I made a decision a couple of years ago, and this just proves that they don't give a shit about the people who use their service, so I'm patting myself on the back.

Trust in what?

I don’t think AI training is any worse a purpose than advertising, which I assume is where they sent it beforehand.

Advertisers do not usually receive data wholesale. The ad targets are determined by the data "owner", so that the owner can take a portion of the ad sale. The whole Cambridge Analytica scandal was just that they mined Facebook data for personality profiles and bragged about behavior modification based on that data, despite it going against an actual contract with Facebook. Now, in hindsight, the media narrative was overblown and there were a bunch of claims that didn't turn out to be true. But it is true that a company that wasn't supposed to keep the data and use it kept the data and used it.

Trust in what?

That they are sensible people running the place. Look, I was one of the first 10,000 users of Reddit. I was naive in thinking that it was run by people who wanted people on the internet to build communities. My leaving predated everything with the API blackout and this whole LLM mess. I saw admins crack down on small communities that did nothing wrong, and power mods bullying regular users across multiple subreddits without the admins lifting a finger. The LLM sale is just aligned with what I saw. They don't give a shit about the users of the site. I got wiser...

I have a hard time imagining why the early adopters of Reddit thought it could ever be a good idea to consolidate the internet forum ecosystem into a single massive website. What was the perceived benefit here? These forums and communities already existed; Reddit didn't even create them.

  • The primary software for forums was phpBB, and it was/is awful.
  • Reddit started as [del.icio.us](https://en.wikipedia.org/wiki/Delicious_(website)) with upvotes and comments
  • Early on it broke every news story. It was an incredibly addictive source of info
  • The early users found out about the site primarily through its announcement on Paul Graham's essay section, so the early users were bright techies who liked to read
  • Subreddits were added later; they grew out of its natural development

Consolidating the ecosystem wasn't a goal early on. It was just a source for good links that grew steadily.

If you weren't there it's hard for you to understand how slow and awful phpBB was. Old reddit's use of JS to update the DOM was the top of the tech at the time.

Digg was founded at about the same time as Reddit and had a more Slashdot-inspired interface. Its design came off as more professional and it was seen as the larger website, although Spez said the daily hits were about the same or larger on Reddit.

Digg had a terrible v4 redesign in 2010 that caused much anger. Users fled to Reddit and Digg never recovered.

The appearance of smartphones also played a role. Reddit added a JSON API early, so there were apps on every platform. Even without them, Reddit's minimalist design made it easier to build in phone support. I never actually tried, but those phpBB forums look like they'd be very hard to use on mobile. The UI doesn't look like it'd be usable on small screens without major work.

The primary software for forums was phpBB, and it was/is awful.

If you weren't there it's hard for you to understand how slow and awful phpBB was. Old reddit's use of JS to update the DOM was the top of the tech at the time.

I wasn't there, depending on how old the old Reddit you're referring to is (I think I started browsing it around 2014 or so), but I don't think its speed compared that favorably to phpBB, and even if it did originally, it definitely doesn't any more. Go to any still-functioning phpBB forum (there are still a few out there); they're way faster than Reddit.

Reddit was founded in 2005, and the shared-hosting PHP servers at the time were quite a bit slower than they are now. Also, phpBB has probably added some JavaScript to avoid the full page refreshes on each click.

I'd go with the "shitty shared hosting servers" explanation. As far as I can tell, phpBB is still doing a full page refresh; it's just that it's much faster than Reddit (or any major SocMed). This shouldn't come as a surprise, because it's Reddit, not phpBB, that's awful. I once tried to start up a local instance of Reddit (it used to be open source, and I think the old code is still floating around somewhere), and my computer just gave up. By contrast, the rdrama code The Motte uses runs locally with no issues, and phpBB could probably run on a coal-fueled kitchen stove. I'm also quite sure that all the Big Tech frontends are deliberately enshittified, because Nitter was running way faster than Twitter, Teddit was way faster than Reddit (before they started closing outside access), Piped is way faster than YouTube, etc. I don't think I've ever seen an alternative frontend that didn't BTFO an official Big Tech one in terms of performance.

Convenience. You can use hundreds of forums on one site without having to go through the tedious process of signing up, remembering login info, and clicking on verification emails for each separate forum. And you get a consolidated feed of all the forums, and notifications of replies to your posts.

It worked great in the earlier days before the moderators went absolutely mad.

When I left reddit I removed all my posts and comments.

Your data likely still made it into the sale.