Programming IT

Stack Overflow To Charge LLM Developers For Access To Its Coding Content (theregister.com) 32

Stack Overflow has launched an API that will require all AI models trained on its coding question-and-answer content to attribute sources linking back to its posts. And it will cost money to use the site's content. From a report: "All products based on models that consume public Stack Overflow data are required to provide attribution back to the highest relevance posts that influenced the summary given by the model," it confirmed in a statement. The Overflow API is designed to act as a knowledge database to help developers build more accurate and helpful code-generation models. Google announced it was using the service to access relevant information from Stack Overflow via the API and integrate the data with its latest Gemini models, and for its cloud storage console.
This discussion has been archived. No new comments can be posted.

  • by bradley13 ( 1118935 ) on Friday March 01, 2024 @04:29PM (#64283192) Homepage

    StackOverflow was once a great resource. No longer. It seems like every time I get a link to StackOverflow today, I get an answer from 10 years ago that is no longer correct.

    Maybe they're still good for basic homework questions, like "how do I write an Insertion Sort", but for anything serious? No, not any more.

    Anyway, if the LLM trainers have half a brain, they have already scraped and saved everything useful from these sites.

    • by ls671 ( 1122017 )

      Kind of what I was thinking too. At least, I was thinking: who would want to train any AI with Stack Overflow content? Fine way to have your AI hallucinate, if you ask me.

      • Sometimes I get answers for old systems that are still in use. Tags should be expanded. Linux is no longer just Linux, for example. What version? What flavor? What module? But for the relevant portion, no. Why would I as a training module do this when I can just crawl the front end? If something is meant to be paid for, put it behind a paywall. This is the same bullshit that the newspapers tried to pull. If you put information up on a public-facing website, it is effectively public.
        • If anyone has a problem with that, tell me how you intend to get Iranian LLM developers to pay for it? Or Russian ones? They won't. They will just get for free what you have to pay for.
    • by machineghost ( 622031 ) on Friday March 01, 2024 @04:57PM (#64283292)

      I think context matters here. As a web developer, I still use Stack Overflow on an almost daily basis, and it's just as relevant as ever (if not more so) ...

      ... but if that's not true for you, maybe it has to do with the topics you're searching for?

      • Perhaps it does depend on the topics. If you are searching for answers concerning relatively new web frameworks, then the content you find will be current.

        If you search for topics on established technologies or frameworks, there are years' worth of answers. Answers from 10 or 15 years ago may no longer apply to current releases. The older answers should be kept around - lots of organizations are using older stuff - but maybe they shouldn't pop up as the recommended answers for everyone. Of course: how can you

        • In that example you can downvote that answer, and leave a comment explaining why it no longer works ... but admittedly it requires a few people downvoting before SO realizes it should show other answers higher.

    • by godrik ( 1287354 )

      I have good news and bad news for you. Stack Overflow is about as good/bad as it was 10 years ago. I think you got better, so you realize now that the answers you get aren't so good. They already were not particularly good 10 years ago.

    • by tlhIngan ( 30335 )

      Maybe they're still good for basic homework questions, like "how do I write an Insertion Sort", but for anything serious? No, not any more.

      Nah, you can use Wikipedia now for those things. https://en.wikipedia.org/wiki/... [wikipedia.org]

      You can get quite a comprehensive beginning CS education just off Wikipedia these days.

    • by coop247 ( 974899 )
      Yeah I mean too little too late, they've already scraped it and "new" content is dwindling.

      I've also had several bad experiences lately where comments are flagged/deleted by mods telling me to "ask a new question" or "offer a bounty" or some nonsense. All I wanted to know was if the 5 month old poster with zero responses ever figured out the issue. I've been using the internet a long time, that works.
  • by MIPSPro ( 10156657 ) on Friday March 01, 2024 @04:36PM (#64283208)
    "Why are you trying to do that? You shouldn't be doing that."

    "That tool isn't my favorite. Try this other one that isn't anything like what you want. I work for those guys."

    "That's insecure. Don't do that. We use corporate garbageware for that, now."

    They rarely answer your question and instead get all hung up on security or style issues. Mother fuckers, they didn't ask for an engineering and security review, they asked you a specific question. If you cannot answer it please SHUT THE FUCK UP and stay out of the way for someone who can. Don't try to show off your CISSP and critique the question (assholes).
    • by Anonymous Coward
      Ooh someone's salty. Found the entitled idiot code monkey who doesn't understand the XY problem! "I don't need a lecture about security, I demand you just tell me how to put MySQL in the DMZ so I can store passwords in plain text!"
  • What Would Be Nice (Score:4, Interesting)

    by Thelasko ( 1196535 ) on Friday March 01, 2024 @04:44PM (#64283226) Journal
    I'd really like to see a LLM that can code in a language other than Python. CoPilot will often try to pass off Python code as other languages.
  • Consider that everyone and their dog now wants to charge for training access. Now you've guaranteed there is no good open-source LLM, only good commercial LLMs. You're going to regret that when they reach AGI level and the AGI owners tell you 'you already got your payment'.
  • As a fairly highly scored contributor to some specific Stack Exchanges, it seems disingenuous to charge LLM makers for content that Stack Overflow got for free from volunteers. Maybe if they were sharing the profits from that content, this would make sense to me... Otherwise, I think the LLMs ought to get that for free just as Stack Overflow did.
    • by galabar ( 518411 )
      How many times have you found the answer to a question on Stack Overflow?
      • dunno - let me just ask this latest LLM AI.
      • Stack Overflow itself? Not that often. But some of the other topic-specific Stack Exchanges like the Database SE, Gardening SE, Ubuntu SE, DevOps SE, Unix SE, and others I find answers on quite often. Or at least breadcrumbs to the answer. Even an answer related to Java 8 can be useful in that it can point me in the direction of narrowing my research to a specific term and to figure out how that was changed/updated/deprecated in Java 21 (as an example) in my day-to-day line of work. Though Reddit has been
    • I think it's fair for them to charge LLM developers - something has to pay the bills, and it won't be StackOverflow's users since they are used to getting free content.

      I prefer this model to getting spammed to death by pop up ads, anyway.

      • You mean the free content that those same users created for free? There's a reason that Experts Exchange is relatively unknown and unused compared to Stack Exchange. EE is the model for "you owe us at least *something* for hosting, organizing the content, and keeping the lights on". Jeff Atwood (Creator of Stack Exchange) has even [wikipedia.org]

        cited Experts-Exchange's poor reputation and paywall as a motivation for creating Stack Overflow

        You can't really use those freebies to build a business, then complain when others use the same freebies to build their business. All you have really made is an argument against

        • by Jeremi ( 14640 )

          You mean the free content that those same users created for free?

          Yes, that content. Those same users got value in exchange for their content, in the form of access to a well-designed site full of content that was/is useful to them.

          You can't really use those freebies to build a business, then complain when others use the same freebies to build their business.

          Who is complaining? I'm not complaining, and StackExchange isn't complaining. The only people complaining are the people who think StackExchange shouldn't be allowed to charge LLM companies for access to StackExchange content.

  • If everyone gets their info from LLMs instead of direct sites like stackoverflow then stack overflow can't pay their server bills and consequently no new training data. AIs can be amazingly useful but they still need to get their content from real people's posts. It's similar to when Google and co. ripped off news snippets causing the news sites to lose valuable advertising revenue.
    • by CAIMLAS ( 41445 )

      Stackoverflow is owned by the same parent company which now owns slashdot. Been that way for ages.

      This is slightly different, because llms don't need human language to learn how to code correctly. They can use actual code references and infer.

  • Yes, that's exactly what I need - llms which consistently tell me how to do things the wrong way.

    Apparently stackoverflow is now run by people who don't know what they're doing or what their site even offers.

  • by balaam's ass ( 678743 ) on Friday March 01, 2024 @05:49PM (#64283458) Journal

    to answer questions on StackOverflow.
    ...Right?

    Wait, you mean they're passing this on to their contributors?
    I'm so shocked.

  • StackOverflow had committed itself to giving back CC-licensed DB dumps. They stopped providing them some time ago, justifying it with LLM training when asked, and re-enabled them after they got a bit of a shitstorm for no longer giving back to the community. The site content itself is CC-licensed anyway.
    This means that if they continue to provide the dumps, people could just train on the dumps. If they do not continue, people can legally (given that attribution is provided) crawl the site itself.
    And finally, one
