Programming

Stack Overflow is Feeding Programmers' Answers To AI, Whether They Like It or Not

Stack Overflow's new deal giving OpenAI access to its API as a source of data has users who've posted their questions and answers about coding problems in conversations with other humans rankled. From a report: Users say that when they attempt to alter their posts in protest, the site is retaliating by reversing the alterations and suspending the users who carried them out.

A programmer named Ben posted a screenshot yesterday of the change history for a post seeking programming advice, which they'd updated to say that they had removed the question to protest the OpenAI deal. "The move steals the labour of everyone who contributed to Stack Overflow with no way to opt-out," read the updated post. The text was reverted less than an hour later. A moderator message Ben also included says that Stack Overflow posts become "part of the collective efforts" of other contributors once made and that they should only be removed "under extraordinary circumstances." The moderation team then said it was suspending his account for a week while it reached out "to avoid any further misunderstandings."

  • by Mr. Dollar Ton ( 5495648 ) on Wednesday May 08, 2024 @12:22PM (#64457259)

    It said: your control over the content: none. What can we do with it: anything we like.

    It has never been different with shit that is posted on the internets.

    • My Stack Overflow usage has dropped to finding something on there two or three times a year, down from daily a few years ago.

      I'd hazard a guess that there might be a large drop-off in Stack Overflow questions on more recent technology when you exclude the "how to do hello world in Python with Visual Studio Code" questions.

      Or it might be that, since we stopped picking up just about every new technology, open source library, pattern, build tool, and JavaScript framework of the week, we have less need for Stack Overflow.

  • Unless Stack Overflow is getting paid by OpenAI (which I doubt) or is able to somehow write this off (I wouldn't put anything past a motivated accountant), how exactly is Stack Overflow profiting from this? While nowhere near as bad as some sites, they still seem to be ad-driven, so again, where is that revenue if access is purely by API?
    • by ISayWeOnlyToBePolite ( 721679 ) on Wednesday May 08, 2024 @12:40PM (#64457337)

      From the previous slashdot story linked in the summary:

      OpenAI will have access to Stack Overflow's API and will receive feedback from the developer community to improve the performance of AI models. OpenAI, in turn, will give Stack Overflow attribution -- aka link to its contents -- in ChatGPT. Users of the chatbot will see more information from Stack Overflow's knowledge archive if they ask ChatGPT coding or technical questions.

      • Except that Stack Overflow's knowledge archive is full of incorrect, misleading, or poor-quality answers. Just what you expect from AI!

        • Except that Stack Overflow's knowledge archive is full of incorrect, misleading, or poor-quality answers. Just what you expect from AI!

          Users who loved this comment also loved the fact this same AI will be used to generate more and more of the content that then gets swept up into AI models.

        • by vbdasc ( 146051 )

          Yet, the Stack Overflow answers (and even questions) are rated. This improves their usefulness to an AI quite a bit, IMHO.

          • And yes, they are rated but in a social media way, not by someone studiously deciding that the answers are correct. I have seen obviously incorrect answers being rated highly, and the later correct answer not rated at all.

    • they still seem to be Ad driven, so again, where is that revenue if access is purely by API?

      From the (blurb to the) article: 'Stack Overflow's new deal giving OpenAI access to its API as a source of data'. Stack Overflow doesn't give away the info they have for free. Access may be purely by API, but that says nothing about how SO charges for API access when they deal with a giant company like OpenAI.

  • by SuperKendall ( 25149 ) on Wednesday May 08, 2024 @12:31PM (#64457297)

    The whole point of answering on StackOverflow is to help people. For free.

    Spreading the knowledge you posted out to an even wider group of people is exactly what everyone should want to happen with information they posted.

    I imagine they could make the whole issue go away by giving anyone who complains 10 reputation points. Or ignite a bonfire by giving anyone who complains a "Whiner" badge. :-)

    • The whole point of answering on StackOverflow is to help people. For free.

      In this case I agree. At least the last time I checked (and I just checked again; archive.org has the whole thing [archive.org]), Stack Exchange's questions and answers are available to all under CC BY-SA 4.0. I put up answers there on topics where I want other people to do things right, even if they are doing it in someone else's corporate environment. Good security answers help us all if people follow them. If this helps people find my answers, then good.

      Just don't put code there. Put it on GitLab and link to it there.

      • by flink ( 18449 )

        cc-by-sa means the author of the post should get credit, no? OpenAI should attribute the author of the post directly in its results, not just link back to StackOverflow.

        • It doesn't mean that SA only has cc-by-sa rights. Just that those are the default rights if you don't sign a deal with them. You fully sign over whatever you type in there.

          The terms are tricky because they start by saying it's all Creative Commons until they get to that crucial "and":

          You agree that any and all content, including without limitation any and all text, graphics, logos, tools, photographs, images, illustrations, software or source code, audio and video, animations, and product feedback (collec

      • "Just don't put code there. Put it on GitLab and link to it there." Tried that. A mod went in, copied my code, and pasted it into SO. When I edited my post back to how it was, I got a 7-day ban. This escalated into me being permanently banned from SO, and they kept my code under an "anonymous" user.
        • by kmoser ( 1469707 )
          They altered the deal. Pray they don't alter it further.
        • That's actually the point and what you really want. You should not have linked the two accounts (GitLab and Stack Overflow) and let them know they were related, since that allows them to retaliate against you if you act outside. On Stack Overflow you might edit the code or text to say where it's copied from, and you might notify them of your copyright claim.

          No matter what happens you now have a legitimate and verifiable copyright claim against stack overflow for that code and, because you didn'

    • You clearly don't know who Ben (of BenUI.ca) is--which is fair, why should you? But his whole deal has been making resources and giving away answers for free. That's not actually his problem here.

    • by AmiMoJo ( 196126 )

      Stack Overflow is an MMORPG, and people get upset when something threatens their hard grinded stats.

      Not to worry, though. Most answers on SO are so poor that this AI has no chance. Often the highest-rated ones are the worst.

      • Stack Overflow is an MMORPG, and people get upset when something threatens their hard grinded stats.

        Not to worry, though. Most answers on SO are so poor that this AI has no chance. Often the highest-rated ones are the worst.

        This. I've found SO not to be very useful, so why an AI would want to train itself on mostly wrong answers to questions is beyond me.

      • Not to worry, though. Most answers on SO are so poor that this AI has no chance.

        Although this is true, it's still the best of all possible options as far as I can tell. If I'm searching for a coding question I will turn to any StackOverflow post over any Medium article...

        Although I may have to hunt through answers a bit to find the truly best solution over the accepted one.

        What you say is why I still rarely use AI for coding, if at all; I never find what it tries to give me very useful.

    • The whole point of answering on StackOverflow is to help people. For free.

      This. Although at least for questions I have had, I'm not certain the answers should be part of AI.

    • by allo ( 1728082 )

      > Spreading the knowledge you posted out to an even wider group of people is exactly what everyone should want to happen with information they posted.
      So training open source models yay, selling to "OpenAI" nay.

    • by vbdasc ( 146051 )

      The whole point of answering on StackOverflow is to help people. For free.

      Yep, this exactly is the point. Helping people for free. Not helping megacorporations for free which will later help people for a fee and eventually eliminate your job, using the free help you gave them.

      Spreading the knowledge you posted out to an even wider group of people is exactly what everyone should want to happen with information they posted.

      Yes, but most "helpful" people become reluctant to help when some middle man starts profiting from their efforts, to distribute the help to "an even wider group of people". For a fee, of course, as said above.

      It's exactly the same with free software. The GPL was created, among other things, to ensure that no

      • Yep, this exactly is the point. Helping people for free. Not helping megacorporations for free which will later help people for a fee and eventually eliminate your job, using the free help you gave them.

        If someone working for a megacorp finds your help on stackoverflow and uses it to solve a work related problem, then you've essentially helped the megacorp for free anyway.

      • Yep, this exactly is the point. Helping people for free. Not helping megacorporations

        You misunderstand. When I say I do it to help people for free, I mean the ONLY point is to help some coder for free.

        If someone makes a profit in the middle, I DO NOT CARE. They can make 70 billion dollars on my amazing thoughts on the use of semicolons and I DO NOT CARE, as long as it helps someone else down the line.

        Far too much of the world is consumed with worry over who is making a profit on what. Screw that. I only care t

    • Spreading the knowledge you posted out to an even wider group of people is exactly what everyone should want to happen with information they posted.

      You apparently do not understand the source of the outrage. Some people like giving things away for free. A group of those people appear to feel taken advantage of when what they gave away for free is now packaged into a profit making venture that they see no revenue out of.

      Surely you can understand that?

  • by Frobnicator ( 565869 ) on Wednesday May 08, 2024 @12:35PM (#64457311)

    Stack Overflow uses the CC-BY-SA license. [creativecommons.org]. While it does allow commercial adaptation and transformation like use in AI, it also explicitly requires attribution to the licensor (meaning they must cite the specific contribution they used), and derived works MUST be distributed under ShareAlike terms, under the same license as the original. They took a bit of flak years ago for moving from CC-BY-SA 3.0 to 4.0, and they pinkie-promised that they wouldn't change the license again.

    Does this mean the AI system they're training will have everything shared under CC-BY-SA? Will every Stack Overflow post be specifically referenced?

    While the TOS do grant an irrevocable perpetual license under the CC-BY-SA license, they don't grant rights to relicense the existing materials under something else. The CC-BY-SA requires both attribution to the original poster and use under the same CC-BY-SA license.
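    Below is a minimal sketch of post-level attribution, for the case where an AI answer did cite the specific posts it drew on. It assumes the system already knows which posts those are, which is exactly the part later commenters doubt a model can do. The `Post` record and the helper names are hypothetical. The Title/Author/Source/License ("TASL") layout is a common Creative Commons attribution practice, not Stack Overflow's or OpenAI's actual format.

```python
from dataclasses import dataclass

# Real CC BY-SA 4.0 license URL; everything else here is a hypothetical example.
CC_BY_SA_4_URL = "https://creativecommons.org/licenses/by-sa/4.0/"


@dataclass(frozen=True)
class Post:
    """Hypothetical record describing one reused Stack Overflow post."""
    title: str
    author: str
    source_url: str
    modified: bool = False  # CC BY-SA asks you to indicate if changes were made


def attribution(post: Post) -> str:
    """Build a TASL-style notice: Title, Author, Source, License."""
    changes = " (modified)" if post.modified else ""
    return (
        f'"{post.title}" by {post.author}, {post.source_url}{changes}, '
        f"licensed under CC BY-SA 4.0 ({CC_BY_SA_4_URL})"
    )


def attribute_answer(answer_text: str, sources: list[Post]) -> str:
    """Append one attribution line per source post the answer drew on."""
    if not sources:
        raise ValueError("CC BY-SA reuse needs at least one attributed source")
    notices = "\n".join(f"- {attribution(p)}" for p in sources)
    return f"{answer_text}\n\nSources:\n{notices}"


if __name__ == "__main__":
    # Placeholder post; the URL and author are made up for illustration.
    post = Post("How do I reverse a list?", "example_user",
                "https://example.com/a/123", modified=True)
    print(attribute_answer("Use reversed() or slicing.", [post]))
```

    Each notice carries the license URL, which is how the CC BY-SA attribution terms are usually met. The notice alone does not put the surrounding answer under ShareAlike terms.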

    • by MobyDisk ( 75490 ) on Wednesday May 08, 2024 @01:02PM (#64457405)

      This is a great big hole in AIs today. I'm totally fine with someone sucking in all the knowledge from the entire universe to create Commander Data. That sounds freaking awesome! But since the AI's neural network degrades in a similar way to our brains, it can't know the sources. And really, if you synthesize data from multiple sources, it is kinda hard to cite them anyway. But this is definitely an issue!

      What if the AIs we have today are actually the best AIs we will ever make, because they were done without regard to copyright laws. Hereafter, it might be that there is no financially feasible way to get enough data to train the next generations.

    • Learning something does not make a work derived. AI doesn't just take and modify slightly it meshes together many sources to come up with an answer to the question. The models are getting so extensive that it's getting impossible to attribute the source for an answer. It's not a search engine.

      • by isomer1 ( 749303 )
        AI isn't "learning" anything - it just has obfuscated storage and retrieval. "Meshing" together 10, 100, or 1000 copyrighted/licensed works is still infringement - on every single one of the works - even when the works only form a portion of the language model.
        • by segin ( 883667 )
          This isn't any different from how the human brain works.
          • by flink ( 18449 )

            Considering we don't know how human brains work, that is a bold statement.

    • by omnichad ( 1198475 ) on Wednesday May 08, 2024 @04:00PM (#64457805)

      Read closer. When I visit their TOS, I see a very important AND:

      You agree that any and all content, including without limitation any and all text, graphics, logos, tools, photographs, images, illustrations, software or source code, audio and video, animations, and product feedback (collectively, “Content”) that you provide to the public Network (collectively, “Subscriber Content”), is perpetually and irrevocably licensed to Stack Overflow on a worldwide, royalty-free, non-exclusive basis pursuant to Creative Commons licensing terms (CC BY-SA 4.0), and you grant Stack Overflow the perpetual and irrevocable right and license to access, use, process, copy, distribute, export, display and to commercially exploit such Subscriber Content...

      Part 1 is a blanket CC BY-SA license to all site visitors. Part 2 - Stack Overflow owns it all, even outside of Creative Commons.

    • You agree that any and all content, including without limitation any and all text, graphics, logos, tools, photographs, images, illustrations, software or source code, audio and video, animations, and product feedback (collectively, “Content”) that you provide to the public Network (collectively, “Subscriber Content”), is perpetually and irrevocably licensed to Stack Overflow on a worldwide, royalty-free, non-exclusive basis pursuant to Creative Commons licensing terms (CC BY-SA 4.0), and you grant Stack Overflow the perpetual and irrevocable right and license to access, use, process, copy, distribute, export, display and to commercially exploit such Subscriber Content.

    • by Xylantiel ( 177496 ) on Wednesday May 08, 2024 @04:34PM (#64457893)

      sarcasm: But AI is a copyright washing machine!

      You "train" it on the copyrighted work (we actually know that this is really just optimizing the software to produce snippets from the work based on prompts structured in a particular way), then it will just produce any portion of the work requested but the people who run the AI can charge for access to it. And by pretending they don't know how the "AI" works, they say it is a magic box that can't possibly be subject to copyright. So you can just pay one fee to the AI company and have access to all works that the AI "learned". Copyrights all cleaned away!

      This makes no sense, of course. Even if a human memorizes a copyrighted work and can recite any random portion of it on demand, the work is still a copyrighted work and the portions cannot be passed off as the product of the human who memorized them. Another important distinction is that the human is not a "work" in themselves, whereas an LLM optimized using a particular set of works is itself a work and therefore subject to legal restrictions. So one can't sell access to the LLM (without paying the copyright holders), since it is returning copyrighted text, and one can't sell the LLM itself (again without paying the copyright holders), because it contains the works and is therefore a derivative work. One could pay the copyright holders, what a novel idea.

      I think the next level of argument is that the LLM is just like an index, which is not a derivative work, but the conditions under which that applies are pretty restricted and the onus would be on the entity running the LLM to demonstrate that those conditions apply for it to be exempt from being considered a collection instead of an index. I think the simple truth is, as you say, works under CC-BY-SA are simply not compatible with an LLM that doesn't have attribution built in.

    • by vbdasc ( 146051 )

      Does this mean the AI system they're training will have everything shared under CC-BY-SA? Will every Stack Overflow post be specifically referenced?

      In their quest for more training data AIs are plundering and digesting copyrighted works at an unprecedented scale. Trillions of dollars are at stake in the game they're playing. Do you think some measly CC-BY-SA license will stop them?

  • by rknop ( 240417 ) on Wednesday May 08, 2024 @12:35PM (#64457313)

    If AI coding is trained on stack overflow, that's a reason to be suspicious of any code that AI produces....

    • by Seven Spirals ( 4924941 ) on Wednesday May 08, 2024 @01:09PM (#64457425)
      Yeah, but you can bet the AI will know how to answer any coding question. It's easy, it goes like this:

      CODER: Codebot, how do I do XYZ?

      Codebot: Why would you want to do that? You know you shouldn't do that, right? That's insecure and old. Don't do that. Switch to something completely unrelated to the question you asked or the problem you have, because I have a personal preference against the thing you asked about and I like the thing I mentioned better.

      :-)
      • I used to get answers about C++ when asking C questions, which were then upvoted despite being obviously wrong.
        In the future, I can easily imagine all answers from AI being "Please just use Rust instead. This answer brought to you by Google and Carl's Junior."

        • Hahaha, I forgot about that. I used to get "Please just use Haskell, you aren't enlightened enough to code." for C-question answers, too. I guess those fell out of fashion.
    • The answers the AI gives will be really grumpy and supercilious too.

  • by Dwedit ( 232252 ) on Wednesday May 08, 2024 @12:35PM (#64457317)

    If it's publicly visible on the internet, it's already being picked up by search engine crawlers and AI bots.

    • by Viol8 ( 599362 )

      Exactly. You have to wonder what kind of mouth breathers thought it wasn't the case that the AIs were slurping it. Why would Stack Overflow be a special case amongst web sites?

      • by MobyDisk ( 75490 ) on Wednesday May 08, 2024 @01:04PM (#64457411)

        Everybody knew this. The problem isn't the slurping of data, but the slurping of data *without attribution* and *without regard to copyright*. When I do a Google search, I get the citation. But not when I use the AI. Of course, if we make the AI companies pay proper royalties, cite attribution, and pay licenses, the companies will just move to China.

  • Wah ...

    People reaching out to help others for free are upset because their freely given advice is being used to provide even more freely given advice that might one day be charged for.

    More faux/hypocritical indignation from the perpetually offended in my opinion.

    If you don't like it, create your own web site and run it the way you want it. When you are using stuff created by other people for free, they have all the control over how the site is used.
    • If you don't like it, create your own web site

      With Blackjack! And Hookers!

      In fact, forget the web site!

    • ....to provide even more freely given advice that might one day be charged for.

      If it is being charged for, then it is not freely given in any sense of the word 'free', is it? I suspect that is their main point of contention. If the output of the AIs being trained were guaranteed to be made freely available, then I doubt anyone would be complaining.

      When I provide free content I have no issue with it being disseminated by others provided that it is always done so freely. I do have issue with others trying to make money from it by charging others for access which is why I always us

    • Subscriber content is CC-BY-SA and licensed to StackOverflow. It is specifically not revocable, so you can't actually delete your post and expect SO to honor that. They say multiple times that the license is perpetual and non-revocable. I don't know what happens if SO violates the terms of your license. I assume nothing.

      • by flink ( 18449 )

        If they violate the terms of the license, they have no right to distribute the content and you can sue them for infringement.

        • I'm not sure about that. The terms of service and agreement have repeated phrases and weird licensing terms on top of CC-BY-SA. I'm concerned that company attorneys could weasel out of the usual obligations in a typical copyright licensing agreement by arguing that the terms of service are the real agreement.

          For example:

          You agree that any and all content, including without limitation any and all text, graphics, logos, tools, photographs, images, illustrations, software or source code, audio and video, animations, and product feedback (collectively, “Content”) that you provide to the public Network (collectively, “Subscriber Content”), is perpetually and irrevocably licensed to Stack Overflow on a worldwide, royalty-free, non-exclusive basis pursuant to Creative Commons licensing terms

          Now when you or I post on StackOverflow, I think our intent is that this works like any other ShareAlike-with-attribution license, and that we'll receive attribution, as clearly s

          • by flink ( 18449 )

            Maybe we could just send a DMCA claim to their ISP? They'll at least have to deal with it and it will cost them. You're only out a stamp.

          • You forgot this bit

            and you grant Stack Overflow the perpetual and irrevocable right and license to access, use, process, copy, distribute, export, display and to commercially exploit such Subscriber Content

            Which kind of reads like "you also let SO do whatever they want, however they want, forever"

    • No, ChatGPT charges money for many things; you are ASSUMING that everything is being used to make models and information that will be freely available to all. That is not the case.

  • while (humans.exist()) { humans.selectRandom(10000).kill(); }

    • Is 10000 supposed to be the preset kill limit per killbot?

      while (humans.exist()) {
            Killbot killbot = new Killbot(randomName, 10000);  // preset kill limit per killbot
            killbot.kill(humans.selectRandom(killbot.killLimit));
      }

  • Slashdot's overlords are sad for being left in the cold. No one asked them if they want to contribute Slashdot's comment history to large language models.
  • Get off it. It's publicly available. If you try to restrict access .. evil entities and governments will be the only ones with AI. They'll learn off everything and nothing comparable will exist for the regular people to use. So end result, robot armies and factories .. controlled by people who aren't accountable to anyone. Until the AI takes them over.

  • I don't get a choice if someone I don't like reads it or not.

    If you don't want your stuff used by someone or some thing, don't post. It's as simple as that. You can't claim property over your public posting, especially to a third party service that has their own ToS.

  • ExpertSexChange.com: anyone remember them?
  • We need something like Nightshade but for code and text.

    Make AI models unprofitable by denying them reliable data for free.
    =Smidge=

    • Vigilante vandalism. Sure, you do you.

      Once you actually contribute to an open public forum for free, there really are no realistic take-backs.

      'No, stop! I told one how to use a crayon, now they are all using crayons! -Where's my check?'
      • You're totally cool with companies using open source code in their commercial products without honoring the license then, I assume?

        It's not about getting paid, it's about not getting exploited.
        =Smidge=

        • When the agreement contains wording like "and you grant Stack Overflow the perpetual and irrevocable right and license to access, use, process, copy, distribute, export, display and to commercially exploit such Subscriber Content", you have no rights. The CC licence applies to everyone but SO.

  • This is really underselling Ben. He set up a site (benui) that documents a lot of the Unreal Engine features that Epic didn't document well themselves. It's an invaluable resource to those of us who work on Unreal Engine games.

    He got hired by Epic recently, so I'm hoping he improves the quality of their released documentation.

    So he's not averse to giving away free advice--it's a big part of how a lot of us know him.

  • So I should expect the answer I see too many times: "I finally figured it out," with no solution posted.

  • Give the AI the ability to upvote and turn it loose. It'll lose its mind, hallucinate, and break down as the recessive shit becomes dominant.

  • My objection would be if they took information people gave for free to help other people, put it into an AI, made answering questions on Stack Overflow obsolete (because it's faster and easier to get the information from an AI prompt), but then paywalled it so you have to pay to access that information.

    That's the critical part for me to decide if it's a big deal or not. Now if they want to charge for extra features and other things, that's one thing, but getting basic information that it ingested for free off labor of

  • We can call it AI if we want, but whatever the case, every CEO on the planet is trying to kill your job. Maybe you'll retire or die or strike it rich before they do, but don't pretend they're not gunning for you.

    And guess what, they've been doing it for 40 years now [businessinsider.com]. If you didn't notice, well, that's called survivor's bias.
  • > The moderation team then said it was suspending his account for a week while it reached out "to avoid any further misunderstandings." ... what the hell?
    Or is this the "new understanding"?

  • It used to be a place where you could see a question similar to the problem you were trying to solve, read a conversation as some people narrowed down the scope of the problem, and then see them propose a few ways to solve it. Now it all seems to be pedantry, gatekeeping, and petty objections that contribute nothing.

    From reading the comments here on Slashdot, it sounds like Stackoverflow is archived at archive.org. I guess I'll just start looking there instead.

  • Given the quality, complexity and intricacies of questions posted on Stack overflow, trying to train any sane tool on SO data is a lost cause. Pick any technical question on SO and you will notice that 90% of answers are half-arsed, low-hanging-fruit, irrelevant suggestions. The truth is that nearly every question on SO is unique, very closely coupled with what a given person is trying to achieve, what they've done/written so far, unique mistakes they've made, solutions they already tried, their individual
