AI Education Programming

Can AI Be Trained to Grade CS Homework Assignments? (medium.com) 58

Long-time Slashdot reader theodp writes: Tech-backed Code.org reports that as part of efforts to provide scaled human-centered education, the Stanford AI Lab analyzed 711,274 solutions to interactive block-based Code.org programming assignments submitted by 3rd and 4th grade students to develop AI-based solutions for automatically grading student homework. The research project received funding from LinkedIn founder and VC Reid Hoffman, who is coincidentally a $1+ million supporter of Code.org, which provided the student data.

Autograding systems are increasingly being deployed at all levels of education to meet the challenge of teaching programming at scale. So, will AI make Computer Science grader and undergraduate teaching assistant jobs obsolete?


Comments Filter:
  • by Gim Tom ( 716904 ) on Sunday April 10, 2022 @09:42AM (#62433986)
    I am now a 75 year old engineer, long retired. I can't remember exactly when, but I was learning my first programming language in the late 1960's. It was Algol 60, and of course back then, it was all punch cards and batch processing. MY CLASS assignments were machine graded BACK THEN! Throwing the letters A and I at the same process does what? Why is it new?
    • An AI friend wants to know

    • by hey! ( 33014 ) on Sunday April 10, 2022 @09:51AM (#62434000) Homepage Journal

      Well, your class assignments were probably multiple choice. Let's say I asked you to write a sort routine and was going to grade you on the quality of your work. That's something you can train an AI to do.

      Whether you can train it to do a *good* job is a different question, and whether it would be worthwhile is yet a different question again. Most AI amounts to cheap, massively scalable mediocrity.
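
      For instance, a toy version of that training might look like the following Python; the features, the tiny hand-graded training set, and the nearest-neighbour rule are all invented for illustration:

      def features(source):
          """Crude quality signals: comment density and average name length."""
          lines = source.splitlines() or [""]
          comment_ratio = sum(1 for l in lines if l.strip().startswith("#")) / len(lines)
          names = [w for l in lines for w in l.replace("(", " ").split() if w.isidentifier()]
          avg_name_len = sum(map(len, names)) / len(names) if names else 0.0
          return (comment_ratio, avg_name_len)

      # Hand-graded examples a human grader produced (entirely made up here).
      training = [
          ("# sorts in place\ndef bubble_sort(items): ...", "B"),
          ("def f(x): ...", "D"),
      ]

      def grade(source):
          """1-nearest-neighbour over the feature space: cheap, scalable, mediocre."""
          fx = features(source)
          def dist(example):
              fy = features(example[0])
              return sum((a - b) ** 2 for a, b in zip(fx, fy))
          return min(training, key=dist)[1]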

      • by gweihir ( 88907 )

        Most AI amounts to cheap, massively scalable mediocrity.

        Indeed. Or worse. Unfortunately, there are a lot of mediocre (or worse) people out there that do not understand that.

          Whenever someone new on my team talks about mediocre programmers and about how few really good programmers there are, I know trouble is coming. Most of the time these are the overcompensating types: a lot of hot air but little result, because, you know, they have to work with all those mediocre people who just don't understand things the way they do.
          Sometimes it is a brilliant guy, but he requires a lot of "maintenance" since he has trouble understanding humans and doing business.
          In my experience, the one
      • Let's say I asked you to write a sort routine

        A big component of the grade should be whether it properly sorts the input. You don't need AI for that.
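
        For example, a check like this covers that biggest component in a dozen lines of Python (student_sort is just a stand-in name for the submitted routine):

        def student_sort(items):          # stand-in for the submission
            return sorted(items)

        def check(fn):
            cases = [[], [1], [3, 1, 2], [2, 2, 1], [-5, 0, 5]]
            return all(fn(list(c)) == sorted(c) for c in cases)

        print(check(student_sort))        # True only if every case sorts correctly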

        • by fahrbot-bot ( 874524 ) on Sunday April 10, 2022 @01:49PM (#62434384)

          Let's say I asked you to write a sort routine

          A big component of the grade should be whether it properly sorts the input. You don't need AI for that.

          Indeed. Way back in the mid-80s, we had an assignment in the Programming 101 (C / Unix) class to read numbers in one layout, sort them, and write them out in a different layout. I wrote the input/output routines in C and used this shell script:

          readin < ifile | sort | writeout > ofile

          Got full credit -- with a note about thinking outside the box.
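
          For the curious, both filters could be rendered today in a few lines of Python; the original layouts are lost to time, so whitespace-separated input and one comma-separated output line are assumed here:

          # filters.py -- hypothetical stand-ins for the story's readin/writeout.
          # Usage: python3 filters.py readin < ifile | sort -n | python3 filters.py writeout > ofile
          import sys

          def readin():
              # Assumed input layout: whitespace-separated numbers.
              for line in sys.stdin:
                  for field in line.split():
                      print(field)

          def writeout():
              # Assumed output layout: one comma-separated line.
              print(", ".join(line.strip() for line in sys.stdin))

          {"readin": readin, "writeout": writeout}[sys.argv[1]]()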

          That professor hired me after I graduated to be a systems programmer / systems admin with Unisys at NASA Langley for their new Cray-2, Convex and other Unix systems.

  • At the basic level (Score:5, Insightful)

    by bradley13 ( 1118935 ) on Sunday April 10, 2022 @09:48AM (#62433996) Homepage

    Plain old rule-based Lint plus automated tests, sure.

    Don't use neural nets, because they are awful at explaining why they do stuff. Not useful for teaching.
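
    A sketch of that combination in Python -- the two lint rules and the scoring formula are just examples, but note that every deduction comes with a reason the student can read:

    import ast

    def lint(source):
        findings = []
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.FunctionDef) and not ast.get_docstring(node):
                findings.append(f"function '{node.name}' has no docstring")
            if isinstance(node, ast.Global):
                findings.append("avoid global variables")
        return findings

    def grade(source, fn_name, tests):
        notes = lint(source)
        ns = {}
        exec(source, ns)        # runs the submission: fine for a sketch, unsafe for real grading
        passed = sum(ns[fn_name](*args) == want for args, want in tests)
        return max(10 * passed // len(tests) - len(notes), 0), notes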

    • More to the point, code is written for two reasons: First, it's written to perform a specific task. Second, it's written to document that process to other humans.

      It's beyond trivial to ensure the first requirement is satisfied. You plug in the code and test it. Does it produce the correct answer or not?

      But the second job of the code is harder to quantify. Does it read clearly, explain the process logically? Are variables and functions named well? Are there clear comments describing the code? Are thin

      • by kmoser ( 1469707 )

        It's beyond trivial to ensure the first requirement is satisfied. You plug in the code and test it. Does it produce the correct answer or not?

        It's not so trivial to ensure it produces the correct answer given all possible valid inputs, not to mention gracefully handling all invalid inputs.
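
        Randomized testing narrows that gap, though it never closes it. A quick sketch, where the value ranges and the None probe are arbitrary choices:

        import random

        def fuzz_sort(fn, rounds=1000):
            problems = []
            for _ in range(rounds):
                data = [random.randint(-10**6, 10**6) for _ in range(random.randint(0, 50))]
                if fn(list(data)) != sorted(data):
                    problems.append(f"wrong answer on {data!r}")
                    break
            try:
                fn(None)        # invalid input: an explicit error beats silently "sorting" garbage
                problems.append("accepted None without complaint")
            except Exception:
                pass
            return problems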

  • by abramovs ( 744048 ) on Sunday April 10, 2022 @09:53AM (#62434002) Homepage

    As an education researcher (more specifically, a Learning Scientist) who does research on assessment, I already have high confidence in the potential value/impact. [drumroll, please]:

    For some homework assignments, ones with limited ability to help students anyway, this could work. But for many students, this won't be valuable.

    To be more specific, this could be a valuable effort as long as the feedback from the homework is what students need. Will the automated grading tell the learner why they got the answer wrong? Or will it just point out that they made a mistake? In relation to your own learning, think about how often you learn when someone tells you that you got something wrong. Did that help? Or even further, think of the times you got something wrong and then someone showed you how to do it the 'right' way. Did that help? I bet the answer is that it did help sometimes, and that other times it wasn't really valuable because you needed to develop a better understanding of what you weren't understanding.

    The funny thing is that people (even educators) often forget the value of assessments, including homework. They think of assessments only as summative: letting the learner (i.e., student) and instructor (i.e., teacher) know whether someone knows something or not. But, at most, that's about 50% of the value of assessment. The other half is formative: whether the assessment (including homework) helps the learner understand what, if anything, is preventing them from understanding (mastering the skill, using the knowledge, etc.)

    • I absolutely concur. And specifically when it comes to programming, code has high variability, good programmers will create hooks for building onto it later, and often there are multiple ways to do things. It is not uncommon for me to put hooks and bits of stuff in my code intended for use later. Or subsections for debugging or testing. Maybe a tracker to figure out how long certain sections are taking to execute so I can determine if optimization is needed.

      If a professor looks at my code with "useless" var

      • Devil's advocate here, but isn't that why an AI is needed? You train it to do the routine stuff such that the teacher can spend more time appreciating the code? Correcting takes a lot of precious time.
        • That's a fair question (want to be an education researcher?).
          The key challenge of determining the 'correctness' of an assessment is to figure if the learner is in one of four states:
          A. They got the problem right and they understand what the problem is measuring.
          B. They got the problem wrong and they don't understand what the problem is measuring.
          C. They got the problem wrong and they understand what the problem is measuring.
          D. They got the problem right and they don't understand what the problem is measurin

      • I know that Agile isn't the Holy Grail, the Silver Bullet, nor the Ark of the Covenant. That said, one of the Agile aphorisms is "if you think you are going to need it, don't include it."

        You are also a conscientious developer who puts a lot of important shit in your code comments. That said, if some clever implementation needs 'splainin', maybe it needs to be coded in a way that excuses do not need to be offered as to why it is done that way?

        OK, OK, profiling. Maybe there are profiling tools, ju

          one of the Agile aphorisms is "if you think you are going to need it, don't include it"

          There's something to be said for keeping things lean, but there's also a need to anticipate future functionality. I've seen far too much "Agile" code that painted itself into a corner for lack of foresight: the general approach to solving the problem was too limited or inflexible, and it ended up being better to throw the whole thing out and start fresh.

    • by gweihir ( 88907 )

      Well, yes. Some actual teaching experience and a working mind is enough to see that. Sadly, these decisions are usually made by people that lack at least one of these and often both.

    • One of my formative coding experiences was a summer internship doing COBOL programming.

      A second formative experience was a course, I guess it was called Systematic Programming using Pascal.

      The professor lectured on how you could tell which language someone learned before Pascal by how they program in Pascal. Sure shootin', I got an assignment marked down for relying on global variables, which, of course, was a "tell" for my prior COBOL experience. The TA doing the grading gave me a scolding in the gra

    • Has anyone studied the benefits (or issues) with having students learn first, then teach as final proof of knowledge? I think becoming the teacher, even only part time, would change how students interact with and think about their teachers. Also people can only focus on so many things (1 teacher for 40 or 200 students?)...so having lots of people give a little feedback about fewer people should increase the quality of those comments.

      Or creating material they'd wish they'd had to learn a concept, giving th

  • by Flownez ( 589611 ) on Sunday April 10, 2022 @09:55AM (#62434004)
    But innovative solutions do exist, and AI is far better at detecting patterns of what has been done before, or optimal solutions to well-specified requirements.

    Correct recognition of a student's innovative, but wrong solution is imperative if we are to harness emerging talents, and not miss potential that needs a human mentor's guidance to be realised.
  • by Anonymous Coward

    "AI" is just about the most uninteresting thing you could throw at grading assignments.

    It's no coincidence that teaching assistants are typically students themselves. Teaching others teaches you, too.

    Now first you're clamouring for "more coders" and when an opportunity arises to further their education, you jump on the idea that you can write an AI to do it all instead?

    You need your head examined. Or just admit that the "we need moar cod4rz" is a ruse, build an AI to do the job, and fire everyone your AI

  • I wouldn't put it past code.org to successfully manage this. Maybe one day there'll be AI-written articles on this site.
  • The first problem with this is that there will not be enough meaningful feedback. But the worse problem is that as soon as you change the assignments, you will have to build up a new pool of manually graded assignments. The manual grading will either be worse, because those doing it eventually lack experience, or it will not be done at all.

    And so, on the eternal quest to make things cheaper, quality will suffer, and that is not good at all.

  • by rantrantrant ( 4753443 ) on Sunday April 10, 2022 @10:47AM (#62434086)
    ...students got it wrong. Telling *that* it's right or wrong, or how right or wrong, is the easy part, but that doesn't help students much. It typically devolves into a game of guess the teacher's/AI's password, i.e. keep trying/guessing until you get it right regardless of whether you understand the problem or the code at all. A good teacher can predict what students are likely to find challenging, knows that there may be a variety of reasons why a student might get it wrong/misunderstand it, and then provides the appropriate feedback -> feed forward according to the particular student's learning needs. AFAIK, even in the simplest terms, nobody has been able to get machines to do this anywhere near appropriately/productively. But who knows, maybe they can find a way to get "canned feedback", as it's known in the technology enhanced learning (TEL) industry, to work this time around?
    • A good teacher can also later show the class various examples of how the problem was solved. This would be valuable feedback beyond a simple, fairly useless grade.

      • Yes, grades are useless. Certainly there is evidence that SAT and GRE scores are useless at predicting success.

        So then as a selection tool for entry into elite schools and being hired into prestige (i.e. high paying) jobs, we won't use them. We won't evaluate anyone's performance because the Pointy Haired boss doesn't know anything.

        I guess we will do away with the whole concept of Meritocracy -- everyone hates it anyway. I guess we will fall back on nepotism and networks of personal connections? This i

      • It's actually more effective to do it the other way around, i.e. show the class various examples of how to solve a problem before they attempt it themselves. In CogSci it's called the worked (out) example effect: https://en.wikipedia.org/wiki/... [wikipedia.org] As I once heard someone say, "Nobody has ever learned anything faster by withholding useful information from them."
    • The AI wouldn't ask why a student got it wrong. It would tell the student what they got wrong. AI is also very good at pattern recognition, and mistakes usually fall into a particular category. So AI should be very good at determining what is wrong, and why. Therefore, anticipating problems before they happen would be a strong suit of AI. In my experience, humans suck at that, at least in actual practice.
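
      As a concrete (invented) example of that pattern matching, a grader can map which probe cases fail to a likely misunderstanding:

      FEEDBACK = {
          "empty":      "Fails on empty input -- check your base case.",
          "duplicates": "Fails when values repeat -- is your comparison too strict?",
          "reversed":   "Fails on reverse-sorted input -- are you making only one pass?",
      }

      CASES = {
          "empty":      [],
          "duplicates": [2, 1, 2, 1],
          "reversed":   [5, 4, 3, 2, 1],
      }

      def diagnose(fn):
          # Return canned feedback for every probe case the submission fails.
          return [FEEDBACK[name] for name, data in CASES.items()
                  if fn(list(data)) != sorted(data)]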
      • I commented above that the TA for the class grading the assignment does not ask the student this question either.

      • The trouble is that in practice, the same wrong answer can often have multiple causes. At least in a human brain. The tricky bit is working out which misunderstanding a particular student made. This usually requires some "diagnostic" questioning by the teacher. Then again, if all you want is right or wrong, AKA summative assessment, fine, go ahead with AI. However, that only works with complicated problems. As soon as you get to complex problems, AI grading becomes essentially useless. Both Pearson Educatio
    • And what can an AI do if a student thinks outside the box and comes up with a working solution that isn't in the AI's list? As an example, let's say that a class is told to write a simple bubble sort and one of them figures out how to do it recursively? (Yes, you can do it and the code looks remarkably elegant if you don't mind the extra time it takes.) A human teacher should be able to work out the logic, but what can an AI do?
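
      For reference, the recursive version really is short: one pass bubbles the largest element into place, then you recurse on the rest.

      def bubble_sort(items, n=None):
          n = len(items) if n is None else n
          if n <= 1:
              return items
          for i in range(n - 1):               # one bubbling pass
              if items[i] > items[i + 1]:
                  items[i], items[i + 1] = items[i + 1], items[i]
          return bubble_sort(items, n - 1)     # largest is in place; shrink the range

      print(bubble_sort([5, 1, 4, 2, 3]))      # [1, 2, 3, 4, 5]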
      • As I understand it, AI's primary function is to identify regularities & patterns in data, so novel answers should be easily includable in the AI's database. The inability to predict acceptable answers would be a human error.
        • There's no reason that novel answers can't be added to the database once they're recognized. The problem I'm pointing out is how do you get them recognized?
          • Humans have to train the AI.
              • Yes, of course. My question is how does the AI recognize that this is something it can't handle? Is it something as simple as getting the right output from a method that isn't in its list? If so, what do you do if there's more than one right output, such as with a chatbot? My point is that there are limits to what you can do with AI, and grading homework assignments probably pushes the boundaries when creative thought is required.
              • AFAIK, for the foreseeable future, complex grading remains a uniquely (skilled) human endeavour. One of the similar principles to the scientific method, says that the test/experiment must be a true reflection of what would happen in the real world, AKA "construct validity." In other words, we use tests to predict how well a candidate would do at a particular task/problem in further/higher education &/or at work. Since most tasks/problems that we pay people to solve in the real world are complex, any ass
  • Yea, it’s really not hard. When I was in college, grading papers was done using the stair method. As in, you threw them down a flight of stairs and the further it went the higher the grade. Now I know this is CS, but I’m pretty sure we can get a printer, AI, and a robotic arm on this no problem.
    • The stairs method is no longer usable at the U.

      Students compare each other's returned papers. If their study partner got full points and they got marked down for the work they copied, they will be in your office fast enough to make your head spin. Don't even think of accusing a student of copying when they've caught you not applying the same points deduction. "Oh, get your friend to come in here that I can take the same deduction!" Yeah, right.

      I counsel TA's and graders, whatever credit or partial c

  • by QuietLagoon ( 813062 ) on Sunday April 10, 2022 @11:03AM (#62434108)
    The question that should be asked is: how well can AI grade CS homework assignments?
  • Write a program implementing an AI system capable of grading CS homework assignments.

    Now rate this!
  • by yo303 ( 558777 ) on Sunday April 10, 2022 @02:43PM (#62434490)

    "The research project received funding from LinkedIn founder and VC Reid Hoffman, who is coincidentally a $1+ million supporter of Code.org, which provided the student data".

    That's... not what "coincidentally" means. The two facts are quite related.

  • Write a program that halts.
  • Student delivers code, code must pass tests, which may or may not include time or memory limits.

    It's not complicated.
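
    For instance, a submission can be run under a wall-clock and memory cap in a few lines of Python (the file name and limits here are arbitrary, and the resource module is Unix-only):

    import resource
    import subprocess

    def run_limited(path, stdin_data, seconds=2, mem_bytes=256 * 2**20):
        def limits():                      # applied in the child before exec
            resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
        result = subprocess.run(
            ["python3", path], input=stdin_data, capture_output=True,
            text=True, timeout=seconds, preexec_fn=limits,
        )
        return result.stdout

    # e.g. run_limited("submission.py", "3 1 2\n") should yield "1 2 3\n"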
  • When I was in college, I was given the assignment of writing a LISP interpreter in SNOBOL. I decided to go them one better and turned it in with an English grammar parser written in LISP to run on it. [How to make a TA cringe...]
