Flawed Online Tutorials Led To Vulnerabilities In Software (helpnetsecurity.com) 96

An anonymous reader quotes Help Net Security: Researchers from several German universities have checked the PHP codebases of over 64,000 projects on GitHub, and found 117 vulnerabilities that they believe have been introduced through the use of code from popular but insufficiently reviewed tutorials. The researchers identified popular tutorials by inputting search terms such as "mysql tutorial", "php search form", "javascript echo user input", etc. into Google Search. The first five results for each query were then manually reviewed and evaluated for SQLi and XSS vulnerabilities by following the Open Web Application Security Project's Guidelines. This resulted in the discovery of 9 tutorials containing vulnerable code (6 with SQLi, 3 with XSS).
The researchers then checked for the code in GitHub repositories, and concluded that "there is a substantial, if not causal, link between insecure tutorials and web application vulnerabilities." Their paper is titled "Leveraging Flawed Tutorials for Seeding Large-Scale Web Vulnerability Discovery."

  • by wonkey_monkey ( 2592601 ) on Sunday April 23, 2017 @12:44PM (#54287999) Homepage

    Researchers from several German universities have checked the PHP codebases of over 64,000 projects

    The researchers identified popular tutorials by inputting search terms such as "mysql tutorial"

    Ah, I see where they went wrong. They should have searched for "real mysql tutorial."

    • We know that 95% of the coding bootcamps in South Asia can't code, so obviously the only way they can get something done is by copying what they find googling. Sad state of the industry.
      • Correction '95% of the coding bootcamps products'
        • You've got more bugs than just that one, maybe you should have googled it?

          For example, "what they find googling." Who is doing the googling here? The thing that they found, according to you. But no, actually the tutorial isn't googling at all! Just a one word bugfix, of course, but still! That's the sad state of the interwebs, to be sure.

    • Ah, I see where they went wrong. They should have searched for "real mysql tutorial."

      That's deprecated, now we use really really for sure this time mysql. It's webscale.

    • Naah, they should have used StackOverflow's programming tutorials [timeshighereducation.com] instead.
  • People learn not by memorizing but by looking at examples.

    Most of the people working in Web-related jobs are not security experts; their job is to get things done as quickly and cheaply as possible. You might think in terms of huge corporations where IT is divided in groups, each working on specific parts of the whole. At the smaller scale though, the same person responsible for the front-end with HTML, CSS and Javascript has to work on the back-end with PHP and MySQL. The second their code does what it's su

    • Comment removed (Score:4, Insightful)

      by account_deleted ( 4530225 ) on Sunday April 23, 2017 @01:19PM (#54288111)
      Comment removed based on user account deletion
      • If a bridge collapses, do you blame the production workers who followed the plans exactly as they were or do you blame the engineer who was too lazy to make the proper calculations and didn't get the tests done for the bedrock foundation, etc?

        You're doing the popular "snowflake attack" thing here, when in fact you're the snowflake thinking that everyone is as good as you are. The thing is, as I said, it's not everyone's job to be a security expert. We should expect security to be part of the tools instead o

        • Whoa! Blueprints for a bridge are a very bad analogy for tutorial code. Seldom do you see certified bridge engineers cutting and pasting features of one bridge design into another. Concepts may be applied, but only after due diligence.

          Similarly, unless the tutorial is on writing or securing code/services, it should only be considered an introduction to a topic or concept.

          • Seldom do you see certified bridge engineers cutting and pasting features of one bridge design into another.

            I'm not so sure about that. I worked a few years in construction with my father. He examined the blueprints for every project carefully to order his list of materials and occasionally found minor mistakes that he needed to correct on the ground. One day he found a three-foot wide mistake between two pages of the blueprint for the same wall. The architect called bullshit because the two pages lined up perfectly with each other. So my father had the architect and main contractor walk the layout on the ground for

        • by ilctoh ( 620875 )
          That's the problem - In your analogy, you've hired "production workers" when you actually need engineers.
          If you're hiring people dumb enough to copy and paste code into production, without understanding the ramifications, you deserve what you get.
        • Is the salary for a free online tutorial writer in the same ballpark as a bridge engineer's? Someone should tell these open source people not to release anything unless it's perfect, either.
    • People learn not by memorizing but by looking at examples.

      This is very true, especially in programming. The result is that bad code gets propagated with the best of intentions.

      I do fault many of the tutorial writers for not mentioning stuff like cleaning up form data and the like. They should learn what they're doing before they try to teach it.

      • by dgatwood ( 11270 )

        Where I used to work, we called this the "Stack Overflow Effect" because so much bad code written by well-meaning people was floating around Stack Overflow that did things in dangerous, security-risky ways, such as telling people to disable TLS chain validation so they could use a self-signed cert for their test environment, then wondering why so many apps shipped with chain validation turned off in the production versions of the app.
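
A minimal sketch of the alternative to switching validation off, written in Python as an assumption (the comment above names no language or library). It keeps chain validation enabled and instead trusts the test server's self-signed certificate explicitly. The `ca_file` path is hypothetical.

```python
import ssl

def make_test_context(ca_file=None):
    """Build a TLS client context that still validates the certificate chain.

    ca_file is a hypothetical path to the test server's self-signed
    certificate; when given, it is added to the trusted roots.
    """
    ctx = ssl.create_default_context()
    if ca_file is not None:
        ctx.load_verify_locations(cafile=ca_file)
    return ctx

def make_insecure_context():
    """The risky shortcut: validation switched off entirely. Not for shipping."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # has to be cleared before verify_mode
    ctx.verify_mode = ssl.CERT_NONE  # accepts any certificate at all
    return ctx
```

With the first helper, forgetting to remove the test setup leaves an extra trusted certificate behind rather than a client that accepts anything.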

        I've actually written security documentation whose primary purpose was to

        • I find the majority of online tutorials are very naive with regard to development best practice, and Stack Overflow is one of the worst offenders for perpetuating this problem. It is a vicious circle. Simplistic solutions get upvoted while better answers take longer to prepare and are comprehended by fewer people.

          It is common to see answers presented that ignore the bigger picture; for example, a poor separation of concerns and high levels of coupling are the norm.

          If you call this out your answer will often be

    • by Anonymous Coward

      People learn not by memorizing but by looking at examples.

      Looking at examples is fine and can be beneficial. However, it's important to understand what the example is doing and why. Without sufficient knowledge of the programming languages used and at least passing familiarity with the context, it's all too easy to misunderstand and either cut and paste verbatim code that's inefficient or inappropriate for production or worse, has security vulnerabilities or other bad practices. This is especially problematic with example code found in tutorials since key security

  • I believe it (Score:5, Insightful)

    by JustAnotherOldGuy ( 4145623 ) on Sunday April 23, 2017 @01:17PM (#54288109) Journal

    I believe it.

    I've come across countless tutorials that cover things like capturing and using form field input, but almost NEVER see a single word in them about sanitizing data, or guarding against bad, malformed, or malicious data.

    It's just, "Here's how ya get the data, now go jam it in the database or print it right to the screen!" Fuck me.

    And in all fairness, as a PHP user, I've seen a *lot* of PHP tutorials that were bad, stupidly dangerous, or just plain wrong. One of the most egregious was a "tutorial" that showed sending the entire SQL statement to the server as a GET parameter. That's right, some guy actually coded his shot so that it sent a live SQL statement in the URL, and then blithely processed the attached variables without so much as a how-de-do.

    Later I saw code that did this exact thing used in various scripts (guestbooks, registration forms, comment forms), probably based on this epically flawed "tutorial".
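
For the "print it right to the screen" half of the complaint above, a hedged sketch of what the missing step might look like. It is in Python rather than PHP, and the function name `render_comment` is made up for illustration; the point is only that user-supplied text is escaped at the moment it goes into HTML.

```python
import html

def render_comment(author, body):
    """Return an HTML fragment with both user-supplied values escaped."""
    return "<p><b>{}</b>: {}</p>".format(
        html.escape(author, quote=True),
        html.escape(body, quote=True),
    )
```

Escaping at output time covers HTML element content and quoted attribute values; other contexts (URLs, inline JavaScript) need their own encoding, which this sketch does not attempt.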

    • I've seen similarly bad tutorials about templating. The way they teach how to cut your basic HTML and CSS apart in chunks is complete nonsense. They're showing people to always copy a whole empty framework and call cut-out parts all over the place. And then inside those cut-out parts, call out other parts. I've seen this done five levels deep.

      It still works, but trying to find your way around it all is extremely tedious. And if you need to make a change to the basic original framework, you're out of luck becaus

      • I've seen similarly bad tutorials about templating. The way they teach how to cut your basic HTML and CSS apart in chunks is complete nonsense. They're showing people to always copy a whole empty framework and call cut-out parts all over the place. And then inside those cut-out parts, call out other parts. I've seen this done five levels deep.

        That sounds like Smarty [smarty.net]. What an abomination.

        Jeezus, I hadn't thought about Smarty in years. Now I need a Xanax.

    • I see stuff like that all the time as a full stack contractor, and some places just don't understand why the code needs to be refactored or completely canned and replaced.

      That's also how bad code "infects" whole projects too, the worst offenders are oddly enough System admins and network engineers trying to rapidly prototype something it seems.

      Here is how it goes down, after cobbling together some code they used google to assemble.....they call it good and put it into production ASAP because the fire is und

    • by hey! ( 33014 )

      I've been saying this for years: the reason that the same stupid security holes keep popping up is that they keep showing up in the tutorials that people use to learn new systems and languages.

      The cognitive burden of learning a new system is rough on most people, so it's tempting to make things easy on them. In fact you might have higher satisfaction from students if you do. It certainly makes them feel like they're learning more for less effort if they can make something happen that looks right. But you

    • Yes. And not just you-get-what-you-pay-for free online tutorials, either. I've seen more than one programming textbook (including for this very case - PHP applications with a MySQL database) that describe in loving detail how to construct ad hoc SQL queries using string concatenation and interpolation, say.

      And on the output side, show interpolating user-controlled, unsanitized data from the database directly into the HTML output stream. Maybe there's a half-assed throwaway attempt at anti-XSS, at best.

      These

      • ... that describe in loving detail how to construct ad hoc SQL queries using string concatenation and interpolation

        There's nothing inherently wrong with that as long as you're sanitizing your inputs properly and/or using parameterized queries.
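
The difference between concatenating tainted data and using a parameterized query can be sketched as follows. This uses Python's built-in `sqlite3` as a stand-in for PHP and MySQL, which is an assumption for the sake of a self-contained example; the `users` table and both function names are hypothetical.

```python
import sqlite3

def find_user_unsafe(conn, name):
    # Vulnerable: the tainted value becomes part of the SQL text itself.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_user(conn, name):
    # Parameterized: the value travels separately from the SQL text.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

# A classic injection string: closes the quote and adds an always-true clause.
payload = "nobody' OR '1'='1"
```

Passing `payload` to `find_user_unsafe` turns the WHERE clause into an always-true condition, while `find_user` treats the whole string as a name that simply matches nothing.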

        • Yes, fine, I wasn't specific enough. The texts in question describe how to construct such queries using tainted data. Happy?

          And, frankly, as an IT security professional I'm not satisfied with sanitization in this case. Sanitization tends to be fragile and more complicated than non-experts think, so it often fails to cover all cases. While it can be useful for defense in depth (particularly with whitelisting; whitelisting valid input patterns and rejecting everything else is far safer than approaches that re

  • by __aaclcg7560 ( 824291 ) on Sunday April 23, 2017 @01:23PM (#54288135)

    The underlying problem is that too many programmers are willing to copy and paste code rather than think through what they need to code.

    Remember the left-pad crisis that broke the Internet because a developer removed his npm packages over a dispute? How hard is it to write a left-pad function [haneycodes.net]?
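
For scale, here is roughly what such a function amounts to, sketched in Python rather than JavaScript (an assumption, chosen only to keep the example self-contained). The name `left_pad` and its parameters are illustrative, not the npm package's actual API.

```python
def left_pad(value, width, fill=" "):
    """Pad str(value) on the left with `fill` until it is `width` characters long."""
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    text = str(value)
    missing = width - len(text)
    # Strings already at or beyond the requested width are returned unchanged.
    return fill * missing + text if missing > 0 else text
```

In Python the same result is available from the standard library as `str(value).rjust(width, fill)`, so even this small function would not normally be written by hand.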

    • Exactly. Never paste. Read the tut, understand the technique, and then apply the part you needed by typing in the correct code that you now know how to write. This way you only write your own fresh security bugs.

      OTOH, Matt's Scripts in the end were the most secure on the internet, because they'd been beaten into submission by a million trolls. Sometimes the crapware is better than the random square re-invented wheel.

    • by 0123456 ( 636235 )

      The underlying problem is that so much software is farmed out to Indian code-monkeys who have no idea how to program, so they just copy-and-paste whatever they find on the web. They'll have a new job in three months anyway, so why would they care that the software is crap, insecure and unsupportable?

      • They'll have a new job in three months anyway, so why would they care that the software is crap, insecure and unsupportable?

        The lead programmer (or project manager) should review the code and then beat them with a pointy stick if the code is unacceptable. I'm not a professional programmer but I was a software tester for nearly seven years. The lead programmer for the developers I worked with was responsible for the quality of the code in each build. Sometimes programmers got fired if they couldn't do their job.

      • by Luthair ( 847766 )
        I don't think it's Indians in particular; the reality is there are a lot of people in the field who 'knew computers'. You don't get to design a bridge because you know how a calculator and ruler work.
    • by tsm_sf ( 545316 )

      The underlying problem is that too many programmers are willing to copy and paste code rather than think through what they need to code.

      The underlying problem is that we're asked to reinvent tiny wheels all day long, solving trivial problems that have been solved before by people more interested in the specific problem.

      We shouldn't be asking why people are copying bad code, we should be asking why they need to.

      • We shouldn't be asking why people are copying bad code, we should be asking why they need to.

        Not sure why you're rephrasing my statement.

        • We shouldn't be asking why people are copying bad code, we should be asking why they need to.

          Not sure why you're rephrasing my statement.

          There's a subtle difference.

          Your statement clearly poses re-implementation of code as the main alternative to copy-pasting. (boils down to "You should intelligently re-implement, instead of blindly copy-pasting").

          The above statement simply discourages from copy-pasting (boils down to "Do not copy-paste, why do you even want to ?") but is still open to *any* solution :
          that includes having a standard library (which was another criticism back during the "#LeftPadGate" ) which is also a valid solution : if there

    • The underlying problem is that too many programmers are willing to copy and paste code rather than think through what they need to code.
      Remember the left-pad crisis that broke the Internet because a developer removed his npm packages over a dispute? How hard is it to write a left-pad function?

      Sorry, but no.

      You should not be copy-pasting a left-pad function.
      But you should not be re-implementing yet another one yourself, either.

      Simple, trivial tasks like this *should go into a standard library*.

      On any machine on which I fire up a C compiler, I know that at least I can rely on a decent, compliant standard library for simple tasks.
      If I want to left-pad a number, I just give the appropriate parameter to printf.
      (Well, unless I'm writing kernel code, or unless I'm writing for a tiny embedded platform where e

      • Why does npm need to be any different?

        Npm is the Node.js Package Manager. Many JavaScript programmers who don't use Node.js directly in their own projects will use npm for the JavaScript tool chain environment. I use JavaScript sparingly in my own projects so I'm not that familiar with the language. I'm under the impression that JavaScript doesn't have a standard library outside of its core functionality, but it does have a ton of frameworks available.

        • I'm under the impression that JavaScript doesn't have a standard library outside of its core functionality, but it does have a ton of frameworks available.

          Yup, that's exactly my (poorly worded) complaint.
          Tons of semi-useful frameworks everywhere,
          but not a basic library of standard functions.

          Leading to either tons of copy-pasting, or relying on scattered external modules.

    • The underlying problem is that too many programmers are willing to copy and paste code rather than think through what they need to code.

      That's a problem; it is not the problem. Addressing it is necessary but nowhere near sufficient.

      The simple fact is that a very experienced, knowledgeable, and thoughtful developer who's simply not familiar with, say, XSS or CSRF vulnerabilities, could read, understand, and implement ideas from a tutorial that is susceptible to and fails to cover them. No amount of "think[ing] through" is going to solve that.

      It's not reasonable to expect people who haven't put considerable effort into identifying protocol vu

  • by Traverman ( 4909095 ) on Sunday April 23, 2017 @01:27PM (#54288153)
    The important takeaway here is not that flawed tutorials lead to bad code. It's the implication that one could actually poison tutorials intentionally, perhaps in some very subtle way. While it would be quite difficult to inject malware this way (unless the tutorial convinces some idiot to download this "include file that you need for this function"), it probably wouldn't be too difficult to inject, say, buffer overflows or XSS vulnerabilities that could well be invisible to novice programmers. Those vulnerabilities could then be exploited post-deployment, perhaps using a bot scan of Github to identify broken apps that include the code. Rust is better because for something on the order of a 10% overhead vs C, it effectively eliminates buffer overflows (unless something is amiss with Rust itself, in which case we have only one bug to fix, but millions of precompiled vulnerabilities in the field). On balance, Rust seems like a net positive to security. It does nothing much, however, to prevent vulnerabilities having nothing to do with memory exploits. For that matter, one could probably write Rust code to exploit Rowhammer. Or poison a tutorial to do that. It would be completely "safe" multithreaded code... that isn't, thanks to ubiquitous shitty DRAM. There's another, subtler issue: UTF8 hacks. One could post a tutorial and substitute various characters with various similar characters. Maybe, just maybe, one could find a way to get some dufus to copy the code into his source and create an exploit because he confuses one character with another one that looks almost the same (or, even worse, exactly the same due to text rendering shortcomings on his end). On the vigilante end, I suppose the only solution is to first of all identify the poisoned/flawed tutorials, and secondly to search Github or other repositories for key snippets. 
This is a hard problem to automate due to the zillions of ways that the tutorial code might be imported into a project and tweaked to fit, without destroying the vulnerability it injects. So, to the noobs out there: read tutorials, but, at most, copy code from them by retyping it yourself. DON'T DOWNLOAD INCLUDES OR "REQUIRED BINARIES". DON'T CUT AND PASTE CODE INTO YOUR PROJECT. Cross-verify with multiple sources (which could have been manufactured by the same hacker, so beware similar look-and-feel), and if you still don't really understand what you're doing, then do it some other way. Now, for the public generally, I wish there were a way for us to protect ourselves from this crap. I don't think there is, apart from avoiding software like the plague. It's not like the code you cut and paste from the tutorial is going to create some obvious malware signature in most cases, especially if the tutorial is very abstract in nature. After all, there are endless versions of compilers and compiler settings in use out there.
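
The look-alike character trick mentioned above is at least easy to screen for before pasting anything. A rough sketch in Python (an assumption, since the comment names no language), where `find_non_ascii` is a hypothetical helper that reports every non-ASCII character in a snippet along with its Unicode name:

```python
import unicodedata

def find_non_ascii(source):
    """Yield (line_no, column, char, unicode_name) for each non-ASCII character."""
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                yield line_no, col, ch, unicodedata.name(ch, "UNKNOWN")

# Example input: the "e" in "user" is really Cyrillic U+0435.
snippet = "if us\u0435r == 'admin':\n    grant()"
for hit in find_non_ascii(snippet):
    print(hit)
```

This only flags characters outside ASCII; it does not decide whether they are malicious, and legitimate non-English identifiers or string literals will show up too.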
    • by Anonymous Coward
      Someone should write a tutorial on the proper use of paragraphs.
  • The internet as a whole is a tool for quick mass communication. Did people really think it would be impervious to stupidity and incompetence? Social media, news on social media.. Philosophers have long written about the effects of incorrect statements affecting humanity like ripples from a stone being thrown into a pond. They hadn't imagined a world where the pond is electric with ripples traveling outward at the speed of light.
  • Tutorials do not write bad code. People write bad code.

    A tutorial’s purpose is not to follow good practices in all aspects; such a tutorial would be unreadable. A good tutorial focuses on the one aspect being demonstrated, and, for the sake of exposition, intentionally neglects all other aspects, including, but not limited to, error handling, separation of responsibilities, access control, injection avoidance, naming, cache invalidation, etc.

    The implicit assumption is always that you will apply good p

    • A tutorial's purpose is not to follow good practices in all aspects; such a tutorial would be unreadable.

      Really?

      • Imagine you’re writing a tutorial on, I don’t know, getting a list of repositories in a GitHub account and dumping their names and descriptions to the terminal. At the core, it is a single HTTP request: GET /users/:username/repos. You could probably trim the whole thing down to a shell one-liner.
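
The bare-bones version described above might look something like this in Python (an assumption; the comment suggests a shell one-liner). It targets the public GitHub REST endpoint `GET /users/:username/repos`; the helper names are made up, and it deliberately has no error handling, pagination, or authentication.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"  # assumed: the public GitHub API

def repos_url(username):
    """Build the URL for GET /users/:username/repos."""
    return "{}/users/{}/repos".format(API_ROOT, urllib.parse.quote(username, safe=""))

def format_repos(payload):
    """Turn the decoded JSON list into 'name: description' lines."""
    return ["{}: {}".format(r["name"], r.get("description") or "") for r in payload]

def dump_repos(username):
    """Fetch the repository list and print one line per repository."""
    with urllib.request.urlopen(repos_url(username)) as resp:
        payload = json.load(resp)
    print("\n".join(format_repos(payload)))
```

Calling `dump_repos("some-user")` performs the single request; everything a production script would add on top is exactly what this sketch leaves out.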

        But no, you want good practices. Let’s do it.

        Our script will accept a user name and a configuration option for the API endpoint; that means some CLI argument parsing and --help handling. Check that

  • by account_deleted ( 4530225 ) on Sunday April 23, 2017 @02:23PM (#54288367)
    Comment removed based on user account deletion
    • where supposedly anyone can 'code', this is the expected result. Why would you expect anything else from cheap inexperienced labor?

      There is a "coding" school here in Seattle (I think it's some chain) that will teach you to "code" in 7 or 8 weeks, and then apparently get a $70k job.

      I don't believe it, I think they are designed to suck up GI Bill money from veterans.

      But at least they are teaching Python...

  • by wisnoskij ( 1206448 ) on Sunday April 23, 2017 @02:53PM (#54288467) Homepage

    Hardly the tutorial's fault. A tutorial will not cover every edge case of your specific example. It will not spend 90% of its length teaching about only partially related topics. I read the same SQL and HTML GET tutorials as everyone else, and a basic understanding of programming I learned in the first month of grade 9 prepared me for sanitizing the input. SQL isn't special, and GET is no different than CIN; it's not rocket science.

  • So, I know how everyone feels. If something goes bad code wise, it goes bad for all of us, whether we update or not thanks to a thousand apps running the same single API. Open source used to destroy open source only to kill the desktop because they can't invent a new architecture fast enough to sell new computers. And, the new ones now aren't that much different, if not less powerful than the ones five years ago. So, the Google and Window$ come up with as many apps that need Internet to work as they can to
  • by Gravis Zero ( 934156 ) on Sunday April 23, 2017 @03:08PM (#54288523)

    While bad tutorials help make shitty coders, there will always be shitty coders. The question then becomes, "how do we protect internet servers from shitty code?" The answer to this is with secure interfaces, and we've failed at most levels.

    Let's start at the top with web serving daemons. Web serving daemons (e.g. Apache) currently support script languages (e.g. PHP) which are a minefield of insecurity. The fact that they were happy to enable script language interpreters and execute them with the same level of privilege as the web serving daemon itself (by default at least), without a second thought, shows a lack of understanding about the dangers they hold.

    The next level of insecurity is in the script language interpreters which are being invoked by the web serving daemons. Script language interpreters intended for use with web servers have only "recently" added the ability to restrict certain operations. However, by default, even the most dangerous operations like the execution of text strings are enabled. The most egregious flaw I've seen is in PHP, which allows the ability to define the value of variables that are not explicitly requested. At no point was this a good idea.

    Drilling down, we get to database daemons. Database daemons do not promote the use of a function call based interface but rather a text only interface. Frankly, anything goes with a text based interface which leaves it wide open to naughty inputs. A text interface is a wonderful concept for ease of use but it's just terrible for security.

    I know that it's the shitty coders' fault for writing shitty code, but a defensive approach to design is something we should strive for to increase our level of security.

    • by joboss ( 4453961 )
      Honestly PHP has been pretty good for security if you use the proper manual and actually read it properly. It has a lot of pitfalls but most are detailed in the manual. Unfortunately few people really RTFM or care *how* what they are doing works too much as opposed to that it works. In any language that's a problem.
      • It has a lot of pitfalls but most are detailed in the manual. Unfortunately few people really RTFM or care *how* what they are doing works too much as opposed to that it works. In any language that's a problem.

        It's a significantly larger problem for script languages rather than compiled languages.

  • I don't get why so many developers WON'T use the real documentation. Heck, many of them WON'T even download from official sources, instead relying on third-party collections with obsolete versions, or, worse (at least potentially), intentionally hacked/poisoned mods.

    WHY do so many use W3Fools? I once had a Google filter set up to keep them out of search results. But W3Fools gamed Google with dozens or hundreds of different domains, until the technique became widespread and Google threw in the towel a

    • I don't get why so many developers WON'T use the real documentation

      Because reading documentation is a skill many of them have not developed yet.

  • I find it amazing that those that are the most efficient, cannot create worth a damn. Maybe a demonstration of bleeding edge software design in some field few even understand, like maybe human neural patterns? Of course, googling would only be a suggestion.
  • No other explanation is needed. Why bother blaming bad tutorials?
  • I've been teaching people about this for ages. I have reviewed perhaps a couple hundred recruitment tests as well. You would be shocked how many can't even indent, and you see injections all the time. I sometimes prefer to see manual escaping using the provided functions rather than prepared queries, because prepared queries hide the problem. I am pretty sure a lot of them use tutorials as they are doing the test and it makes me wonder.

    When I am training juniors one of the first things I get them to do is to learn to go
