Your Java Code Is Mostly Fluff, New Research Finds

itwbennett writes: In a new paper (PDF), researchers from the University of California, Davis, Southeast University in China, and University College London theorized that, just as with natural languages, some (and probably most) written code isn't necessary to convey the point of what it does. The code and data used in the study are available for download from Bitbucket. But here's the bottom line: only about 5% of written Java code captures the core functionality.
  • Makes sense to me (Score:5, Insightful)

    by Anonymous Coward on Wednesday February 11, 2015 @01:56PM (#49031417)

    I'll admit I just read the summary article and not the paper itself, but I wouldn't say that this is overly surprising.

    Right off the bat, there's this preoccupation we Java types seem to have with accessor methods (which, if we're honest, do something besides just setting or getting a private member variable maybe 1% of the time; why the hell we still write them I don't know), plus the frequent need for hashCode, clone, and equals methods, most of which are auto-generated. You end up with a bunch of small methods that do very little but pad the code count.
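    To make the accessor point concrete, here is a minimal sketch contrasting a hand-written JavaBean-style value class with a Java record. Records only arrived in Java 16, years after this thread, so this is a later-Java illustration rather than something available in 2015; the names `PointBean` and `PointRecord` are hypothetical.

```java
import java.util.Objects;

public class AccessorBoilerplate {

    // Classic JavaBean-style value class: two fields, and everything else is scaffolding.
    static final class PointBean {
        private int x;
        private int y;

        PointBean(int x, int y) {
            this.x = x;
            this.y = y;
        }

        public int getX() { return x; }
        public void setX(int x) { this.x = x; }
        public int getY() { return y; }
        public void setY(int y) { this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            PointBean other = (PointBean) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    // Java 16+ record: accessors x() and y(), equals, hashCode and toString are generated.
    // Records are immutable, so there are no setters.
    record PointRecord(int x, int y) { }

    public static void main(String[] args) {
        System.out.println(new PointBean(1, 2).equals(new PointBean(1, 2)));
        System.out.println(new PointRecord(1, 2));
    }
}
```

    The trade-off is immutability: if you really need setters, the record does not help, which is where tools like Lombok come in.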

    Beyond that, I think good design usually works out this way. You (or at least I like to) build up in layers, each layer using the previous layer at a higher level, until you get to the top where you have a few seemingly simple bits of code that pull it all together. When you get big complex functions doing a bunch of stuff vs the described small functions adding little bits of functionality along the way, I think you are doing things wrong.

    That's not to say people don't go way overboard (and this is common in Java), ending up with huge chains of methods that just pass the buck and complex control structures where you need a debugger to figure out what's going on, but done right, this approach makes for easily maintained and readable code.

    • Nonsense (Score:4, Interesting)

      by Anonymous Coward on Wednesday February 11, 2015 @03:37PM (#49032627)

      the code written, in the summer of 2012 the researchers downloaded 1,000 of the most popular Java projects from Apache, Eclipse, GitHub, and SourceForge. From that they got 100 million lines of Java code and tossed out simple methods (those with less than 50 tokens).

      So they tossed methods that were written well (methods that only do one thing). So if you wrote a simple 2-line validation of an input field (field must be populated, field must match a regex), they tossed that as chaff?
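      The filter described above (dropping methods under 50 tokens) can be sketched as follows, assuming a rough tokenizer: `java.io.StreamTokenizer` is used as a stand-in and only approximates a real Java lexer (it splits `==` into two tokens, for instance), so counts will not match the paper's. The validator `isValid` and its regex are hypothetical, modelled on the two-line example in this comment.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;

public class SmallMethodFilter {

    // Hypothetical rule: the field may contain only letters and digits.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9]+");

    // The "simple 2 line validation": must be populated, must match the regex.
    static boolean isValid(String field) {
        if (field == null || field.isEmpty()) return false;
        return ALLOWED.matcher(field).matches();
    }

    // The same method as source text, so its size can be measured.
    static final String IS_VALID_SOURCE =
          "static boolean isValid(String field) {\n"
        + "    if (field == null || field.isEmpty()) return false;\n"
        + "    return ALLOWED.matcher(field).matches();\n"
        + "}\n";

    // Rough token count: identifiers/numbers are words, literals are one token,
    // comments are skipped, and every other character is its own token.
    static int countTokens(String source) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(source));
        st.resetSyntax();               // start with every char as a one-char token
        st.wordChars('a', 'z');
        st.wordChars('A', 'Z');
        st.wordChars('0', '9');
        st.wordChars('_', '_');
        st.wordChars('$', '$');
        st.whitespaceChars(0, ' ');     // spaces, tabs and newlines separate tokens
        st.quoteChar('"');              // string literals count as one token
        st.quoteChar('\'');             // so do char literals
        st.slashSlashComments(true);    // skip // comments
        st.slashStarComments(true);     // skip /* ... */ comments
        int count = 0;
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with a StringReader
        }
        return count;
    }

    public static void main(String[] args) {
        int tokens = countTokens(IS_VALID_SOURCE);
        System.out.println("isValid tokens: " + tokens
            + (tokens < 50 ? " (below the 50-token cutoff, so it would be dropped)" : ""));
    }
}
```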

      • Re:Nonsense (Score:5, Interesting)

        by lgw ( 121541 ) on Wednesday February 11, 2015 @05:01PM (#49033331) Journal

        So they tossed methods that were written well (methods that only do one thing). So if you wrote a simple 2-line validation of an input field (field must be populated, field must match a regex), they tossed that as chaff?

        Why the hell should you have to write code over and over to validate that a reference isn't null, or an int is positive, or other such cases? Surely that's all part of the interface contract anyhow, right? For that matter, why is "allowed to be null" the default rather than an exceptional special case? Why isn't there a simple operator that decorates a parameter as "nullable" with a single character?

        Why not simply

        public Foo foo;

        No getter or setter needed, by default it can't be null. For those odd cases where null actually means something useful, then just write:

        public Foo? foo;

        This goes double for C#, where "?" is already established as the "nullable" decorator.

        Worth noting that many Java coders use Lombok to effectively achieve this already, just with auto-generated getters and setters, since we lack the courage and/or authority to just have public members instead of pointless getters and setters.

        And, above all else, give us a way to declare that the returned value can't be null, and auto-throw if it is, so the caller never has to check!
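        Java has no `Foo?` syntax, so a common workaround for what this comment asks for is to reject null at construction time with `Objects.requireNonNull` and expose genuinely optional values as `Optional`. A minimal sketch; `Order`, `customer`, and `couponCode` are hypothetical names, and unlike a `?` decorator this is checked at run time, not by the compiler.

```java
import java.util.Objects;
import java.util.Optional;

public class NullContracts {

    static final class Order {
        private final String customer;   // required: never null once constructed
        private final String couponCode; // genuinely optional: may be null internally

        Order(String customer, String couponCode) {
            // Fails fast with a NullPointerException carrying this message.
            this.customer = Objects.requireNonNull(customer, "customer must not be null");
            this.couponCode = couponCode;
        }

        String customer() {
            return customer; // callers never need a null check
        }

        Optional<String> couponCode() {
            return Optional.ofNullable(couponCode); // absence is visible in the return type
        }
    }

    public static void main(String[] args) {
        Order order = new Order("alice", null);
        System.out.println(order.customer() + " " + order.couponCode().orElse("no coupon"));
        try {
            new Order(null, "SAVE10");
        } catch (NullPointerException e) {
            System.out.println(e.getMessage());
        }
    }
}
```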

  • by kaputtfurleben ( 818568 ) on Wednesday February 11, 2015 @01:58PM (#49031437)
    This article uses a lot of words to say absolutely nothing.
    • by msauve ( 701917 ) on Wednesday February 11, 2015 @02:04PM (#49031533)
      I think they're advising that you remove all error checking, help messages, and logging, since that's not required for "core functionality."
      • by Anonymous Coward on Wednesday February 11, 2015 @02:23PM (#49031787)

        Comments and descriptive variable and method names should also go; we're much better off with "void x(int c) { a.b(c); x.b.g.y(c); }", as real coders do not maintain code, they just write it. And disk space is so expensive that even linefeeds should be avoided whenever possible.

      • by IamTheRealMike ( 537420 ) on Wednesday February 11, 2015 @02:24PM (#49031805)

        Plus other bits of code actually required to make it run.

        They also say that they think the same findings would hold for C++. So whilst it's a bit hard to know if this technique is useful without reading and pondering the paper, it isn't saying much about Java specifically.

        That said - we all know Java is a very simple and verbose language. That has some advantages like ultra-fast compiles, but lots of disadvantages too. So here I'm gonna point out Kotlin [kotlinlang.org], which is a new JVM language with transparent Java interop (in both directions). It's a lot more concise and expressive than Java, whilst simultaneously having a stricter type system. The neat thing about Kotlin is, it's developed by JetBrains so you get completely seamless integration with their refactoring IDE. Also there is a Java-to-Kotlin converter feature that lets you turn a Java file into a Kotlin file instantly, and you can convert a codebase on a class-by-class basis. So you can start using the features of the new language right away. Also, it runs on Java 6, so it's Android compatible.

      • by Anonymous Coward

        Well, no. They're doing none of that.

        From a quick skim through the paper, they more or less conclude that Java program text compresses really well, since it's full of redundancy, scaffolding, and so on, and so forth. I'd say they need quite a few words to beat around the bush and imagine all sorts of more or less related things, but this is the core of their findings.

        This finding is fairly obvious and well known, certainly compared to certain other languages, but now in some light science sauce made with

    • I think what they are trying to say is that when you are buying a car you are mostly paying for fluff and not for the bare essentials.

      The bare essentials in a car is a drive train, the engine, maybe, MAYBE the gas tank and the steering wheel and the gas pedal.

      Everything else is fluff you are paying for. Wouldn't you rather just pay for the bare necessities and the hell with the fluff?

      (oh, and if you want to eat an apple you have to have an apple tree somewhere, which is also mostly fluff since only the fru

  • Same for any code (Score:5, Insightful)

    by Ubi_NL ( 313657 ) <joris.benschopNO@SPAMgmail.com> on Wednesday February 11, 2015 @01:58PM (#49031455) Journal

    In my experience, 80% of my code deals with checking for user error and things like that (i.e., not entering a string where I expect a number, does this socket really exist). This is important functionality, but indeed, it is not 'core'...
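    A minimal sketch of the kind of checking described here, assuming a hypothetical `parsePort` helper for a user-supplied port number: the single `Integer.parseInt` call is the "core", and everything around it handles blank input, non-numeric strings, and out-of-range values.

```java
import java.util.OptionalInt;

public class InputChecks {

    static OptionalInt parsePort(String input) {
        if (input == null) {
            return OptionalInt.empty();           // nothing entered at all
        }
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return OptionalInt.empty();           // blank entry
        }
        int port;
        try {
            port = Integer.parseInt(trimmed);     // the actual "core" work
        } catch (NumberFormatException e) {
            return OptionalInt.empty();           // a string where a number was expected
        }
        if (port < 1 || port > 65_535) {
            return OptionalInt.empty();           // a number, but not a valid TCP/UDP port
        }
        return OptionalInt.of(port);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"8080", " 443 ", "abc", "", "70000", null}) {
            System.out.println("'" + s + "' -> " + parsePort(s));
        }
    }
}
```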

    • by Dutch Gun ( 899105 ) on Wednesday February 11, 2015 @02:36PM (#49031957)

      Agreed. As the saying goes: "The devil is in the details".

      It's often very easy and quick to write the "core" functionality, but dealing with exceptions (both in workflow and code), one-offs and special rules, shifting requirements, scope creep, etc, etc... It may not be core, but it's a huge amount of work to write it all. I remember a saying that went something like "80 percent done... now you've only got 80 percent to go", meaning that the perception of being "nearly finished" is much different than the reality.

      It's especially bad when you're racing to meet a milestone with payment tied to specific functionality (I've seen this in the videogame industry), and just barely write enough code to more or less hit that "easy" initial 80 percent, but never get that "last 80 percent" until the end of the project. It ends up as a hellish crunch-mode disaster at the end of the project, with managers not understanding why the project seems to implode near the end.

      • by quantaman ( 517394 ) on Wednesday February 11, 2015 @04:09PM (#49032911)

        I agree that a certain level of fluff is essential, but some also comes from the language itself. Getters/setters are a great example: that's a lot of fluff that almost vanishes in a language like Python without detracting from maintainability or stability. Errors are a more subtle example. What kinds of errors are possible given the language and API? At what level does the API want you to handle errors? How much code do you need to handle those errors properly? This can greatly influence the volume of necessary fluff.

    • Actually, what I found out is that with most applications you spend an awful lot of code on fetching and displaying data. A typical web application that uses a database mostly comes down to:

      • Fetch parameters
      • Build query
      • Fetch data
      • Do something with the data, like concatenating first name/last name, make a 'pretty' date, add totals (this would be the core functionality of that page)
      • Build HTML
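      The steps above can be sketched as a single method, under loudly stated assumptions: the `Sale` record and its fields are hypothetical, an in-memory `List` stands in for the database, and a stream filter stands in for building and running a query (a real application would typically use something like a JDBC `PreparedStatement`). Records require Java 16+. Note how little of it is the step-4 "core".

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

public class ReportPage {

    // Hypothetical row type; in a real app this would come back from a query.
    record Sale(String firstName, String lastName, LocalDate date, int amountCents, String region) { }

    // Stand-in for the database table.
    static final List<Sale> TABLE = List.of(
        new Sale("Ada", "Lovelace", LocalDate.of(2015, 2, 11), 1250, "north"),
        new Sale("Alan", "Turing", LocalDate.of(2015, 2, 12), 999, "south"),
        new Sale("Grace", "Hopper", LocalDate.of(2015, 2, 13), 2000, "north"));

    private static final DateTimeFormatter PRETTY =
        DateTimeFormatter.ofPattern("MMMM d, yyyy", Locale.ENGLISH);

    static String render(Map<String, String> params) {
        // 1. Fetch parameters (with a default).
        String region = params.getOrDefault("region", "north");

        // 2-3. "Build query" and fetch data: here just a filter over the in-memory table.
        List<Sale> rows = TABLE.stream()
            .filter(s -> s.region().equals(region))
            .collect(Collectors.toList());

        // 4. The "core": concatenate names, pretty-print dates, add a total.
        int totalCents = rows.stream().mapToInt(Sale::amountCents).sum();

        // 5. Build HTML.
        StringBuilder html = new StringBuilder("<table>\n");
        for (Sale s : rows) {
            html.append("<tr><td>").append(escape(s.firstName() + " " + s.lastName()))
                .append("</td><td>").append(s.date().format(PRETTY))
                .append("</td><td>").append(money(s.amountCents())).append("</td></tr>\n");
        }
        html.append("<tr><td>Total</td><td></td><td>").append(money(totalCents)).append("</td></tr>\n");
        return html.append("</table>").toString();
    }

    // Formats non-negative cents as "12.50".
    static String money(int cents) {
        return String.format(Locale.ROOT, "%d.%02d", cents / 100, cents % 100);
    }

    // Minimal escaping for element content so user data cannot inject markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(render(Map.of("region", "north")));
    }
}
```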

      Even when you use a framework this is still 90% of the code. There are some frameworks that allow for direct display of

  • The alternative (Score:4, Insightful)

    by halivar ( 535827 ) <.bfelger. .at. .gmail.com.> on Wednesday February 11, 2015 @02:01PM (#49031487)

    Imagine a language with no fluff, no cruft, no boilerplate. Everything is essential and concise. You have something akin to either assembly or too-clever Perl. The fluff is necessary. The fluff provides context, readability, and maintainability.

    • Re:The alternative (Score:4, Insightful)

      by Trepidity ( 597 ) <[delirium-slashdot] [at] [hackish.org]> on Wednesday February 11, 2015 @02:06PM (#49031557)

      I agree you can get too clever with concise syntax, but Java really does not seem like it's at the optimal point on that tradeoff. Some really common things are very verbose, to the extent that it harms readability IMO.

    • The alternative (Score:2, Insightful)

      by Anonymous Coward

      If every single program in the universe contains the same boilerplate strings... They are indeed unnecessary. Java is just about the worst for this. Python requires drastically less redundant meaningless fluff.

    • by arth1 ( 260657 )

      Imagine a language with no fluff, no cruft, no boilerplate. Everything is essential and concise. You have something akin to either assembly or too-clever Perl. The fluff is necessary. The fluff provides context, readability, and maintainability.

      It also provides its own bug opportunities. Indeed, from looking at what Coverity finds, most defects wouldn't have existed without the fluff.

      I'm not advocating that people migrate to assembly or perl, but whenever you cannot point at just where something happens, you have overused abstractions.

      • Indeed, from looking at what Coverity finds, most defects wouldn't have existed without the fluff.

        Really?

    • by itzly ( 3699663 )

      I always write my code in tar.gz format, you insensitive clod.

    • This has no relationship to assembly at all. If you were to rewrite a Java program in assembly, you'd have to replicate *everything* the language does yourself. You'd have even more boilerplate.

      The whole point of a high-level language is to avoid having to redo the same work for every project, or having to redo something that someone else could have done for you, and to express what you want the computer to do more concisely. Instead of, for instance, looking up the memory location of an array member and

    • No, what you end up having is Lisp.

    • by Selur ( 2745445 )

      Expandability rarely goes hand in hand with simplicity when you look at the code as a whole...

    • Smalltalk would be a better example, as it is a high-level language ;) and is more or less keyword-less, with extremely high code density.
      Unfortunately it is out of fashion right now, though very, very slowly gaining momentum again.

    • Perhaps you are thinking of APL [wikipedia.org]?
      • by stox ( 131684 )

        Rob Pike and I came to the conclusion that you can get too concise in a language. The example was APL.

      • Or perhaps TECO [wikipedia.org], the world's most perfect text editor? A TECO command string is famously like random keystrokes. I suppose comments were possible, but I don't remember using any.
        • I used to write in a language that I had reduced to strings of numbers. Instead of writing out the full command, I had listings of pairs of numbers.
    • The fluff is necessary. The fluff provides context, readability, and maintainability.

      That's what comments and code-contracts are for. Well-defined interfaces. Extra lines of code gives bugs more lines to hide in.

    • It's not that; the author is patently insane.

      The author is saying that "if (a

      Crap dangling around a program isn't just necessary for it to run. All those keywords, function calls, common APIs, and the like tell the program what to do with the data you're telling it to manipulate.

  • by bigsexyjoe ( 581721 ) on Wednesday February 11, 2015 @02:02PM (#49031499)

    But I shoot to make 100% of the code I write fluff.

  • There is an old phrase about code being poetry. Java's the flowery kind rather than the haiku.

  • by gstoddart ( 321705 ) on Wednesday February 11, 2015 @02:03PM (#49031519) Homepage

    A couple of important points to keep in mind here. First, the MINSET itself is not executable; it's merely the smallest subset of the code which characterizes the core functionality. Some of the other 95% of the code (the chaff) is required to make it run, so it's not useless.

    So, we can do a computer transform on it to make it into something a computer can express efficiently, but we ignore the fact that the other 95% of the code is the error checking and other shit which you can't do without.

    The whole premise of this "study" has nothing to do with code, how to write it, or what that entails.

    I once had a co-worker who kept telling me that lisp or scheme would magically make it so you just wrote a two line program -- something like "getReady; justDoIt".

    When I asked him who the hell would write "getReady" and "justDoIt", he seemed to think it would be some magic step which sorted itself out. The hard parts don't just magically happen. I can write main() in C which says "getReady(); justDoIt();" -- that doesn't mean that I don't need to implement those parts.
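    A small sketch of the point being made, in Java rather than C: the two-line top level only reads that simply because the layers underneath it are written out. `getReady` and `justDoIt` are the comment's hypothetical names; what they do here (parse a comma-separated list of integers, then sum it) is an invented stand-in.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class TopLevel {

    // The "magic" two-liner.
    public static void main(String[] args) {
        List<Integer> input = getReady("3, 1, 2");
        System.out.println(justDoIt(input));
    }

    // Hypothetical lower layer: parse and clean up raw input.
    static List<Integer> getReady(String raw) {
        return Arrays.stream(raw.split(","))
            .map(String::trim)
            .filter(s -> !s.isEmpty())   // ignore empty entries such as "1,,2"
            .map(Integer::parseInt)      // throws NumberFormatException on bad input
            .collect(Collectors.toList());
    }

    // Hypothetical lower layer: the actual work, here just summing the values.
    static int justDoIt(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }
}
```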

    This sounds equally stupid.

    Since when have coders started subscribing to wishful thinking where you just wave your hands and the computer does all the hard stuff?

    • Re: (Score:3, Funny)

      by Anonymous Coward

      Wow, your co-worker sounds like an idiot. Everyone knows in lisp it would be (justDoIt (getReady)). It's the functional paradigm that makes it magic and that makes it ONE line not two.

    • A C call to "strnlen" is required to make your program run, but if you wrote your own version of that standard library function, which did the exact same thing, *that* is useless. A program which simply calls that function uses the functionality without actually making it part of the program (the source part), simply by including a library and calling it. That function could be considered "boilerplate" if you made your own version and included it in every program you wrote, instead of using the library; i
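      A Java analogue of the `strnlen` point, as a hedged sketch: a hand-rolled join (hypothetical `joinByHand`) produces the same result as the standard library's `String.join`, but every project that copies it in carries it as boilerplate, while calling the library adds only a reference.

```java
import java.util.List;

public class Reinvented {

    // Hand-rolled version: duplicated into every project that needs it.
    static String joinByHand(List<String> parts, String separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                sb.append(separator);  // separator only between elements
            }
            sb.append(parts.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> parts = List.of("a", "b", "c");
        System.out.println(joinByHand(parts, ", "));
        // The standard library call gives the same result with no code of our own.
        System.out.println(String.join(", ", parts));
    }
}
```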

      • All that error-checking is necessary too, but by now, shouldn't we have high-level languages where most of that stuff is handled automatically?

        Ah ... of course ... magic unicorns make it so you can type one command, and the system will identify all possible outcomes, and properly take action for you.

        I have yet to be convinced that programming has reached the point where the language can cover all possible outcomes and do it correctly.

        You're either writing something someone else has written, or you're writin

  • by Bob9113 ( 14996 ) on Wednesday February 11, 2015 @02:03PM (#49031521) Homepage

    Really? Are they just pointing out that source code is meant for human readability, and the actual instructions are more concise? Is anyone surprised by this? Even a quick compression test shows me 80% reduction without even removing the most obviously human-oriented stuff like comments and long variable names.
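    A "quick compression test" like the one described can be sketched with `java.util.zip.Deflater` (the DEFLATE algorithm used by zip and gzip). The sample input is hypothetical: a run of near-identical generated getters, exactly the kind of repetitive text that compresses well, so the reduction it reports says nothing about any particular real codebase.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class CompressionTest {

    // Number of bytes the UTF-8 encoding of `text` shrinks to under DEFLATE.
    static int deflatedSize(String text) {
        byte[] input = text.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[4096];
            int total = 0;
            while (!deflater.finished()) {
                // Only the count matters here, so the output buffer is reused.
                total += deflater.deflate(buffer);
            }
            return total;
        } finally {
            deflater.end(); // release the native zlib resources
        }
    }

    // Hypothetical sample input: n near-identical generated getters.
    static String generatedGetters(int n) {
        StringBuilder src = new StringBuilder();
        for (int i = 0; i < n; i++) {
            src.append("    public int getField").append(i)
               .append("() { return field").append(i).append("; }\n");
        }
        return src.toString();
    }

    public static void main(String[] args) {
        String sample = generatedGetters(20);
        int raw = sample.getBytes(StandardCharsets.UTF_8).length;
        int packed = deflatedSize(sample);
        System.out.printf("%d bytes -> %d bytes (%.0f%% smaller)%n",
            raw, packed, 100.0 * (raw - packed) / raw);
    }
}
```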

    Can I get some of this research grant money? I've got a theory about sparse matrices mostly containing zeros.

  • by JoeyRox ( 2711699 ) on Wednesday February 11, 2015 @02:04PM (#49031525)
    90% of the time is spent executing 10% of the code. But when something goes wrong you want that other 90% of the code to be there so that you don't lose 100% of your work :)
  • Waste in Housing (Score:5, Insightful)

    by lordeveryman ( 853166 ) on Wednesday February 11, 2015 @02:05PM (#49031547)
    Did you know that only about 5% of the average house is actually load bearing? The rest is just fluff. Why are we wasting so much valuable material in houses?
  • I'm curious as to why this matters. When I write functions I write lots of other code that doesn't pertain specifically to the objective but is required to provide stable, reusable code. E.g., reworking the input data so it fits the mold the core expects isn't representative of the program's objectives, BUT is required to achieve the final goal. Same goes for the interface and the validation routines. They don't depict the core function of the software but are critical to the successfu

  • Every single job I ever had, from part-time summer jobs to my current job, was 90% simple and 10% interesting.

    I learned how to do 90% of the work in a week, but the other 10% you never finish learning how to do. Of course, that last 10% is the difference between a professional/expert and a rank amateur.

    I think this is due to the mental capacity of human beings. If the job is so complex you can't learn how to do most of the work quickly, then we split the job into two or more sub-jobs.

    The same guy th

  • Until we can read and write in Huffman encoding, that's the way programming languages will always be.

  • by rubypossum ( 693765 ) on Wednesday February 11, 2015 @02:11PM (#49031637)
    It seems like the Java ecosystem is fine-tuned for producing a low signal-to-noise ratio as far as the intent of code is concerned. So much of the ecosystem stresses templates, massive IDEs, and other automated tools that make the production of thousands of lines of unnecessary boilerplate incredibly easy. Besides, isn't this the nature of Java anyway? It seems like it's designed to produce the most verbose code possible in the hope that if everything is explicit, more bugs can be diagnosed since the compiler has more to work with. It's almost a troll article; seriously, it's like the guy is just trying to piss people off.
  • I am sure it depends on a chosen technology, though (partly because technology defines selected group of authors).

    This percentage would probably go up to the low 20-30% range in C++/Objective-C and the like, and well over 50% in C. Assembly would surely be virtually 100%.
    I wonder what Perl or Python would get, though (probably only a bit better than Java).

    Pure speculation, of course.

  • by engineerErrant ( 759650 ) on Wednesday February 11, 2015 @02:30PM (#49031877)

    This sounds like the same hand-wavy BS that spawned our current infestation of Agile consultants.

    They aren't even trying to be scientific here; this is just baldfaced click-bait, likely commissioned by some unproductive company who wants to look like a "thought leader." What are they even defining as "wheat" and "chaff"? Who decides which lines of code are which? Who decides who gets to decide that? What does it even mean to describe what code "does"?

    Smart people can disagree about best practices and what constitutes "good" code - ultimately, I think most of it boils down to personal taste rather than any notion of objective correctness or big-picture productivity. Personally, I feel most productive in Java - but that's because of an interlocking mesh of many subtle reasons and has nothing to do with how many bytes my code files take up.

  • Really it is a culture who have been mining in a pit for so long that they reason that getting to China is the easiest way out. They might be right.
  • Your Java Code Is Mostly Fluff

    Nope. Not mine. Coming from C and C++, I know what I'm doing.

  • The purpose of Java is to keep legions of mediocre corporate coders from doing too much damage to each other, and it does pretty well at that. Tediously spelling everything out is one way to try to force some context for code you don't see often. Think COBOL.

  • Three things:

    - First, Java is needlessly wordy - consider the necessity of explicitly writing getters/setters for any class where you want access control. What a pile of code for nothing.

    - Second, you can write cryptic code or you can write understandable code. Understandable code involves a few more newlines, so what?

    - Lastly, depending on your developers, yes, you can have overly long code. Someone who re-implements the same functionality 10 times instead of defining an abstract class and implementing it o
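    The last point is truncated, but one reading is the template-method pattern: write the shared skeleton once in an abstract class and let each variant supply only what differs. A minimal sketch with hypothetical `Exporter`, `CsvExporter`, and `TsvExporter` classes; the CSV formatting is deliberately naive and does not quote commas.

```java
import java.util.List;

public class Exporters {

    // Shared skeleton written once: header, one line per row, footer.
    abstract static class Exporter {
        final String export(List<String[]> rows) {
            StringBuilder out = new StringBuilder(header());
            for (String[] row : rows) {
                out.append(formatRow(row)).append('\n');
            }
            return out.append(footer()).toString();
        }

        String header() { return ""; }
        String footer() { return ""; }

        // The only part each format must supply.
        abstract String formatRow(String[] row);
    }

    static final class CsvExporter extends Exporter {
        @Override
        String formatRow(String[] row) {
            return String.join(",", row); // naive: no quoting or escaping
        }
    }

    static final class TsvExporter extends Exporter {
        @Override
        String header() { return "# tab-separated\n"; }

        @Override
        String formatRow(String[] row) {
            return String.join("\t", row);
        }
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(new String[] {"a", "1"}, new String[] {"b", "2"});
        System.out.print(new CsvExporter().export(rows));
        System.out.print(new TsvExporter().export(rows));
    }
}
```

    Adding an eleventh format then means writing one small subclass rather than another copy of the loop.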

  • by ehud42 ( 314607 ) on Wednesday February 11, 2015 @02:48PM (#49032133) Homepage

    "all programs can be optimized, and all programs have bugs; therefore all programs can be optimized to one line that doesn't work"

  • The first response to this kind of thing is 'So what?' They made up a metric and found that in Java it's 5%. Whoop. They didn't even examine any other languages to see if the metric varies (if they had, perhaps it would be in some way interesting, though I doubt it would be particularly enlightening).

    There's nothing you can do with this information. Total waste of time.

  • PKZIP.EXE and PKUNZIP.EXE, together, are about 80 kilobytes.

    The current version of WinZip for Mac is 26 megabytes, or 26,000 kilobytes. That's roughly 325 times the size (an increase of about 32,400%) for the same basic functionality.

    However, I don't see a lot of people preferring the command-line versions. Why? Because it's easier to drag-and-drop a bunch of files into a dialog box and select an output location and folder, than to type all of that crap into the command line WITH the right flags AND no typos.

    Things like menus, option

  • by netsavior ( 627338 ) on Wednesday February 11, 2015 @03:15PM (#49032429)
    These violent delights have violent ends
    And in their triumph die, like fire and powder,
    Which, as they kiss, consume

    ->
    Boy meets the wrong girl, they die for love.

    ->
    Boy, girl, dead.

    ->
    people.forEach(die)


    I mean, sure it gets the job done, but man, might as well just pay someone in India to write and read it.
  • your 'article' is as well...

    (repeated because /. doesn't think my title conveys my message clearly.)
  • by Khashishi ( 775369 ) on Wednesday February 11, 2015 @03:55PM (#49032783) Journal

    Although fluffy code was nearly ubiquitous in all code samples examined, the researchers found that the best quality code could be found at http://www.ioccc.org/ [ioccc.org]
