Programming Linux

Simpler "Hello World" Demonstrated In C 582

Posted by kdawson
from the non-obfuscated dept.
An anonymous reader writes "Wondering where all that bloat comes from, causing even the classic 'Hello world' to weigh in at 11 KB? An MIT programmer decided to make a Linux C program so simple, she could explain every byte of the assembly. She found that gcc was including libc even when you don't ask for it. The blog shows how to compile a much simpler 'Hello world,' using no libraries at all. This takes me back to the days of programming bare-metal on DOS!"

  • BTDT (Score:3, Insightful)

    by argent (18001) <peter@NOsPam.slashdot.2006.taronga.com> on Tuesday March 16, 2010 @09:05PM (#31504044) Homepage Journal

    *sigh*

    Been there done that... on the PDP-11 in 1979.

  • So what? (Score:2, Insightful)

    by Anonymous Coward on Tuesday March 16, 2010 @09:07PM (#31504072)

    Adding a static 11k or so is insignificant for any program which actually does anything useful.

  • Nice but? (Score:5, Insightful)

    by garcia (6573) on Tuesday March 16, 2010 @09:11PM (#31504106) Homepage

    Ok, this is wicked great in theory. Our programs have become bloated. We do have them taking up too much RAM, HD space, and CPU time. But after reading through this in-depth analysis I have to wonder if it's all worth it.

    If we're willing to leave behind all pretenses of portability, we can make our program exit without having to link with anything else. First, though, we need to know how to make a system call under Linux.

    Or I can just write it the old way, making the file size larger and not having to concern myself with portability and how to make system calls under Linux. After all, that's what the whole point of this was, right?
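The quoted passage skips over what a Linux system call actually looks like. A minimal sketch, assuming x86-64 Linux and GCC or Clang inline assembly (TFA's own examples may use the 32-bit `int $0x80` convention instead): the syscall number goes in `rax`, the first three arguments go in `rdi`, `rsi` and `rdx`, and the `syscall` instruction enters the kernel. The helper `raw_syscall3` and the enum names are hypothetical; the numbers come from the x86-64 syscall table.

```c
/* Sketch only: assumes x86-64 Linux and GCC/Clang extended inline asm. */
#if !defined(__x86_64__) || !defined(__linux__)
#error "this sketch assumes x86-64 Linux"
#endif

/* Hypothetical helper: invoke Linux system call `nr` with up to three
   arguments, bypassing libc. The kernel reads the number from rax and the
   arguments from rdi, rsi, rdx; the syscall instruction clobbers rcx and r11. */
long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");
    return ret; /* on failure the kernel returns -errno */
}

/* x86-64 syscall numbers (hypothetical names, values from the kernel table). */
enum { SYS_WRITE_X86_64 = 1, SYS_GETPID_X86_64 = 39, SYS_EXIT_X86_64 = 60 };
```

Calling `raw_syscall3(SYS_EXIT_X86_64, 42, 0, 0)` would end the process with status 42 without going through libc's `exit()`, so `atexit` handlers would not run and stdio buffers would not be flushed. That is the portability and convenience cost the comment above is weighing.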

  • Umm, but (Score:5, Insightful)

    by Psychotria (953670) on Tuesday March 16, 2010 @09:17PM (#31504136)

    Since when does a Hello World program not actually output anything?

  • Re:11k Is Too Big? (Score:5, Insightful)

    by CapnStank (1283176) on Tuesday March 16, 2010 @09:18PM (#31504140) Homepage
    I think you missed the point of the article.

    The author is trying to highlight that the amount of bloat in modern programs is so rampant that even "Hello World" is excessively oversized for what it accomplishes. How can we as programmers expect fast, efficient, lightweight code when our compilers (even ones as popular as gcc) are bloating the program without being asked to?
  • If it's so simple, (Score:5, Insightful)

    by newcastlejon (1483695) on Tuesday March 16, 2010 @09:18PM (#31504146)
    Why doesn't it fit in TFS?
  • Re:11k Is Too Big? (Score:5, Insightful)

    by exasperation (1378979) on Tuesday March 16, 2010 @09:18PM (#31504150)

    As to the point of this... we recently had a story about how computers had gotten "too big to understand".

    And here we have a program, 45 bytes long, for which every single byte has a well-explained purpose. It's getting back to the bare metal and that's what makes it interesting. =)

  • by kenh (9056) on Tuesday March 16, 2010 @09:24PM (#31504204) Homepage Journal

    At the end, the code was assembler, and the compiler wasn't even called - just the linker. I can't say for sure where a C program ends and an assembler program begins, but I'm fairly certain that the last few iterations are assembler, based on the "let's do away with the compiler" suggestion.

    Also, "Hello World" programs have to, you know, actually display the message "Hello World" - this is a program that isn't written in C, and doesn't write "Hello World" - care to revisit the title of this entry?

  • Re:11k Is Too Big? (Score:4, Insightful)

    by WrongSizeGlass (838941) on Tuesday March 16, 2010 @09:33PM (#31504264)
    I understand the point of the article, and everything else mentioned here. I just think that the amount of time spent eliminating 11k from a program in this case is irrelevant, because any real application is going to need libc. It's not like she needed to strip it out so it would fit inside a tiny corner of an embedded processor - she's probably running it on a PC with anywhere from 1 GB to 4 GB of RAM.
  • Re:11k Is Too Big? (Score:5, Insightful)

    by gzipped_tar (1151931) on Tuesday March 16, 2010 @09:46PM (#31504326) Journal

    But my stupid build process that generates the bloated Hello World is much more maintainable. Now get off my lawn.

  • C++ is worse (Score:5, Insightful)

    by MobyDisk (75490) on Tuesday March 16, 2010 @09:48PM (#31504340) Homepage

    Shouldn't the linker remove unreferenced functions?

    I've had this problem with gcc for a while, with C++ code. I was writing some embedded code, and I wanted to use some simple C++. Just by adding an #include of one of the stream libraries, the executable grew by 200k, even though none of it was referenced. The C++ code in iostream is template-generated anyway, so even if the compiler wanted to include the code, it can't until I instantiate it.

  • Re:Occams Wedge (Score:5, Insightful)

    by Arker (91948) on Tuesday March 16, 2010 @10:01PM (#31504418) Homepage
    But it really is much simpler. The reason your 'average first year comp sci student' might find it less understandable is that they don't actually understand the bloated version either. Using a high-level language doesn't reduce complexity; quite the opposite, in fact, it greatly increases actual complexity. It simply makes it easier to get something done without understanding it, and thus makes it easier to kid yourself into thinking you know what you are doing, when you don't.
  • by istartedi (132515) on Tuesday March 16, 2010 @10:04PM (#31504442) Journal

    Patch the strip utility on Linux, send in the patch and see if it gets accepted. Then let's see a follow-up on that on Slashdot. She's taking a lot of flak here, but there's value in the work. It just needs to be applied in a more practical way.

  • Re:11k Is Too Big? (Score:5, Insightful)

    by jc42 (318812) on Tuesday March 16, 2010 @10:08PM (#31504456) Homepage Journal

    Yeah, but the 45-byte program doesn't say "Hello World". In fact, there's no example that I can find in TFA that outputs that message or any other. So the summary is incorrect on its face. TFA doesn't show a simpler "Hello World" program; it doesn't show any sort of "Hello World" program at all.

    I feel cheated, and tricked into reading an article that didn't do what was advertised.

    (It's not the author's fault, of course; the author didn't claim to be writing the sort of program that the summary talked about. Though I was a bit disappointed that only the first few examples were in C. The article was almost entirely about assembly-language programs. So again, I was a bit disappointed, since I was hoping to learn something about making C programs smaller. This was done only in the first example, and it was made smaller by removing its call on write() so it didn't output anything at all. I already understood that I can make programs smaller by removing all functionality. ;-)
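For readers who wanted a libc-free program that actually prints the message, here is a sketch (not the article's code), assuming x86-64 Linux: `say_hello()` issues the `write` system call directly, and when built with the hypothetical `STANDALONE` macro the file supplies its own `_start`, because `-nostdlib` leaves out the `crt1.o` that normally provides it. The names `nolibc_syscall3`, `hello_msg` and `say_hello` are hypothetical.

```c
/* Sketch only, assuming x86-64 Linux. Freestanding build (no libc at all):
 *   gcc -DSTANDALONE -nostdlib -static -fno-stack-protector -O2 -o hello hello.c
 */
#if !defined(__x86_64__) || !defined(__linux__)
#error "this sketch assumes x86-64 Linux"
#endif

/* Raw three-argument system call: number in rax, args in rdi, rsi, rdx. */
static long nolibc_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");
    return ret;
}

static const char hello_msg[] = "Hello World\n";

/* Writes the message to stdout (fd 1) with the write syscall (number 1 on
   x86-64); returns the number of bytes written or a negative errno. */
long say_hello(void)
{
    return nolibc_syscall3(1, 1, (long)hello_msg, (long)(sizeof hello_msg - 1));
}

#ifdef STANDALONE
/* Entry point the kernel jumps to. There is no main() and nothing to return
   to, so finish with the exit syscall (number 60 on x86-64). The attribute
   realigns the stack, since the kernel does not push a return address. */
__attribute__((force_align_arg_pointer))
void _start(void)
{
    long n = say_hello();
    nolibc_syscall3(60, n == (long)(sizeof hello_msg - 1) ? 0 : 1, 0, 0);
    __builtin_unreachable();
}
#endif
```

Built the freestanding way, the program prints `Hello World` and exits with status 0, with no libc linked in; running `echo $?` afterwards shows the status the shell received.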

  • Re:So what? (Score:5, Insightful)

    by AmberBlackCat (829689) on Tuesday March 16, 2010 @10:16PM (#31504504)
    Maybe thinking like that is why we have to get 4 gigs of ram to run without slowing down lately. I bet every executable on the hard drive has an extra 11k that somebody thought was insignificant.
  • Re:BTDT (Score:5, Insightful)

    by tomhudson (43916) <barbara.hudson@NOSpAM.barbara-hudson.com> on Tuesday March 16, 2010 @10:17PM (#31504516) Journal

    She found that gcc was including libc even when you don't ask for it.

    This is basic knowledge that ANYONE using C should know - that the startup library is linked in so that it can find main.

    This is almost as lame as their previous slashvertisement/product_whoring [ksplice.com] - where they claimed to have gotten around the Mythical Man-Month and quadrupled output - and it turned out that neither claim was true.

    And their lame excuse [ksplice.com], which I derided in this comment [ksplice.com]:

    Greg Price wrote:

    "what I hoped to get across in this post is that that's not true--in the right circumstances, adding people to a software project can get a lot done, even in a short time"

    As many people have pointed out, you did NOT add people to a software project. You created a dozen small, one-person projects. Your self-serving reply to all that is just one more mis-representation. Have you no shame?

    I'm sure we're not the only ones to have used embedded assembler in C programs.

  • by Dynetrekk (1607735) on Tuesday March 16, 2010 @10:22PM (#31504552)

    Hm, if I make a file 'hello.py' with the following content:

    print 42

    ...and say to Mac OS X "open .py files in the python interpreter" and double-click, it does the job. In 9 bytes. I guess you can get it shorter if you use a language with a shorter "print" statement / function?

    And how big is Python?

    Granted, but how big is linux, letting you run that ELF?

  • Re:11k Is Too Big? (Score:5, Insightful)

    by ucblockhead (63650) on Tuesday March 16, 2010 @10:28PM (#31504580) Homepage Journal

    The fact that helloworld.c compiles to 11k has less to do with bloat than with people generally not caring about 11k. You could get rid of that 11k, but to do so, you'd have to make trade-offs that make real programs either slower or bigger, or make compilation slower. Very few people would make those trade-offs in the other direction. Those who do either use special-purpose compilers or (more likely) write in assembly.

  • Re:Nice but? (Score:5, Insightful)

    by dido (9125) <didoNO@SPAMimperium.ph> on Tuesday March 16, 2010 @10:33PM (#31504608)

    Which is missing the point. Haven't you ever wondered what's really in that 11k of machine code, and what it actually does? We've gotten so insulated from the lower levels of our computers that we no longer really understand how they do something so basic as terminating their own execution. The article felt more to me like an expository attempt to shed light on some of the things that libc has to do for us, rather than practical advice on attempting to make our programs smaller.

  • Bare metal DOS? (Score:4, Insightful)

    by blueg3 (192743) on Tuesday March 16, 2010 @10:36PM (#31504628)

    If you're actually programming on "bare metal", you're not really using DOS, are you? After all, DOS is an operating system -- a layer between your code and the hardware.

  • Re:So what? (Score:5, Insightful)

    by wiredlogic (135348) on Tuesday March 16, 2010 @10:38PM (#31504642)

    Most of the microprocessors in the world today have less than a few tens of kilobytes of RAM. They tend to do useful things most of the time.

  • Re:11k Is Too Big? (Score:0, Insightful)

    by Anonymous Coward on Tuesday March 16, 2010 @10:57PM (#31504748)

    You mean 50. The rest of the civilized world, those of us not eating tapioca pudding and watching Matlock between shifts of drooling on the keyboard, switched to better languages like C++ and Java.

  • Re:11k Is Too Big? (Score:5, Insightful)

    by MachDelta (704883) on Tuesday March 16, 2010 @10:59PM (#31504766)

    TFA explains it: main() isn't the true start of the program; _start is. That resides in crt1.o, which sets up a few things before calling __libc_start_main, which in turn kicks off main(), and off your program goes.

    To put it as a car analogy: what she found is that turning the key doesn't just activate the starter; it also activates the airbag system, the traction control, and the radio too. And if all you want to do is start the engine to prove that it runs (à la Hello World!), then it's kind of silly to lug around all that extra "unnecessary" crap too.

    Or something like that. Sadly I'm a better mechanic than a programmer (4 yrs vs. 1 yr), but I'm working on fixing that. :)
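One visible piece of that startup work can be sketched as follows, assuming GCC or Clang and a normal libc-linked build: functions marked with the `constructor` attribute are run by the C runtime's initialization code before `main()` is entered, so the flag below is already set by the time `main()` can check it. In a freestanding `-nostdlib -static` build, nothing calls such functions unless your own `_start` does. The names `early_init` and `startup_ran` are hypothetical.

```c
/* Sketch assuming GCC or Clang with a normal (libc-linked) build. */
static int ran_before_main = 0;

/* The compiler normally records this in .init_array; the runtime's startup
   path calls every entry there before main() runs. */
__attribute__((constructor))
static void early_init(void)
{
    ran_before_main = 1;
}

/* Reports whether the constructor has already run. */
int startup_ran(void)
{
    return ran_before_main;
}
```

Calling `startup_ran()` from `main()` returns 1 even though nothing in `main()` ever called `early_init()`.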

  • +5, Insightful (Score:5, Insightful)

    by aussersterne (212916) on Tuesday March 16, 2010 @11:04PM (#31504798) Homepage

    Mod parent up. This is all a semantic game about where significant portions of functionality are stored (and thus counted or not). After all, back in the "pre bloatware" days, you'd have had to manage all of the complexities of machine management and I/O yourself. The assembly would have been much larger to achieve the same effect.

    Yes, you can make the argument that Linux comes with screen I/O, a scheduler, memory management, etc. already, so that's just overhead, but as others have pointed out, you can say the same thing about bash. It comes everywhere and is just overhead.

  • Re:11k Is Too Big? (Score:5, Insightful)

    by santax (1541065) on Tuesday March 16, 2010 @11:11PM (#31504826)
    Try programming a micro-controller and suddenly you'll be facing hardware limits that force you to favor small unreadable code over bigger more maintainable code. There is a solution for it though... comments! Lots of them :D
  • Re:11k Is Too Big? (Score:3, Insightful)

    by siride (974284) on Tuesday March 16, 2010 @11:16PM (#31504848)
    Doesn't matter anyway, because demand paging ensures that only the parts of libc that your program actually uses will be pulled into memory, so all the extra junk will remain on disk.
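Demand paging can be observed directly. A minimal sketch, assuming Linux and glibc's `mincore()`: it maps fresh anonymous memory, writes to a single page, and counts how many pages the kernel reports as resident. Anonymous memory stands in here for libc's file-backed pages, which are also faulted in lazily, though the details differ. The function name `resident_after_touch` is hypothetical.

```c
#define _DEFAULT_SOURCE 1  /* for MAP_ANONYMOUS and mincore() on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map `npages` fresh anonymous pages, write to page
   `touch` only, and return how many pages mincore() reports as resident
   (or -1 on error). */
int resident_after_touch(size_t npages, size_t touch)
{
    unsigned char vec[64];
    if (npages == 0 || npages > sizeof vec || touch >= npages)
        return -1;

    size_t psz = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * psz;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Nothing is backed by RAM yet; this write faults in just one page. */
    p[touch * psz] = 1;

    int count = -1;
    if (mincore(p, len, vec) == 0) {
        count = 0;
        for (size_t i = 0; i < npages; i++)
            count += vec[i] & 1;   /* low bit set means the page is resident */
    }
    munmap(p, len);
    return count;
}
```

Mapping 16 pages and touching one typically leaves exactly one page resident; the other 15 exist only as address space until something accesses them.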
  • Re:Nice but? (Score:4, Insightful)

    by fermion (181285) on Tuesday March 16, 2010 @11:40PM (#31504956) Homepage Journal
    I disagree that programs are bloated. In most cases, we code to deliver a product at a reasonable cost. Competent trained humans are much more expensive than gates. This is why few people code in C. They want fancy features like garbage collection, signaling, and GUIs. All of these can be custom coded on a case-by-case basis so that only the needed features are included and the libraries are optimized, but that costs programmer time. Of course competent programmers do not need garbage collection, but it sure makes life easier, and can cut down on programming hours. So we tolerate a bit of inefficiency because, frankly, very few people are going to pay double the price so they can use a 500 MHz, 256 MB computer. The average person is more likely to pay $400 for a 2 GHz, 2 GB machine, and then want the software for little or no money.

    Now, that is not to say the libraries should not be optimized. It makes economic sense to spend significant time on such code. Just look at MS Vista. But complaining that unnecessary library code is sometimes included does not really solve any problems.

  • by putaro (235078) on Tuesday March 16, 2010 @11:42PM (#31504962) Journal

    So did the original - it was launched from the command prompt and the shell was used for the output of the return code. The shell is part of the base OS anyhow, and you can't boot Linux without the shell.

  • Re:Nice but? (Score:4, Insightful)

    by Sycraft-fu (314770) on Tuesday March 16, 2010 @11:54PM (#31505002)

    Well, that's part of the REASON that programs have become "bloated." We have plenty of resources these days. RAM and HDD space are cheap. So it doesn't make sense to spend time trying to wring every byte out of a program. If having a bit of bloat makes the program more portable, easier to debug, more resilient to attack, or whatever, it is probably worthwhile.

    I'd much rather have a program that was 1MB larger than it needs to be, but easy for the devs to maintain and nice and compatible, than one that is as small as possible but a complete mess at the code level. As a practical matter, program data, like graphics, sounds, media, etc., is way, WAY bigger than the program itself. For example, Mass Effect 2 has about 25.6MB of code between its binary and various DLLs. If you count the system DLLs it uses, it is maybe up to 50MB. Its total size? 12.1GB. All the rest is data of various kinds. They could halve the size of the code and still not make even a tiny dent in disk or memory usage.

  • by cthugha (185672) on Wednesday March 17, 2010 @12:02AM (#31505038)

    Since the output is the Answer to the Ultimate Question, it necessarily incorporates or encodes every possible output of every possible program, including the string "Hello World!".

    The method for extracting the particular output desired is left as an exercise for the reader.

  • Re:So what? (Score:4, Insightful)

    by Yosho (135835) on Wednesday March 17, 2010 @12:07AM (#31505062) Homepage

    I bet every executable on the hard drive has an extra 11k that somebody thought was insignificant.

    So if you have, say, 1000 open processes, that means your computer is wasting 11 MB of RAM. Such inefficiency!

    Actually, the reason you need 4 GB of RAM is because the programs you're using are far more complex than the ones that people were using when 256 MB was top-of-the-line. You may say, "But all I need is to read e-mail and browse the web!" -- except that nowadays those tasks involve rendering GUIs with Javascript, streaming and playing HD video in realtime, and doing constant full-text indexing in the background so that you can quickly search anything for any phrase. On top of that, in the background your operating system is trying to predict what you'll do next and prefetching blocks from your hard drive into RAM so that they'll already be cached when you actually need them.

    Some of that RAM is honestly being taken up by insignificant chunks of data, but most of it really is being used.

  • Re:11k Is Too Big? (Score:3, Insightful)

    by PakProtector (115173) <cevkiv@NOSPAm.gmail.com> on Wednesday March 17, 2010 @12:21AM (#31505124) Journal

    OOP makes people lazy and gives them less of an understanding of what's actually going on.

    All that OOP code you write gets translated back into something procedural, you know.

  • Re:So what? (Score:1, Insightful)

    by Anonymous Coward on Wednesday March 17, 2010 @12:22AM (#31505128)

    Actually, the primary reason your system is so slow is that you're using 1960s technology for storing your data. It is perfectly normal, and expected, that your computer will need to store and retrieve information to load Windows, programs, web browsers, etc. Using an old, slow hard drive is the primary reason systems are so slow nowadays. As it starts swapping data to your 1960s technology (aka the hard drive), the whole system slows to a crawl while your super-fast memory and super-fast CPU wait forever for the read/write heads to move around.

    You could take an old P4 with 2GB running Win7, slap an SSD into it, and it will run circles around top-of-the-line Core i7 systems in normal day-to-day home and office tasks.

    I can see your point about memory waste though, years back as part of a school project I had to create a "Hello world" visual basic program, and I was flabbergasted at how long it took to open and how much memory it took, for such a simple task.

  • Re:11k Is Too Big? (Score:4, Insightful)

    by blackraven14250 (902843) on Wednesday March 17, 2010 @12:33AM (#31505160)
    It's funny that this always comes up in conversations about bloat, because not everyone is programming embedded devices. It's almost like you guys are a sub-subculture of programmers, to the point where many of you come off with an attitude of superiority, when in fact neither approach is superior, just different depending on the situation.
    /rant
  • Re:11k Is Too Big? (Score:4, Insightful)

    by crazybit (918023) on Wednesday March 17, 2010 @12:59AM (#31505246)
    Not only more maintainable, but filesystems should use 4k blocks, especially on RAID arrays, for the performance reasons discussed in this post [slashdot.org]. This means that in a decently configured modern system, anything under 4k will still occupy 4k on disk.
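The rounding is simple arithmetic. A minimal sketch, assuming a 4096-byte allocation unit (the real unit depends on the filesystem, and features such as inline data or tail packing can change the result): a file's data occupies its size rounded up to the next whole block. The function name `on_disk_bytes` is hypothetical.

```c
#include <stdint.h>

/* Bytes of data blocks that a file of `size` bytes occupies when space is
   allocated in whole blocks of `block` bytes (block must be > 0). Ignores
   metadata such as inodes and indirect blocks. */
uint64_t on_disk_bytes(uint64_t size, uint64_t block)
{
    uint64_t blocks = (size + block - 1) / block;  /* round up */
    return blocks * block;
}
```

With 4096-byte blocks, the 45-byte program still takes 4096 bytes, and an 11 KB (11264-byte) hello world takes 12288 bytes, or three blocks.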
  • Re:C++ is worse (Score:3, Insightful)

    by ShakaUVM (157947) on Wednesday March 17, 2010 @01:09AM (#31505278) Homepage Journal

    Shouldn't the linker remove unreferenced functions?

    I've had this problem with gcc for a while, with C++ code. I was writing some embedded code, and I wanted to use some simple C++. Just by adding an #include of one of the stream libraries, the executable grew by 200k, even though none of it was referenced. The C++ code in iostream is template-generated anyway, so even if the compiler wanted to include the code, it can't until I instantiate it.

    There are utilities you can run to pull unused object code out of your file, to make the executable smaller.

    But in general, if you care about a 300k increase in your executable, you should probably be using C anyway.
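A related approach is to have the linker itself drop unreferenced functions. A sketch, assuming GCC or Clang with GNU ld or a compatible linker: compile each function into its own section and let `--gc-sections` discard sections nothing references. The function names are hypothetical, and the savings vary by toolchain; code that is still referenced (for example, from static initializers) cannot be removed this way.

```c
/* Sketch: build with per-function sections and linker garbage collection,
 *   gcc -Os -ffunction-sections -fdata-sections -Wl,--gc-sections -o demo demo.c
 * then compare `nm demo` with and without the flags to see which symbols remain.
 */

/* Referenced by the program, so the linker must keep it. */
int used_helper(int x)
{
    return x * 2;
}

/* Never referenced: with -ffunction-sections it lands in its own section
   (e.g. .text.unused_helper), which --gc-sections is free to discard. */
int unused_helper(int x)
{
    return x * x + 1;
}
```

Without `-ffunction-sections`, both functions share one `.text` section, so the linker has to keep or drop them together and keeps both.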

  • Re:Occams Wedge (Score:3, Insightful)

    by Arker (91948) on Wednesday March 17, 2010 @01:26AM (#31505332) Homepage

    It hides the increased complexity from the programmer.

    That is what I said.

    Which resources do you think are the most valuable?

    That is not a question for which a single good answer can be given, other than "it depends." There are so many variables. Just how often will how many processors do extra work? That lets you calculate the lost electricity, which is a real and calculable cost. RAM usage also has real costs, including electricity, but calculating the final price tag is far more complicated there. The bottom line is that it is way too much work to really track down and calculate to the penny the costs of inefficient code, even in the narrowest sense, so no one does. We just sort of guesstimate, and we work within systems that don't encourage us to account for costs that can be passed on unaccounted for, so we generally pass them on. The natural outcome is for many actors to weigh the decision purely in terms of their own personal and immediate costs and benefits (15 minutes of my time vs. a small performance hit to whoever uses it) without accounting at all for many less personal and less immediate effects.

    And often enough that works just fine. But there are cases where it will bite you hard. Knowing which situation is which is important. How are you going to do that if you only know quick-and-cheap method without understanding the larger picture?

  • Re:So what? (Score:3, Insightful)

    by rolfwind (528248) on Wednesday March 17, 2010 @01:30AM (#31505354)

    So if you have, say, 1000 open processes, that means your computer is wasting 11 MB of RAM. Such inefficiency!

    Even modern processors have only a few megabytes of L3 cache, and less L2/L1 cache, so it will make a difference if you're constantly swapping data between cache and RAM.

  • by N0Man74 (1620447) on Wednesday March 17, 2010 @03:05AM (#31505636)

    It's too bad that with all these things you "heard", you didn't happen to hear that programs are written for environments other than Windows (or Linux, Mac OS, etc.), and for devices other than PCs. It's unfortunate that you are so in the dark that you don't realize that there are entire industries that rely on devices that have tiny fractions of the memory and processor speed that you ignorantly assume we all have access to. You probably have no idea how often you are affected by devices that run 100 times slower than the desktop PC you gave as an example, or have 1,000 times less RAM. On some of these devices C is the most advanced language you can get short of writing a compiler or interpreter yourself.

    Sure, pissing away storage space and waving a hand at execution efficiency is fine for some circumstances, but sometimes it's a luxury you can't afford. The world of software development is far bigger than the tiny little niche of programming you've been exposed to.

    I suggest you use some "real" perspective, and reevaluate what a "real language" is.

  • Re:11k Is Too Big? (Score:3, Insightful)

    by TapeCutter (624760) * on Wednesday March 17, 2010 @03:19AM (#31505696) Journal
    "OOP makes people lazy and gives them less of an understanding of what's actually going on."

    I've noticed that people who criticise OOP rarely understand what it is and tend to think it's tied to a particular type of language. OO is a way of thinking about a problem at a higher level than functional decomposition. You can code an OO solution in whatever language you like. Done properly, it leads to elegant solutions; e.g., many of the examples in K&R exhibit the features of OO design, and they were created before the term "object-oriented" was in wide use. I assume when K&R used function pointers as elements of a struct they "understood what's going on", right?

    "All that OOP code you write gets translated back into something procedural, you know."

    Perhaps that's because...you know...OOP is procedural.
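The struct-of-function-pointers idea can be sketched as follows; the `shape` types and functions are hypothetical and not taken from K&R. Each object's first member points at a table of operations, and callers dispatch through that table without knowing the concrete type, which is roughly what a C++ virtual call compiles down to.

```c
/* Interface: every "object" starts with a pointer to its operations table. */
struct shape;

struct shape_ops {
    double (*area)(const struct shape *self);
    void (*scale)(struct shape *self, double k);
};

struct shape {
    const struct shape_ops *ops;
};

/* Rectangle "class": embeds the base struct as its first member, so a
   pointer to the rect can be converted to a pointer to its base and back. */
struct rect {
    struct shape base;
    double w, h;
};

static double rect_area(const struct shape *self)
{
    const struct rect *r = (const struct rect *)self;
    return r->w * r->h;
}

static void rect_scale(struct shape *self, double k)
{
    struct rect *r = (struct rect *)self;
    r->w *= k;
    r->h *= k;
}

static const struct shape_ops rect_ops = { rect_area, rect_scale };

struct rect rect_make(double w, double h)
{
    struct rect r = { { &rect_ops }, w, h };
    return r;
}

/* Triangle "class" with base length b and height h. */
struct triangle {
    struct shape base;
    double b, h;
};

static double triangle_area(const struct shape *self)
{
    const struct triangle *t = (const struct triangle *)self;
    return 0.5 * t->b * t->h;
}

static void triangle_scale(struct shape *self, double k)
{
    struct triangle *t = (struct triangle *)self;
    t->b *= k;
    t->h *= k;
}

static const struct shape_ops triangle_ops = { triangle_area, triangle_scale };

struct triangle triangle_make(double b, double h)
{
    struct triangle t = { { &triangle_ops }, b, h };
    return t;
}

/* "Virtual" calls: dispatch through the object's operations table. */
double shape_area(const struct shape *s) { return s->ops->area(s); }
void shape_scale(struct shape *s, double k) { s->ops->scale(s, k); }
```

Code written against `struct shape *` works unchanged for any new "class" that provides its own `shape_ops` table, which is the polymorphism the comment is describing, expressed in plain procedural C.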
  • Re:BTDT (Score:5, Insightful)

    by kevingolding2001 (590321) on Wednesday March 17, 2010 @04:51AM (#31506148)

    *sigh*

    Been there done that... on the PDP-11 in 1979.

    And did you write up a nice article for other people to learn from what you had done?

    I think the real value here is not that she did this, but that she wrote it up in a nice easy to read way so that you can follow her train of thought and get a feel for how one goes about tinkering with compilers and such.

    This adds value for people like me who are not as smart as you. I could never have done this on a PDP-11 (although I did have access to one back in my days at university). I also previously would not have known enough to do this in Linux. But having read this article I feel I have learnt something and have a new insight into how linkers and libraries work. Who knows, maybe I will be able to do something similar myself after this learning experience, and for that I am grateful to Jessica for doing it, writing about it and (I'm guessing it was her) submitting it to /.

    Now I shall respectfully step off your lawn.

  • Re:So what? (Score:3, Insightful)

    by swilver (617741) on Wednesday March 17, 2010 @05:19AM (#31506296)

    It will be completely unnoticeable, even if you had a stopwatch.

    Not only is this easy to see theoretically, as most programs will spend the bulk of their CPU time in tight loops, which obviously will be cached the first run through... but it's also easy to see in practice, for example, when processor performance with different cache sizes is compared.

    Multitasking is probably one of the worst things imaginable for processor caches, yet even with thousands of context switches every second, the difference between a single-tasking machine and a multitasking machine will be hard to notice on modern hardware.

  • Re:BTDT (Score:3, Insightful)

    by RMS Eats Toejam (1693864) on Wednesday March 17, 2010 @06:39AM (#31506666)
    A better solution for everyone is to replace Kdawson with an editor who doesn't have shit for brains. This would give us more quality articles and less garbage without having to modify any settings at all.
  • Re:BTDT (Score:5, Insightful)

    by jlehtira (655619) on Wednesday March 17, 2010 @07:22AM (#31506940) Journal

    She found that gcc was including libc even when you don't ask for it.

    This is basic knowledge that ANYONE using c should know - that the startup library is linked to so it can find main.

    Okay, and where am I supposed to learn it from? That was new to me, after using gcc for a very long time.

    I'm actually very happy that someone out there told me something that you think I should just know.

    So it wasn't new to you? Don't read it.

  • Re:BTDT (Score:2, Insightful)

    by Anonymous Coward on Wednesday March 17, 2010 @09:30AM (#31508260)

    Maybe he simply meant "There aren't too many people in the programming world who do anything worth noting, and there are fewer women still in the programming world, so it's pretty cool for her to have done something that people are checking out".

    What you said is like asking a black guy what his favourite food is and him saying "Why? You think I'm going to like fried chicken or something, just because I'm black?!" when you really didn't mean anything about his race specifically at all, you only wanted to know what his favourite food was.

    I personally always like to hear about a woman doing something of note in IT... like the GP said, it tends to be male-dominated, so I like to hear of more and more women getting involved in the field.

