Simpler "Hello World" Demonstrated In C
An anonymous reader writes "Wondering where all that bloat comes from, causing even the classic 'Hello world' to weigh in at 11 KB? An MIT programmer decided to make a Linux C program so simple, she could explain every byte of the assembly. She found that gcc was including libc even when you don't ask for it. The blog shows how to compile a much simpler 'Hello world,' using no libraries at all. This takes me back to the days of programming bare-metal on DOS!"
BTDT (Score:3, Insightful)
*sigh*
Been there done that... on the PDP-11 in 1979.
So what? (Score:2, Insightful)
Adding a static 11k or so is insignificant for any program which actually does anything useful.
Nice but? (Score:5, Insightful)
Ok, this is wicked great in theory. Our programs have become bloated. We do have them taking up too much RAM, HD space, and CPU time. But after reading through this in-depth analysis I have to wonder if it's all worth it.
If we're willing to leave behind all pretenses of portability, we can make our program exit without having to link with anything else. First, though, we need to know how to make a system call under Linux.
Or I can just write it the old way, making the file size larger, and not have to concern myself with portability and how to make system calls under Linux. After all, that's what the whole point of all this was, right?
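For readers wondering what "making a system call under Linux" looks like from C, here is a minimal sketch. It is not the article's code: it goes through glibc's generic syscall() entry point rather than hand-written assembly, so it still links against libc, and the helper names raw_write and raw_exit are made up for illustration.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: write n bytes to fd by system call number,
 * bypassing the usual write() wrapper. Returns the number of bytes
 * written, or -1 on error. */
long raw_write(int fd, const char *buf, size_t n)
{
    return syscall(SYS_write, fd, buf, n);
}

/* Hypothetical helper: ask the kernel to end the calling thread with the
 * given status (the whole process, in a single-threaded program), again
 * by system call number rather than through exit(). */
void raw_exit(int status)
{
    syscall(SYS_exit, status);
}
```

Calling raw_write(1, "Hello World\n", 12) followed by raw_exit(0) would print the message and terminate. The non-portable part is exactly what the quoted passage warns about: the SYS_* numbers and calling convention are specific to Linux.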
Umm, but (Score:5, Insightful)
Since when does a Hello World program not actually output anything?
Re:11k Is Too Big? (Score:5, Insightful)
The author is trying to highlight that the amount of bloat in modern programs is so rampant that even "Hello World" is excessively oversized for what it accomplishes. How can we as programmers expect fast, efficient, lightweight code when our compilers (even ones as popular as gcc) are bloating the program without being asked to?
If it's so simple, (Score:5, Insightful)
Re:11k Is Too Big? (Score:5, Insightful)
As to the point of this... we recently had a story about how computers had gotten "too big to understand".
And here we have a program, 45 bytes long, for which every single byte has a well-explained purpose. It's getting back to the bare metal and that's what makes it interesting. =)
Simpler "Hello World" in C? (Score:5, Insightful)
At the end, the code was assembler, and the compiler wasn't even called - just the linker. I can't say for sure where a C program ends and an assembler program begins, but I'm fairly certain that the last few iterations are assembler, based on the "let's do away with the compiler" suggestion.
Also, "Hello World" programs have to, you know, actually display the message "Hello World" - this is a program that isn't written in C, and doesn't write "Hello World" - care to revisit the title of this entry?
Re:11k Is Too Big? (Score:4, Insightful)
Re:11k Is Too Big? (Score:5, Insightful)
But my stupid build process that generates the bloated Hello World is much more maintainable. Now get off my lawn.
C++ is worse (Score:5, Insightful)
Shouldn't the linker remove unreferenced functions?
I've had this problem with gcc for a while, with C++ code. I was writing some embedded code, and I wanted to use some simple C++. Just by adding a #include of one of the stream libraries, the executable grew by 200k, even though none of it was referenced. The C++ code in iostream is template-generated anyway, so even if the compiler wanted to include the code, it can't until I instantiate it.
Re:Occams Wedge (Score:5, Insightful)
OK, now generalize that (Score:2, Insightful)
Patch the strip utility on Linux, send in the patch and see if it gets accepted. Then let's see a follow-up of that on Slashdot. She's taking a lot of flak here, but there's value in the work. It just needs to be applied in a more practical way.
Re:11k Is Too Big? (Score:5, Insightful)
Yeah, but the 45-byte program doesn't say "Hello World". In fact, there's no example that I can find in TFA that outputs that message or any other. So the summary is incorrect on its face. TFA doesn't show a simpler "Hello World" program; it doesn't show any sort of "Hello World" program at all.
I feel cheated, and tricked into reading an article that didn't do what was advertised.
(It's not the author's fault, of course; the author didn't claim to be writing the sort of program that the summary talked about. Though I was a bit disappointed that only the first few examples were in C. The article was almost entirely about assembly-language programs. So again, I was a bit disappointed, since I was hoping to learn something about making C programs smaller. This was done only in the first example, and it was made smaller by removing its call on write() so it didn't output anything at all. I already understood that I can make programs smaller by removing all functionality. ;-)
Re:So what? (Score:5, Insightful)
Re:BTDT (Score:5, Insightful)
This is basic knowledge that ANYONE using c should know - that the startup library is linked to so it can find main.
This is almost as lame as their previous slashvertisement/product_whoring [ksplice.com] - where they claimed to have gotten around the Mythical Man-Month and quadrupled output - and it turned out that neither claim was true.
And their lame excuse [ksplice.com], which I derided in this comment [ksplice.com]:
I'm sure we're not the only ones to have used embedded assembler in c programs.
Re:I can code that app in... (Score:5, Insightful)
Hm, if I make a file 'hello.py' with the following content:
print 42
And how big is Python?
Granted, but how big is linux, letting you run that ELF?
Re:11k Is Too Big? (Score:5, Insightful)
The fact that helloworld.c compiles to 11k has less to do with bloat than it has to do with people generally not caring about 11k. You could get rid of that 11k, but to do so, you'd have to make trade-offs that make real programs either slower or bigger, or make compilation slower. Very few people would make those trade-offs in the other direction. Those that do either use special-purpose compilers or (more likely) write in assembly.
Re:Nice but? (Score:5, Insightful)
Which is missing the point. Haven't you ever wondered what's really in that 11k of machine code, and what it actually does? We've gotten so insulated from the lower levels of our computers that we no longer really understand how they do something so basic as terminating their own execution. The article felt more to me like an expository attempt to shed light on some of the things that libc has to do for us, rather than practical advice on attempting to make our programs smaller.
Bare metal DOS? (Score:4, Insightful)
If you're actually programming on "bare metal", you're not really using DOS, are you? After all, DOS is an operating system -- a layer between your code and the hardware.
Re:So what? (Score:5, Insightful)
Most of the microprocessors in the world today have less than a few tens of kilobytes of RAM. They tend to do useful things most of the time.
Re:11k Is Too Big? (Score:0, Insightful)
You mean 50. The rest of the civilized world, those of us not eating tapioca pudding and watching Matlock between shifts of drooling on the keyboard, switched to better languages like C++ and Java
Re:11k Is Too Big? (Score:5, Insightful)
TFA explains it: main() isn't the true start of the program, _start is. That resides in crt1.o, which fires off a bunch of setup stuff before calling __libc_start_main, which in turn kicks off main(), and off your program goes.
To put it as a car analogy: What she found is that turning the key to start doesn't just activate the starter, it also activates the airbag system, the traction control, and the radio too. And if all you want to do is start the engine to prove that it runs (à la Hello World!), then it's kind of silly to lug around all that extra "unnecessary" crap too.
Or something like that. Sadly I'm a better mechanic than a programmer (4yrs vs 1yr), but I'm working on fixing that. :)
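One way to see for yourself that code runs before main() is the constructor attribute, a GCC/Clang extension rather than standard C. The sketch below is only an illustration under that assumption; the names early_hook and startup_hook_fired are invented here and do not come from the article.

```c
/* Set by the hook below; stays 0 if nothing ran ahead of main(). */
int ran_before_main = 0;

/* GCC/Clang extension: a function marked as a constructor is invoked by
 * the startup path before main() is entered. */
__attribute__((constructor))
static void early_hook(void)
{
    ran_before_main = 1;
}

/* Hypothetical helper: report whether the hook has already fired. */
int startup_hook_fired(void)
{
    return ran_before_main;
}
```

If startup_hook_fired() is the first thing main() calls and it already returns 1, then something other than your own main() ran first. That something is the setup work the startup code does on your behalf.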
+5, Insightful (Score:5, Insightful)
Mod parent up. This is all a semantic game about where significant portions of functionality are stored (and thus counted or not). After all, back in the "pre bloatware" days, you'd have had to manage all of the complexities of machine management and I/O yourself. The assembly would have been much larger to achieve the same effect.
Yes, you can make the argument that Linux comes with screen I/O, a scheduler, memory management, etc. already, so that's just overhead, but as others have pointed out, you can say the same thing about bash. It comes everywhere and is just overhead.
Re:11k Is Too Big? (Score:5, Insightful)
Re:11k Is Too Big? (Score:3, Insightful)
Re:Nice but? (Score:4, Insightful)
Now, that is not to say the libraries should not be optimized. It makes economic sense to spend significant time on such code. Just look at MS Vista. But complaining that unnecessary library code is sometimes included does not really solve any problems.
Re:I can code that app in... (Score:3, Insightful)
So did the original - it was launched from the command prompt and the shell was used for the output of the return code. The shell is part of the base OS anyhow, and you can't boot Linux without the shell.
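What the shell does when it reports a program's return code can be sketched in C roughly as follows. This is an assumption-laden illustration of the general mechanism (fork, then waitpid), not how any particular shell is implemented, and child_exit_status is a made-up name.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits immediately with `code`,
 * then return the exit status the parent observes, or -1 on failure. */
int child_exit_status(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                /* fork failed */
    if (pid == 0)
        _exit(code);              /* child: terminate with the given status */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;                /* wait failed */

    /* Only a normal exit carries a meaningful exit code. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A call such as child_exit_status(42) hands back 42 to the parent, which is the same number a shell would show for the program's return code. The program itself prints nothing; the parent process does the reporting.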
Re:Nice but? (Score:4, Insightful)
Well that's part of the REASON that programs have become "bloated." We have plenty of resources these days. RAM and HDD space is cheap. So, it doesn't make sense to spend time trying to wring every byte out of a program. If having a bit of bloat makes the program more portable, or easier to debug, or more resilient to attack or whatever it is probably worth while.
I'd much rather have a program that was 1MB larger than it needs to be, but easy for the devs to maintain and nice and compatible than one that is as small as possible but is a complete mess at the code level. As a practical matter program data, like graphics, sounds, media, etc, is way, WAY bigger than the program itself. For example Mass Effect 2 has about 25.6MB of code between its binary and various DLLs. If you count system DLLs it uses, it is maybe up to 50MB. Its total size? 12.1GB. All the rest is data of various kinds. They could halve the size of the code and still not make even a tiny dent in disk or memory usage.
Re:Missing the point (Score:5, Insightful)
Since the output is the Answer to the Ultimate Question, it necessarily incorporates or encodes every possible output of every possible program, including the string "Hello World!".
The method for extracting the particular output desired is left as an exercise for the reader.
Re:11k Is Too Big? (Score:3, Insightful)
OOP makes people lazy and gives them less of an understanding of what's actually going on.
All that OOP code you write gets translated back into something procedural, you know.
Re:So what? (Score:1, Insightful)
Actually the primary reason your system is so slow is that you're using 1960s technology for storing your data. It is perfectly normal, and expected, that your computer will need to store and retrieve information to load Windows, programs, web browsers, etc. Using an old slow hard drive is the primary reason systems are so slow nowadays. As it starts swapping data to your 1960s technology (aka hard drive) the whole system slows to a crawl while your super fast memory and super fast CPU wait forever for read/write heads just to move around.
You could take an old P4 with 2GB with Win7 and slap an SSD into it, and it will run circles around the top of the line Core i7 systems in normal day to day home and office tasks.
I can see your point about memory waste though. Years back, as part of a school project, I had to create a "Hello world" Visual Basic program, and I was flabbergasted at how long it took to open and how much memory it took for such a simple task.
Re:11k Is Too Big? (Score:4, Insightful)
/rant
Re:11k Is Too Big? (Score:4, Insightful)
Re:C++ is worse (Score:3, Insightful)
There are utilities you can run to pull unused object code out of your file, to make the executable smaller.
But in general, if you care about a 300k increase in your executable, you should probably be using C anyway.
Re:Occams Wedge (Score:3, Insightful)
That is what I said.
That is not a question for which a single good answer can be given, other than "it depends." There are so many variables. Just how often will how many processors do extra work? (That will allow you to calculate the lost electricity; it is a real and calculable cost.) RAM usage also has real costs associated, including electricity, but calculating the final price tag is far more complicated there. But the bottom line is that it is way too much work to really track down and calculate to the penny the costs of inefficient code, even in the narrowest sense, so no one does. We just sort of guesstimate, and we work within systems that don't encourage us to account for costs that can be passed on unaccounted for, so we generally do it in that manner. The natural outcome is for many actors to weigh the decision purely in terms of their own personal and immediate costs and benefits (15 minutes of my time vs. a small performance hit to whoever uses it) without accounting at all for many less personal and less immediate effects.
And often enough that works just fine. But there are cases where it will bite you hard. Knowing which situation is which is important. How are you going to do that if you only know quick-and-cheap method without understanding the larger picture?
Re:So what? (Score:3, Insightful)
When even modern processors have single megabytes of L3 cache, and less L2/L1 cache - it will make a difference if you're swapping from that to RAM constantly.
Hey, I heard that Windows isn't the only OS... (Score:5, Insightful)
It's too bad with all these things you "heard" that you didn't happen to hear that programs are written for environments other than Windows (or Linux, Mac OS, etc), and for devices other than PCs. It's unfortunate that you are so in the dark that you don't realize that there are entire industries that rely on devices that have tiny fractions of the memory and processor speed that you ignorantly assume that we all have access to. You probably have no idea how often you are affected by devices that run 100 times slower than the desktop PC you gave as an example, or also have 1,000 times less RAM. On some of these devices C is the most advanced language you can get short of writing a compiler or interpreter yourself.
Sure, pissing away storage space and waving a hand at execution efficiency is fine for some circumstances, but sometimes it's a luxury you can't afford. The world of software development is far bigger than the tiny little niche of programming you've been exposed to.
I suggest you use some "real" perspective, and reevaluate what a "real language" is.
Re:11k Is Too Big? (Score:3, Insightful)
I've noticed that people who criticise OOP rarely understand what it is and tend to think it's tied to a particular type of language. OO is a way of thinking about a problem at a higher level than functional decomposition. You can code an OO solution in whatever language you like. Done properly it leads to elegant solutions, e.g. many of the examples in K&R exhibit the features of OO design and they were created before the term "object orientated" was coined. I assume when K&R used function pointers as elements of a struct they "understood what's going on", right?
"All that OOP code you write gets translated back into something procedural, you know."
Perhaps that's because...you know...OOP is procedural.
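The "function pointers as elements of a struct" idea can be sketched in plain C like this. It is a hypothetical illustration, not an example taken from K&R; the shape type and every function name are invented here.

```c
/* A "class" in plain C: data plus a function pointer that plays the role
 * of a virtual method. */
struct shape {
    int w, h;
    int (*area)(const struct shape *self);
};

/* Two implementations of the same "method". */
static int rect_area(const struct shape *s) { return s->w * s->h; }
static int tri_area(const struct shape *s)  { return s->w * s->h / 2; }

/* "Constructors" that bind the data to the matching implementation. */
struct shape make_rect(int w, int h)
{
    struct shape s = { w, h, rect_area };
    return s;
}

struct shape make_tri(int w, int h)
{
    struct shape s = { w, h, tri_area };
    return s;
}

/* Dynamic dispatch: the caller does not know which area function runs. */
int shape_area(const struct shape *s)
{
    return s->area(s);
}
```

Code that calls shape_area() works on any shape without a switch on its kind, which is the polymorphism part of OO expressed as ordinary procedural C.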
Re:BTDT (Score:5, Insightful)
*sigh*
Been there done that... on the PDP-11 in 1979.
And did you write up a nice article for other people to learn from what you had done?
I think the real value here is not that she did this, but that she wrote it up in a nice easy to read way so that you can follow her train of thought and get a feel for how one goes about tinkering with compilers and such.
This adds value for people like me who are not as smart as you. I could never have done this on a PDP-11 (although I did have access to one back in my days at university). I also previously would not have known enough to do this in Linux. But having read this article I feel I have learnt something and have a new insight into how linkers and libraries work. Who knows, maybe I will be able to do something similar myself after this learning experience, and for that I am grateful to Jessica for doing it, writing about it and (I'm guessing it was her) submitting it to /.
Now I shall respectfully step off your lawn.
Re:So what? (Score:3, Insightful)
It will be completely unnoticeable, even if you had a stopwatch.
Not only is this easy to see theoretically, as most programs will spend the bulk of their CPU time in tight loops, which obviously will be cached the first run through... but it's also easy to see in practice, for example, when processor performance with different cache sizes is compared.
Multitasking is probably one of the worst things imaginable for processor caches, yet even with thousands of context switches every second the difference between a single-tasking machine and a multitasking machine will be hard to notice on modern hardware.
Re:BTDT (Score:3, Insightful)
Re:BTDT (Score:5, Insightful)
She found that gcc was including libc even when you don't ask for it.
This is basic knowledge that ANYONE using c should know - that the startup library is linked to so it can find main.
Okay, and where am I supposed to learn it from? That was new to me, after using gcc for a very long time.
I'm actually very happy that someone out there told me something that you think I should just know.
So it wasn't new to you? Don't read it.
Re:BTDT (Score:2, Insightful)
Maybe he simply meant "There aren't too many people in the programming world who do anything worth noting, and there are fewer women still in the programming world, so it's pretty cool for her to have done something that people are checking out".
What you said is like asking a black guy what his favourite food is and him saying "Why? You think I'm going to like fried chicken or something, just because I'm black?!" when you really didn't mean anything about his race specifically at all, you only wanted to know what his favourite food was.
I personally always like to hear about a woman doing something of note in IT... like the GP said, it tends to be male-dominated, so I like to hear of more and more women getting involved in the field.