Simpler "Hello World" Demonstrated In C

Posted by kdawson
from the non-obfuscated dept.
An anonymous reader writes "Wondering where all that bloat comes from, causing even the classic 'Hello world' to weigh in at 11 KB? An MIT programmer decided to make a Linux C program so simple, she could explain every byte of the assembly. She found that gcc was including libc even when you don't ask for it. The blog shows how to compile a much simpler 'Hello world,' using no libraries at all. This takes me back to the days of programming bare-metal on DOS!"
  • Re:BTDT (Score:2, Informative)

    by maxume (22995) on Tuesday March 16, 2010 @10:15PM (#31504122)

    The muppetlabs link ends with the entire program overlapped into the ELF header, and part of the header left off.

    (It is just a toy program that returns 42 to the OS, but he gets it down to 45 bytes.)
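
For context on the 45-byte figure: assuming the 32-bit ELF format, the file header alone is 52 bytes, so a 45-byte executable cannot contain a complete one. A minimal C sketch of that arithmetic follows. The function names are hypothetical, and the code only checks the magic bytes and the size; it does not parse the rest of the header.

```c
#include <stddef.h>
#include <string.h>

/* Sizes taken from the 32-bit ELF format (assumption: ELF32, not ELF64). */
enum { ELF_MAGIC_LEN = 4, ELF32_HEADER_SIZE = 52 };

/* Returns 1 if buf begins with the ELF magic bytes 0x7f 'E' 'L' 'F'. */
int has_elf_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[ELF_MAGIC_LEN] = { 0x7f, 'E', 'L', 'F' };
    return len >= ELF_MAGIC_LEN && memcmp(buf, magic, ELF_MAGIC_LEN) == 0;
}

/* Returns 1 if a file of this size cannot hold a complete ELF32 header. */
int shorter_than_elf32_header(size_t file_size)
{
    return file_size < ELF32_HEADER_SIZE;
}
```

Calling shorter_than_elf32_header(45) returns 1, which matches the comment above that part of the header is left off.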

  • by shird (566377) on Tuesday March 16, 2010 @10:20PM (#31504172) Homepage Journal

    Indeed, this is very old news, it's been done many times before. I recall reading and applying this article for Windows many years ago:
    http://msdn.microsoft.com/en-us/magazine/cc301696.aspx [microsoft.com]

    there's also: http://www.ntcore.com/files/SmallAppWiz.htm [ntcore.com] and http://www.phreedom.org/solar/code/tinype/ [phreedom.org] (again for windows) and many more.

  • Re:Umm, but (Score:3, Informative)

    by Psychotria (953670) on Tuesday March 16, 2010 @10:36PM (#31504278)

    And I guess if you'd read the blog (the second link in the article not the third) which the summary is actually referring to, you would know that there is no output from the program written by the "MIT programmer [who] decided to make a Linux C program so simple, she could explain every byte of the assembly".

  • Re:11k Is Too Big? (Score:4, Informative)

    by Anonymous Coward on Tuesday March 16, 2010 @10:43PM (#31504306)

    The whole point was learning ELF structure and why things were the way they were. Didn't you ever wonder why a "hello world" program took over 4000 bytes on a modern computer, when in 1980 a Commodore VIC-20 managed to play games in less than 4K of available memory? This wasn't a waste of time.

  • by refactored (260886) <cyent.xnet@co@nz> on Tuesday March 16, 2010 @10:46PM (#31504328) Homepage Journal
    Parent said, I always liked the "Strangest Abuse of the Rules" category winner for Hello World

    char*_="Hello world.\n";

    That is it - the whole program.

    $ echo 'char*_="Hello world.\n"; ' > a.c
    $ gcc a.c
    /usr/lib/gcc/i486-linux-gnu/4.4.1/../../../../lib/crt1.o: In function `_start':
    /build/buildd/eglibc-2.10.1/csu/../sysdeps/i386/elf/start.S:115: undefined reference to `main'
    collect2: ld returned 1 exit status

    Doesn't say "Hello" to me!

  • by Anonymous Coward on Tuesday March 16, 2010 @10:50PM (#31504358)

    Commodore BASIC:

    ?42

  • by eggled (1135799) on Tuesday March 16, 2010 @11:01PM (#31504420)
    MUMPS: w 42
    If you want to clear the screen and add a new line: w #,42,!
  • Re:11k Is Too Big? (Score:2, Informative)

    by TyFoN (12980) on Tuesday March 16, 2010 @11:04PM (#31504440)

    This is because we are no longer linking the binaries statically (one object file for each function), but are using dynamically linked libraries. And your libc isn't loaded only for your program. The same spot in RAM is shared between all programs that are using it, making the total RAM spent for each program rather small, probably even smaller than if you statically linked the object files of the functions you need.

  • Re:C++ is worse (Score:5, Informative)

    by macshit (157376) <.miles. .at. .gnu.org.> on Tuesday March 16, 2010 @11:13PM (#31504490) Homepage

    Shouldn't the linker remove unreferenced functions?

    I've had this problem with gcc for a while, with C++ code. I was writing some embedded code, and I wanted to use some simple C++. Just by adding a #include of one of the stream libraries, the executable grew by 200k, even though none of it was referenced. The C++ code in iostream is template-generated anyway, so even if the compiler wanted to include the code, it can't until I instantiate it.

    <iostream> includes references to global stream objects like std::cout, not just interface definitions, so including it is going to have larger ramifications than something like <fstream>, which just defines interfaces (and indeed, for me, including <fstream> seems to have no effect on program size, whereas including <iostream> adds about 300 bytes to a simple executable).

  • Not a C program (Score:5, Informative)

    by erroneus (253617) on Tuesday March 16, 2010 @11:19PM (#31504526) Homepage

    I wasted too much time reading this one... nothing surprising about what I found in it. Step one, don't write it in C. Step two, stop linking to things that aren't needed. Step three, manually perform the functions contained in the omitted library. Step four, start cheating in the ELF binary format.

    The only thing interesting about it was that the article pointed out an interesting fact -- Linux will run inappropriately formatted binaries. BAD. Linux kernel people? Are you reading this? Fix it before someone figures out how to use this in making and executing more exploits.

  • Damn kids (Score:3, Informative)

    by ucblockhead (63650) on Tuesday March 16, 2010 @11:21PM (#31504546) Homepage Journal

    Back in the DOS days, any moderately competent programmer knew how to copy arbitrary data to the screen buffer, allowing you to display text without any libraries. It's been many years, so I am probably getting this wrong, but in pseudocode it'd look something like


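    const char *cp = "Hello World";
    char far *addr = (char far *)0xB8000000L;  /* B800:0000, color text buffer; "far" is DOS-compiler syntax */
    while (*cp) { *addr = *cp++; addr += 2; }  /* step by 2: every other byte is a color attribute */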

    That's the C version, of course. You'd actually do it in assembly. My suspicion is that you could do it in on the order of 20 to 25 bytes, but again, it's been decades since I've done anything like that.
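
In 80x25 color text mode each screen cell takes two bytes, a character followed by a color attribute, so the copy has to step over every other byte. The sketch below models that layout portably, with an ordinary array standing in for the video memory at B800:0000. The helper name put_text and its parameters are hypothetical, and on a real DOS machine the array would be a far pointer into video memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper. Each text-mode cell is modeled as one 16-bit value:
   character in the low byte, color attribute in the high byte. */
size_t put_text(uint16_t *cells, size_t ncells, const char *s, uint8_t attr)
{
    size_t i = 0;
    /* Stop at the end of the string or the end of the buffer. */
    while (i < ncells && s[i] != '\0') {
        cells[i] = (uint16_t)(((unsigned)attr << 8) | (unsigned char)s[i]);
        i++;
    }
    return i; /* number of cells written */
}
```

With a uint16_t screen[80 * 25] array, put_text(screen, 80 * 25, "Hello World", 0x07) fills the first 11 cells using attribute 0x07 (light grey on black).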

  • Re:11k Is Too Big? (Score:4, Informative)

    by Schraegstrichpunkt (931443) on Tuesday March 16, 2010 @11:32PM (#31504604) Homepage
    These days, costs of development and deployment, not runtime memory usage, are the limiting factors in software development.
  • by ktappe (747125) on Tuesday March 16, 2010 @11:48PM (#31504708)

    November 1999. Slow news day much?

    That would explain why as I followed the exercise along in my terminal, I got this warning:

    "ld: warning: option -s is obsolete and being ignored"

    A decade obsolete as it turns out. I suppose when PCs started measuring their RAM in gigabytes, there was little need to strip executables anymore. Still, the article was a very fun read and took me back to the '80s when I was programming at the byte level myself. See also: One-line programming contests and the like. Much more fun (to me) than today's object-oriented everything.

  • Re:11k Is Too Big? (Score:5, Informative)

    by mirix (1649853) on Wednesday March 17, 2010 @12:45AM (#31504970)

    gcc for an AVR target doesn't make an 11k hello world, though.

    Probably because that's an application where it matters, and on a modern PC it doesn't matter at all.

  • by jamrock (863246) on Wednesday March 17, 2010 @12:45AM (#31504974)
    There are three links in the article summary. The first is to the Wikipedia entry for "Hello World"; the second is to an article about writing "Hello World" without libc; the third is to part II of the second, which examines the ELF format and demonstrates the 45-byte program. The summary headline is rubbish. Whoever wrote it either (a) never read either article, or (b) deliberately sensationalized it by conflating the salient features of both articles, in which case they should be working for the tabloids.
  • Re:BTDT (Score:5, Informative)

    by crossmr (957846) on Wednesday March 17, 2010 @01:40AM (#31505182) Journal

    What a shock this comes from Kdawson. I'm about one more kdawson article away from dumping slashdot. I can't imagine that all the people in the slashdot batcave aren't laughing at this tool.
    I sometimes wonder if he just goes out and gets completely hammered at lunch then comes back and picks a few articles.

  • by tlambert (566799) on Wednesday March 17, 2010 @02:10AM (#31505282)

    Apple defines system call APIs at the top of libc ...no static linking allowed.

    This annoys people who like to link things statically, and those who want to make their own libc equivalents for things like embedded language interpreters and don't want to have to figure out vtables and dynamic linking.

    But it also makes everyone else who likes binary compatibility, and Mac OS X historically getting faster with every release, extremely happy, by allowing the interface between the kernel and libc to be changed, without breaking their applications.

    If you statically link, you can't do that. That's great, if your OS has pretty much no real commercial application base, and you are a technical enough person to "just recompile everything from source", but it's not so good when you are talking about an OS where commercial software is very important to customers. Customers who are either non-technical, or who are technical, but think recompiling something that was working just fine before the OS update is a complete waste of time. Lump me in with these last people: I don't believe in "bit rot", I just believe in lazy engineers not maintaining their code or defining their interfaces properly.

    Yeah, if you want fast LMBench results on a null system call -- which keeps changing its definition so that it can't be gamed, exactly the same way you'd game it if you were a commercial application developer needing higher performance -- static linking seems great. But practically, most modern software is either CPU bound or I/O bound. If it's CPU bound, it spends all its time in user space, not making system calls. If it's I/O bound, it spends all its time waiting for whoever is on the other end of the network to send it more bytes. Either way, null system call performance is, frankly, unimportant to almost every possible application.

    So static linking, and writing your system calls at the trap/sysenter/syscall level (with no way to change them when Intel or another chip vendor introduces a "new! optimized method of making system calls!", as has already happened twice in the past) is generally a pretty useless exercise.

    -- Terry

  • Re:11k Is Too Big? (Score:3, Informative)

    by Anonymous Coward on Wednesday March 17, 2010 @02:39AM (#31505384)
    Hello (World). I am from the future. The code you have provided a link to will not run on Windows 9.
  • Re:BTDT (Score:3, Informative)

    by palegray.net (1195047) <philip.paradis@pa l e gray.net> on Wednesday March 17, 2010 @02:58AM (#31505434) Homepage Journal

    It is just a toy program that returns 42 to the OS, but he gets it down to 45 bytes.

    Since computer science tends to be such a male dominated field, I think it's worth pointing out that the author is a woman [mit.edu].

  • Yes (Score:1, Informative)

    by Anonymous Coward on Wednesday March 17, 2010 @05:27AM (#31506016)
    Later on, I will have been going to be sent but another AC will come back afterward to stop vtcodger (957785) from linking to The Source before you have reason to send me. You don't remember because back then it was only a prophecy but now, in the future, the past has occurred.
  • Re:11k Is Too Big? (Score:2, Informative)

    by WWWWolf (2428) <wwwwolf@iki.fi> on Wednesday March 17, 2010 @06:29AM (#31506344) Homepage

    The whole point was learning ELF structure and why things were the way they were. Didn't you ever wonder why a "hello world" program took over 4000 bytes on a modern computer, when in 1980 a Commodore VIC-20 managed to play games in less than 4K of available memory? This wasn't a waste of time.

    Yeah.

    To put this in perspective: Guess how big the executable header is in 8-bit Commodore machines?

    2 bytes. The absolute start address of the program. The computer opens up the file, reads where it's stored, and starts putting data to the memory from that point onward. Simple enough. Of course, there's none of this "relocation" and "memory protection" rubbish to worry about.

    If you wanted to store a program in BASIC RAM, you could write a stub BASIC program that basically just has one code line, 10 SYS<startaddr>, where <startaddr> points to the address past the end of the program in BASIC RAM. In total, this "header" is just a dozen bytes or so in tokenised BASIC. (Don't have the time today to test how small I can make it, but...)
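
The two-byte header described above can be sketched in C as a loader that reads the start address and copies everything after it into a 64 KB memory image. The function name load_prg and the RAM_SIZE constant are hypothetical, and the byte order (low byte first) is stated here as the 6502 convention, not something the comment itself spells out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RAM_SIZE 65536u  /* 64 KB address space of the 8-bit machines */

/* Hypothetical loader: copies the file body into ram at the address given
   by the header. Returns the load address, or -1 if the file is too short
   or the body would run past the end of memory. */
long load_prg(const uint8_t *file, size_t len, uint8_t ram[RAM_SIZE])
{
    if (len < 2)
        return -1;
    /* The two header bytes are the absolute start address, low byte first. */
    uint16_t addr = (uint16_t)(file[0] | (file[1] << 8));
    size_t payload = len - 2;
    if (payload > RAM_SIZE - addr)
        return -1;
    memcpy(ram + addr, file + 2, payload);
    return addr;
}
```

There is no relocation and no protection check beyond the bounds test, which is the point the comment makes: the whole "executable format" is one address.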

  • exe vs com (Score:1, Informative)

    by Anonymous Coward on Wednesday March 17, 2010 @06:55AM (#31506440)

    There was a discussion at some game dev forum a few years ago, using VS60 (which compiles to win32 exe), where someone came up with a ~400-byte exe file (doing nothing).

    Using DOS, a 14-byte program is enough:
    MOV AH,09
    MOV DX,0108
    INT 21
    RET
    DEC AX
    DB 65
    DB 6C
    DB 6C
    DB 6F
    DB 24
    -------
    14 bytes.
    This uses INT 21, so it requires DOS. It takes a few more bytes if you use only the BIOS via INT 10 (no OS), because you must then print each character individually; you cannot print a whole string in one call.
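
To check the 14-byte count, the listing above can be written out as raw bytes. This is a sketch in C using the standard 16-bit x86 encodings; the array name hello_com and the helper dollar_string_length are hypothetical. It also shows why the string address is 0108: a .COM image is loaded at offset 0100h, and the four instructions occupy the first 8 bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define COM_LOAD_OFFSET 0x0100u  /* DOS loads .COM images at offset 0100h */

/* The program from the listing, byte for byte. */
static const uint8_t hello_com[] = {
    0xB4, 0x09,              /* MOV AH,09   : DOS "print string" function */
    0xBA, 0x08, 0x01,        /* MOV DX,0108 : address of the string */
    0xCD, 0x21,              /* INT 21      : call DOS */
    0xC3,                    /* RET */
    0x48,                    /* 'H', which the listing shows as DEC AX */
    0x65, 0x6C, 0x6C, 0x6F,  /* "ello" */
    0x24                     /* '$' ends the string for function 09h */
};

/* Length of the '$'-terminated string at address dx, or 0 if dx lies
   outside the image or no terminator is found. */
size_t dollar_string_length(const uint8_t *image, size_t size, unsigned dx)
{
    if (dx < COM_LOAD_OFFSET || dx - COM_LOAD_OFFSET >= size)
        return 0;
    size_t start = dx - COM_LOAD_OFFSET;
    for (size_t i = start; i < size; i++)
        if (image[i] == '$')
            return i - start;
    return 0;
}
```

Here sizeof hello_com is 14, and the string found at 0108 is the five characters of "Hello".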

  • Re:BTDT (Score:3, Informative)

    by argent (18001) <peter@slashdot.2 ... m ['nga' in gap]> on Wednesday March 17, 2010 @07:50AM (#31506748) Homepage Journal

    I don't think Un*x ever did.

    UNIX used the sticky bit instead. UNIX also supported Split I&D on the 11/70 before M+ came out with Split I&D support on RSX.

    I don't think you're remembering right about overlays on RSX. I know I spent WAY too much time waiting for TKB on RSX because it was sitting there trying to cram stuff into overlays. Also, while later PDP-11s technically could support demand paging, the 8k page size in a max of 64k addressable memory made it a marginal technique, and I'm pretty sure it was never used, even by M+.

    UNIX supported shared code, too. We had over 35 users at a time on the Cory Hall 11/70 at Berkeley. It was OK under Version 6, but got painfully slow during finals week after they upgraded to Version 7. The Math/Stat department had an 11/60 and could only handle 8 users under UNIX, almost 20 under RSTS... but nobody wanted to use RSTS. We called it "Really Shitty Time Sharing".

    Some of the guys managed to patch RSTS Basic+ to run under UNIX by taking advantage of the fact that UNIX used the TRAP instruction for system calls and RSTS used the IOT instruction, by cramming an IOT handler into the half a kilobyte DEC had left for the stack at the beginning of the core image to emulate the RSTS calls that Basic+ needed. That way they could upgrade the business school to UNIX and get rid of the last RSTS holdout on campus.

  • by bendelo (737558) * on Wednesday March 17, 2010 @10:07AM (#31507940) Homepage
    How about 22 bytes?

    C:\>debug
    -a 0100
    0D39:0100 MOV AH,09
    0D39:0102 MOV DX,0109
    0D39:0105 INT 21
    0D39:0107 INT 20
    0D39:0109
    -e 0109 'Hello world!$'
    -r cx
    CX 0000
    :16
    -n hello.com
    -w
    Writing 00016 bytes
    -q

    C:\>HELLO.COM
    Hello world!
    C:\>dir hello.com
    26/08/2009 10:48 22 HELLO.COM
