Programming Linux

Simpler "Hello World" Demonstrated In C

Posted by kdawson
from the non-obfuscated dept.
An anonymous reader writes "Wondering where all that bloat comes from, causing even the classic 'Hello world' to weigh in at 11 KB? An MIT programmer decided to make a Linux C program so simple, she could explain every byte of the assembly. She found that gcc was including libc even when you don't ask for it. The blog shows how to compile a much simpler 'Hello world,' using no libraries at all. This takes me back to the days of programming bare-metal on DOS!"

  • BTDT (Score:3, Insightful)

    by argent (18001) <peter.slashdot@2006@taronga@com> on Tuesday March 16, 2010 @10:05PM (#31504044) Homepage Journal

    *sigh*

    Been there done that... on the PDP-11 in 1979.

    • Re:BTDT (Score:5, Insightful)

      by tomhudson (43916) <barbara.hudson@NOSpAM.barbara-hudson.com> on Tuesday March 16, 2010 @11:17PM (#31504516) Journal

      She found that gcc was including libc even when you don't ask for it.

      This is basic knowledge that ANYONE using c should know - that the startup library is linked to so it can find main.

      This is almost as lame as their previous slashvertisement/product_whoring [ksplice.com] - where they claimed to have gotten around the Mythical Man-Month and quadrupled output - and it turned out that neither claim was true.

      And their lame excuse [ksplice.com], which I derided in this comment [ksplice.com]:

      Greg Price wrote:

      "what I hoped to get across in this post is that that's not true--in the right circumstances, adding people to a software project can get a lot done, even in a short time"

      As many people have pointed out, you did NOT add people to a software project. You created a dozen small, one-person projects. Your self-serving reply to all that is just one more mis-representation. Have you no shame?

      I'm sure we're not the only ones to have used embedded assembler in c programs.

      • Re:BTDT (Score:5, Informative)

        by crossmr (957846) on Wednesday March 17, 2010 @01:40AM (#31505182) Journal

        What a shock this comes from Kdawson. I'm about one more kdawson article away from dumping slashdot. I can't imagine that all the people in the slashdot batcave aren't laughing at this tool.
        I sometimes wonder if he just goes out and gets completely hammered at lunch then comes back and picks a few articles.

      • Re:BTDT (Score:5, Insightful)

        by jlehtira (655619) on Wednesday March 17, 2010 @08:22AM (#31506940) Journal

        She found that gcc was including libc even when you don't ask for it.

        This is basic knowledge that ANYONE using c should know - that the startup library is linked to so it can find main.

        Okay, and where am I supposed to learn it from? That was new to me, after using gcc for a very long time.

        I'm actually very happy that someone out there told me something that you think I should just know.

        So it wasn't new to you? Don't read it.

    • Re:BTDT (Score:5, Insightful)

      by kevingolding2001 (590321) on Wednesday March 17, 2010 @05:51AM (#31506148)

      *sigh*

      Been there done that... on the PDP-11 in 1979.

      And did you write up a nice article for other people to learn from what you had done?

      I think the real value here is not that she did this, but that she wrote it up in a nice easy to read way so that you can follow her train of thought and get a feel for how one goes about tinkering with compilers and such.

      This adds value for people like me who are not as smart as you. I could never have done this on a PDP-11 (although I did have access to one back in my days at university). I also previously would not have known enough to do this in Linux. But having read this article I feel I have learnt something and have a new insight into how linkers and libraries work. Who knows, maybe I will be able to do something similar myself after this learning experience, and for that I am grateful to Jessica for doing it, writing about it and (I'm guessing it was her) submitting it to /.

      Now I shall respectfully step off your lawn.

  • by textstring (924171) on Tuesday March 16, 2010 @10:07PM (#31504068)

    Interesting, but she does sort of sidestep the whole 'Hello World!' part of a hello world program.

    • by Anonymous Coward on Wednesday March 17, 2010 @12:09AM (#31504818)

      c:\xxx>debug
      -a
      mov dx, 100
      mov cx, 000D
      mov bx, 1
      mov ah, 40
      int 21
      mov ah, 4C
      int 21
      -f 111 "Hello World!"
      -a100
      mov dx, 0111
      -r cx :001D
      -n c:\xxx\hello.com
      -w
      -q

      c:\xxx>hello.com
      Hello World!

      c:\xxx>dir hello.com
      03/18/2011 11:29 AM 29 HELLO.COM

    • by cthugha (185672) on Wednesday March 17, 2010 @01:02AM (#31505038)

      Since the output is the Answer to the Ultimate Question, it necessarily incorporates or encodes every possible output of every possible program, including the string "Hello World!".

      The method for extracting the particular output desired is left as an exercise for the reader.

  • Nice but? (Score:5, Insightful)

    by garcia (6573) on Tuesday March 16, 2010 @10:11PM (#31504106) Homepage

    Ok, this is wicked great in theory. Our programs have become bloated. We do have them taking up too much RAM, HD space, and CPU time. But after reading through this in-depth analysis I have to wonder if it's all worth it.

    If we're willing to leave behind all pretenses of portability, we can make our program exit without having to link with anything else. First, though, we need to know how to make a system call under Linux.

    Or I can just write it the old way, making the file size larger and not have to concern myself with portability and how to make system calls under Linux. After all that's what the whole point of this all was right?
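
    For anyone curious what "making a system call" looks like from C, here is a minimal sketch. It assumes Linux with glibc and goes through glibc's generic syscall() function rather than the inline assembly a truly library-free build would need, so it still links against libc. The function name hello_via_syscall is made up for illustration.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>  /* SYS_write */
#include <unistd.h>       /* syscall() */

/* Write a greeting to fd by invoking the write system call by number,
   bypassing libc's write() wrapper (but not libc's syscall() helper).
   Returns the number of bytes written, or -1 on error. */
long hello_via_syscall(int fd)
{
    static const char msg[] = "Hello World!\n";
    return syscall(SYS_write, fd, msg, sizeof msg - 1);
}
```

    Calling hello_via_syscall(1) sends the string to standard output. The portability cost mentioned above is real: SYS_write and the calling convention behind syscall() are specific to Linux.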

    • Re:Nice but? (Score:5, Insightful)

      by dido (9125) <dido@im[ ]ium.ph ['per' in gap]> on Tuesday March 16, 2010 @11:33PM (#31504608)

      Which is missing the point. Haven't you ever wondered what's really in that 11k of machine code, and what it actually does? We've gotten so insulated from the lower levels of our computers that we no longer really understand how they do something so basic as terminating their own execution. The article felt more to me like an expository attempt to shed light on some of the things that libc has to do for us, rather than practical advice on attempting to make our programs smaller.

    • Re:Nice but? (Score:4, Insightful)

      by fermion (181285) on Wednesday March 17, 2010 @12:40AM (#31504956) Homepage Journal
      I disagree that programs are bloated. In most cases, we code to deliver a product at a reasonable cost. Competent trained humans are much more expensive than gates. This is why few people code in C. They want fancy features like trash collection, signaling, and GUI. All of these can be custom coded on a case by case basis, so that only the features needed are included and the libraries are optimized. Of course competent programmers do not need trash collection, but it sure makes life easier, and can cut down on programming hours. So we tolerate a bit of inefficiency because, frankly, very few people are going to pay double the price so they can use a 500 MHz, 256MB computer. The average person is more likely going to pay $400 for a 2 GHz, 2GB machine, and then want the software for little or no money.

      Now, that is not to say the libraries should not be optimized. It makes economic sense to spend significant time on such code. Just look at MS Vista. But complaining that unnecessary library code is sometimes included does not really solve any problems.

    • Re:Nice but? (Score:4, Insightful)

      by Sycraft-fu (314770) on Wednesday March 17, 2010 @12:54AM (#31505002)

      Well that's part of the REASON that programs have become "bloated." We have plenty of resources these days. RAM and HDD space is cheap. So, it doesn't make sense to spend time trying to wring every byte out of a program. If having a bit of bloat makes the program more portable, or easier to debug, or more resilient to attack or whatever it is probably worth while.

      I'd much rather have a program that was 1MB larger than it needs to be, but easy for the devs to maintain and nice and compatible than one that is as small as possible but is a complete mess at the code level. As a practical matter program data, like graphics, sounds, media, etc, is way, WAY bigger than the program itself. For example Mass Effect 2 has about 25.6MB of code between its binary and various DLLs. If you count system DLLs it uses, it is maybe up to 50MB. Its total size? 12.1GB. All the rest is data of various kinds. They could halve the size of the code and still not make even a tiny dent in disk or memory usage.

  • by putaro (235078) on Tuesday March 16, 2010 @10:13PM (#31504116) Journal

    45 bytes, huh? I can do it in....

    #!/bin/sh
    exit 42

    18 bytes and it's portable across all Unices. Maybe the assembler version is faster, though?

  • Umm, but (Score:5, Insightful)

    by Psychotria (953670) on Tuesday March 16, 2010 @10:17PM (#31504136)

    Since when does a Hello World program not actually output anything?

  • If it's so simple, (Score:5, Insightful)

    by newcastlejon (1483695) on Tuesday March 16, 2010 @10:18PM (#31504146)
    Why doesn't it fit in TFS?
  • Similarly (Score:3, Interesting)

    by McBeer (714119) on Tuesday March 16, 2010 @10:19PM (#31504152) Homepage
    Awhile back I read another similar article [phreedom.org]. In the article the smallest PE created is a bit larger (97 bytes), but a little more standards compliant. More interestingly, however, the author crafts a program that downloads and executes another program in only 133 bytes.
  • IEFBR14 (Score:5, Interesting)

    by kenh (9056) on Tuesday March 16, 2010 @10:20PM (#31504158) Homepage Journal

    Mainframers have been using this most simple of all utilities for decades - literally. The Wikipedia entry on it has a good write-up about this (literal) do-nothing program. Its whole purpose is to provide a mechanism to exploit the various functions contained in JCL to create, delete, and otherwise manipulate datasets on mainframes.

    The wikipedia entry is here: http://en.wikipedia.org/wiki/IEFBR14 [wikipedia.org]

  • by kenh (9056) on Tuesday March 16, 2010 @10:24PM (#31504204) Homepage Journal

    At the end, the code was assembler, and the compiler wasn't even called - just the linker. I can't say for sure where a C program ends and an assembler program begins, but I'm fairly certain that the last few iterations are assembler, based on the "let's do away with the compiler" suggestion.

    Also, "Hello World" programs have to, you know, actually display the message "Hello World" - this is a program that isn't written in C, and doesn't write "Hello World" - care to revisit the title of this entry?

    • by MerlynEmrys67 (583469) on Tuesday March 16, 2010 @10:32PM (#31504256)
      I always liked the "Strangest Abuse of the Rules" category winner for Hello World
      char*_="Hello world.\n";

      That is it - the whole program.

      • Re: (Score:3, Informative)

        by refactored (260886)
        Parent said, I always liked the "Strangest Abuse of the Rules" category winner for Hello World

        char*_="Hello world.\n";

        That is it - the whole program.

        $ echo 'char*_="Hello world.\n"; ' > a.c
        $ gcc a.c
        /usr/lib/gcc/i486-linux-gnu/4.4.1/../../../../lib/crt1.o: In function `_start':
        /build/buildd/eglibc-2.10.1/csu/../sysdeps/i386/elf/start.S:115: undefined reference to `main'
        collect2: ld returned 1 exit status

        Doesn't say "Hello" to me!

    • "At the end, the code was assembler"

      But, the key point is that the user didn't generate that assembly. The user wrote a C program (granted, the program doesn't actually do any output - it just stores a string in memory, then exits). The user called the compiler to compile the program. The user then *disassembled* the object code which was created by *the compiler*. So, the assembly you see was generated (indirectly, via the objdump command), by the C compiler.

      Exception: the user did create a small assembly

  • Something similar (Score:3, Interesting)

    by crow (16139) on Tuesday March 16, 2010 @10:31PM (#31504250) Homepage Journal

    I had a laptop that was really short on memory back in 1996 or so. I liked having the six virtual consoles, but rarely used them, so I wrote a program that would wait for you to press enter, then exec the regular login program. It copied the executable onto the same page as the stack and had no globals, so at run time, it used exactly one page of RAM. I used the same technique as the author here of calling syscalls directly instead of using libc.
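
    A rough sketch of the "wait for Enter, then exec" idea, assuming Linux with glibc. It uses glibc's syscall() helper for brevity, so unlike the program described above it does link against libc, and it makes no attempt at the single-page memory layout. The names wait_for_enter and wait_then_exec, and whatever path gets passed in, are hypothetical.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>  /* SYS_read, SYS_execve */
#include <unistd.h>       /* syscall(), NULL */

/* Block until a newline arrives on fd, reading one byte at a time with
   the raw read system call. Returns 0 on newline, -1 on EOF or error. */
int wait_for_enter(int fd)
{
    char c;
    for (;;) {
        long n = syscall(SYS_read, fd, &c, 1);
        if (n <= 0)
            return -1;
        if (c == '\n')
            return 0;
    }
}

/* Wait for Enter on standard input, then replace this process with the
   program at path. Only returns if the wait or the exec fails. */
int wait_then_exec(const char *path)
{
    char *const argv[] = { (char *)path, NULL };
    char *const envp[] = { NULL };

    if (wait_for_enter(0) != 0)
        return -1;
    return (int)syscall(SYS_execve, path, argv, envp);
}
```

    Because execve replaces the process image, the waiting stub costs nothing once the real login program is running.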

  • by commodoresloat (172735) * on Tuesday March 16, 2010 @10:37PM (#31504284)

    Thank God we have finally crossed this hurdle. The baffling complexity of helloworld.c is no longer an obstacle to world domination.

    I think we can now finally say once and for all that 2010 will be the year of Linux on the desktop.

  • by geekmux (1040042) on Tuesday March 16, 2010 @10:44PM (#31504312)

    OK, when I first read this, I thought to myself, "now why in the hell would anyone care to do this?"

    Then it dawned on me. One stoned programmer said to another....Yeah, that's probably how it went down. Both now, and back in 1979, when you could still smoke in the Data Center...

  • C++ is worse (Score:5, Insightful)

    by MobyDisk (75490) on Tuesday March 16, 2010 @10:48PM (#31504340) Homepage

    Shouldn't the linker remove unreferenced functions?

    I've had this problem with gcc for a while, with C++ code. I was writing some embedded code, and I wanted to use some simple C++. Just by adding a #include of one of the stream libraries, the executable grew by 200k, even though none of it was referenced. The C++ code in iostream is template-generated anyway, so even if the compiler wanted to include the code, it can't until I instantiate it.

    • Re:C++ is worse (Score:5, Informative)

      by macshit (157376) <(gro.ung) (ta) (selim)> on Tuesday March 16, 2010 @11:13PM (#31504490) Homepage

      Shouldn't the linker remove unreferenced functions?

      I've had this problem with gcc for a while, with C++ code. I was writing some embedded code, and I wanted to use some simple C++. Just by adding a #include of one of the stream libraries, the executable grew by 200k, even though none of it was referenced. The C++ code in iostream is template-generated anyway, so even if the compiler wanted to include the code, it can't until I instantiate it.

      <iostream> includes references to global stream objects like std::cout, not just interface definitions, so including it's going to have larger ramifications than something like <fstream>, which just defines interfaces (and indeed, for me, including <fstream> seems to have no effect on program size, whereas including <iostream> adds about 300 bytes to a simple executable).

    • Re: (Score:3, Insightful)

      by ShakaUVM (157947)

      Shouldn't the linker remove unreferenced functions?

      I've had this problem with gcc for a while, with C++ code. I was writing some embedded code, and I wanted to use some simple C++. Just by adding a #include of one of the stream libraries, the executable grew by 200k, even though none of it was referenced. The C++ code in iostream is template-generated anyway, so even if the compiler wanted to include the code, it can't until I instantiate it.

      There's utilities you can run to pull unused object code out of yo

  • Not a C program (Score:5, Informative)

    by erroneus (253617) on Tuesday March 16, 2010 @11:19PM (#31504526) Homepage

    I wasted too much time reading this one... nothing surprising about what I found in it. Step one, don't write it in C. Step two, stop linking to things that aren't needed. Step three, perform the functions contained in the library omitted manually. Step five, start cheating in the elf binary format.

    The only thing interesting about it was that the article pointed out an interesting fact -- Linux will run inappropriately formatted binaries. BAD. Linux kernel people? Are you reading this? Fix it before someone figures out how to use this in making and executing more exploits.

  • Damn kids (Score:3, Informative)

    by ucblockhead (63650) on Tuesday March 16, 2010 @11:21PM (#31504546) Homepage Journal

    Back in the DOS days, any moderately competent programmer knew how to copy arbitrary data to screen buffer, allowing you to display text without any libraries. It's been many years, so I am probably getting this wrong, but in pseudocode it'd look something like


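    const char *cp = "Hello World";
    char far *addr = (char far *)0xB8000000L;  /* B800:0000, color text buffer */
    while (*cp) { *addr = *cp++; addr += 2; }  /* step over each attribute byte */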

    That's the C version, of course. You'd actually do it in assembly. My suspicion is that you could do it in on the order of 20 to 25 bytes, but again, it's been decades since I've done anything like that.
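
    Each cell of the text-mode buffer pairs a character byte with an attribute byte, which is easy to forget. Here is a sketch of that layout that runs on a modern system by writing into an ordinary array instead of video memory. The function name put_text and the array standing in for the screen are assumptions for illustration, not DOS code.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy s into a text-mode style buffer of 16-bit cells: the low byte is
   the character code, the high byte is the attribute (color).
   Writes at most ncells cells and returns how many were written. */
size_t put_text(uint16_t *cells, size_t ncells, const char *s, uint8_t attr)
{
    size_t i = 0;
    while (i < ncells && s[i] != '\0') {
        cells[i] = (uint16_t)(((uint16_t)attr << 8) | (uint8_t)s[i]);
        i++;
    }
    return i;
}
```

    With attr set to 0x07 (light grey on black), the letter 'H' becomes the cell value 0x0748. On real hardware the same cells would live at segment B800 for color cards or B000 for monochrome.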

    • Re: (Score:3, Interesting)

      by mandelbr0t (1015855)

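      mov bx, 0xB000        ; monochrome text buffer segment
      mov es, bx
      xor di, di            ; start at the top-left cell
      mov si, OFFSET msg
      mov cx, LEN
      cld
      next: lodsb           ; AL = next character of msg
      stosb                 ; store it in the screen cell
      inc di                ; skip the cell's attribute byte
      loop next

      .data
      msg db 'Hello, World!'
      LEN equ 13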

      I wasn't blessed with a color card. And I'm sure that's not actually any real dialect of assembly, but you get the picture.

  • Bare metal DOS? (Score:4, Insightful)

    by blueg3 (192743) on Tuesday March 16, 2010 @11:36PM (#31504628)

    If you're actually programming on "bare metal", you're not really using DOS, are you? After all, DOS is an operating system -- a layer between your code and the hardware.

  • by Brett Johnson (649584) on Wednesday March 17, 2010 @01:06AM (#31505056)

    Back in the early 1980s, I was doing development on MS-DOS 2.11 - the first real working version of MS-DOS that resembled Xenix more than CP/M.

    I was using a combination of Lattice C and assembly language to do my day job. But I was upset about the libc bloat that Lattice C would drag into the program. Over the Christmas break, I sat down and wrote a tiny version of libc, with the 60% of the calls I actually used. Most of them were either thin wrappers on top of MS-DOS Int21 calls, assembly language implementations (the string functions), or reduced functionality (printf didn't handle strange alignments, floats or doubles), and custom startup/exit code. I also structured the library so that the linker would only link in functions that were actually used. For simple executables, I saw the on-disk file size drop from 10KB-20KB down to 400-600 bytes. Another thing that reduced on-disk file size was to create .com programs, rather than .exe programs.

    I was also writing the handful of unix commands that I couldn't do without (ls, cat, cut, paste, grep, fgrep, etc). Since I was implementing dozens of Unix commands, each statically linked to libc, it was very important to reduce the over-all size of each executable. Most of the smaller trivial commands were less than 1KB in size. I think the largest was 4KB. I also had an emacs clone* that was 36KB when compiled and linked against my tiny lib.

    For the longest time, I carried around a bootable MS-DOS 2.11 floppy, with my dozens of Unix commands, an emacs-like editor, Lattice C compiler, tiny libc, and some core MS-DOS programs. It allowed me to have my entire development environment on a floppy that I could stick in anyone's machine and make it usable.

    * We had a source license for Mince, orphaned by Mark of the Unicorn, a tiny emacs-clone that ran on CP/M, MS-DOS, and Unix. We had enhanced it significantly.
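
    To give a feel for the "reduced functionality" printf mentioned above, here is a sketch of a formatter that understands only %s, %d and %%, with no floats, widths or alignment. It is not the original Lattice C era code; the name tiny_format and its buffer-based interface are assumptions made so the example runs anywhere. Keeping a function like this in its own object file is the usual way to let a linker pull in only what a program actually calls.

```c
#include <stdarg.h>
#include <stddef.h>

/* Reduced-functionality formatter: understands only %s, %d and %%.
   Unknown conversions are silently dropped. Stores at most size-1
   characters plus a terminating NUL in buf and returns the number of
   characters stored (not counting the NUL). */
size_t tiny_format(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    size_t n = 0;

    if (size == 0)
        return 0;
    va_start(ap, fmt);
    for (; *fmt != '\0'; fmt++) {
        if (*fmt != '%') {
            if (n + 1 < size) buf[n++] = *fmt;
            continue;
        }
        fmt++;                        /* look at the conversion character */
        if (*fmt == '\0') {
            break;                    /* lone '%' at end of format */
        } else if (*fmt == 's') {
            const char *s = va_arg(ap, const char *);
            while (*s != '\0' && n + 1 < size) buf[n++] = *s++;
        } else if (*fmt == 'd') {
            int v = va_arg(ap, int);
            /* Negate in unsigned arithmetic so INT_MIN is handled. */
            unsigned int u = v < 0 ? 0u - (unsigned int)v : (unsigned int)v;
            char digits[24];
            int len = 0;
            do { digits[len++] = (char)('0' + u % 10); u /= 10; } while (u != 0);
            if (v < 0 && n + 1 < size) buf[n++] = '-';
            while (len > 0 && n + 1 < size) buf[n++] = digits[--len];
        } else if (*fmt == '%') {
            if (n + 1 < size) buf[n++] = '%';
        }
    }
    va_end(ap);
    buf[n] = '\0';
    return n;
}
```

    A call such as tiny_format(out, sizeof out, "%s=%d", "x", -42) fills out with "x=-42". Output that does not fit is truncated rather than overflowing the buffer.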

    • Re: (Score:3, Interesting)

      by AceJohnny (253840)

      I've always wondered: what was the difference, in DOS, between a .com and a .exe?

