
Simpler "Hello World" Demonstrated In C 582

Posted by kdawson on Tuesday March 16, 2010 @10:03PM from the non-obfuscated dept.

An anonymous reader writes "Wondering where all that bloat comes from, causing even the classic 'Hello world' to weigh in at 11 KB? An MIT programmer decided to make a Linux C program so simple, she could explain every byte of the assembly. She found that gcc was including libc even when you don't ask for it. The blog shows how to compile a much simpler 'Hello world,' using no libraries at all. This takes me back to the days of programming bare-metal on DOS!"

This discussion has been archived. No new comments can be posted.


Similarly (Score:3, Interesting)

by McBeer ( 714119 ) writes: on Tuesday March 16, 2010 @10:19PM (#31504152) Homepage

A while back I read another similar article [phreedom.org]. In the article the smallest PE created is a bit larger (97 bytes), but a little more standards compliant. More interestingly, however, the author crafts a program that downloads and executes another program in only 133 bytes.

IEFBR14 (Score:5, Interesting)

by kenh ( 9056 ) writes: on Tuesday March 16, 2010 @10:20PM (#31504158) Homepage Journal

Mainframers have been using this most simple of all utilities for decades - literally. The Wikipedia entry on it has a good write-up about this (literal) do-nothing program. Its whole purpose is to provide a mechanism to exploit the various functions contained in JCL to create, delete, and otherwise manipulate datasets on mainframes.
The Wikipedia entry is here: http://en.wikipedia.org/wiki/IEFBR14 [wikipedia.org]

Something similar (Score:3, Interesting)

by crow ( 16139 ) writes: on Tuesday March 16, 2010 @10:31PM (#31504250) Homepage Journal

I had a laptop that was really short on memory back in 1996 or so. I liked having the six virtual consoles, but rarely used them, so I wrote a program that would wait for you to press enter, then exec the regular login program. It copied the executable onto the same page as the stack and had no globals, so at run time, it used exactly one page of RAM. I used the same technique as the author here of calling syscalls directly instead of using libc.

Re:Simpler "Hello World" in C? (Score:4, Interesting)

by MerlynEmrys67 ( 583469 ) writes: on Tuesday March 16, 2010 @10:32PM (#31504256)

I always liked the "Strangest Abuse of the Rules" category winner for Hello World
char*_="Hello world.\n";
That is it - the whole program.

Re:11k Is Too Big? (Score:5, Interesting)

by Simonetta ( 207550 ) writes: on Tuesday March 16, 2010 @10:48PM (#31504336)

"An 11k app is not going to make me, or my computer, say 'Good Bye World'"
It is if your computer is a 38-cent Atmel ATtiny10, which only has enough space for 512 16-bit instruction words. This chip is about half the size of a sunflower seed, but is faster, and, in several ways, more powerful, than the original $5000 IBM PC from 1981.
Get away from the idea of Gigahertz desktops and $1000 laptops and join the real computer revolution!
For me, if it costs more than $5, it's not a computer that I take seriously. It's just a 20th-century digital processing appliance.

Re:Old news is VERY OLD (Score:4, Interesting)

by Gamma747 ( 1438537 ) writes: on Tuesday March 16, 2010 @11:05PM (#31504446)

It was uploaded to Reddit [reddit.com] 12 hours ago; that's probably why it's just reaching Slashdot now.

Re:Simpler "Hello World" in C? (Score:4, Interesting)

by onefriedrice ( 1171917 ) writes: on Tuesday March 16, 2010 @11:10PM (#31504472)

Parent said a lot of words...

But missed the point: http://www2.latech.edu/~acm/helloworld/c.html [latech.edu]

"This program is (supposedly) the smallest C program able to print "Hello world.". The compilation itself produces the desired printout and the program need not be actually run."

Re:C++ is worse (Score:1, Interesting)

by Anonymous Coward writes: on Tuesday March 16, 2010 @11:12PM (#31504482)

Oh that's bullshit. Indeed it's a problem with GCC, but it's GCC's fault, not C++ and not templates. Templates are instantiated at compile time. The compiler is completely free to throw out and not instantiate parts that are not used. Your lack of instantiations is irrelevant. The problem with GCC is that when you use any of the STL or the I/O library, it pulls in the entire stream library. It doesn't have to do that, and templates are not at fault. It's the designer of this monolithic monstrosity.

Re:IEFBR14 (Score:3, Interesting)

by craighansen ( 744648 ) writes: on Tuesday March 16, 2010 @11:13PM (#31504488) Journal

The original bug in IEFBR14 was that it didn't set the exit code to zero. Fixing that bug doubled the size of the program (from one instruction to two).

Still written (mostly) in C. . . (Score:3, Interesting)

by JSBiff ( 87824 ) writes: on Tuesday March 16, 2010 @11:23PM (#31504556) Journal

"At the end, the code was assembler"
But, the key point is that the user didn't generate that assembly. The user wrote a C program (granted, the program doesn't actually do any output - it just stores a string in memory, then exits). The user called the compiler to compile the program. The user then *disassembled* the object code which was created by *the compiler*. So, the assembly you see was generated (indirectly, via the objdump command), by the C compiler.
Exception: the user did create a small assembly file with the place-holder _start function. Perhaps this example would have benefited from the user defining the _start() function in C also, and using the compiler to compile that - not sure if that actually would have worked or not, but it would have been interesting if she had tried.
One other point I'd like to make - ultimately, every C program has to have some assembly *somewhere*. When you call printf(), printf itself either must use some assembly to interact with the operating system (in order to cause output to be sent to stdout), or printf *might* punt that off to another function, which then has some assembly inside it. The only reason you can do *any* input or output in C (or any other language for that matter) is that, at some point, somewhere, either in the compiler itself, or in a standard library, someone has provided the necessary assembly code for you.
In the case of C, the language designers decided to make the C language pure 'logic', without any notion of input or output statements, or operating system interaction, and do all input/output/system calls via library functions (whether you use the standard library, or a 'third-party' library [ I use the term third-party loosely here, because the 'third-party' lib might actually be provided by your compiler vendor, but it's just not the standard library]).
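To make the "some assembly somewhere" point concrete, here is a minimal sketch of what the bottom of an output routine can look like on Linux. The name raw_write is hypothetical, and the block assumes GCC or Clang inline assembly on x86-64 or AArch64 Linux; it is not how any particular libc implements write().

```c
#include <stddef.h>

/* Hypothetical helper: issue the Linux write system call without libc.
   Assumptions: Linux, GCC/Clang inline asm, x86-64 or AArch64 only.
   Returns the byte count on success or a negative errno value on failure. */
long raw_write(int fd, const void *buf, size_t len)
{
#if defined(__x86_64__)
    long ret;
    /* x86-64 Linux: syscall number in rax (write = 1), args in rdi, rsi, rdx.
       The syscall instruction itself clobbers rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(1L), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret;
#elif defined(__aarch64__)
    /* AArch64 Linux: syscall number in x8 (write = 64), args in x0..x2. */
    register long x8 __asm__("x8") = 64;
    register long x0 __asm__("x0") = fd;
    register const void *x1 __asm__("x1") = buf;
    register size_t x2 __asm__("x2") = len;
    __asm__ volatile ("svc #0"
                      : "+r"(x0)
                      : "r"(x8), "r"(x1), "r"(x2)
                      : "memory");
    return x0;
#else
#error "this sketch only covers x86-64 and AArch64 Linux"
#endif
}
```

A caller would use it like raw_write(1, "hello\n", 6). Everything above the asm statement is ordinary C, which is roughly the split the comment describes.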

Re:11k Is Too Big? (Score:5, Interesting)

by walshy007 ( 906710 ) writes: on Tuesday March 16, 2010 @11:54PM (#31504734)

#hello world tiny program
.equ SYSCALL, 0x80
.equ SYS_EXIT, 1
.equ SYS_WRITE, 4
.equ STDOUT, 1
.section .data
hello:
.ascii "hello world!\n"
.section .text
.globl _start
_start:
movb $SYS_WRITE, %al #put write syscall in eax
movb $STDOUT, %bl #set stream to stdout
movl $hello, %ecx #give address of start of buffer to print
movb $13, %dl #how many characters of buffer to print
int $SYSCALL
movb $SYS_EXIT, %al
int $SYSCALL
The above is a tiny hello world program I wrote myself. It's worth noting that even the resulting binary is larger than it needs to be: I wound up with a 133-byte binary by moving the text string into the ELF header via hex editor and changing the instruction data to point to the new addresses. Kind of hard to get it smaller than that while keeping it in ELF format, considering the actual object code in the binary was something like 15 bytes with the data illegally in the header.
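For a sense of why the format itself sets the floor, here is a small sketch that adds up the fixed framing of a 32-bit ELF executable: one file header plus one program header. It assumes a Linux system whose <elf.h> provides Elf32_Ehdr and Elf32_Phdr; the function name is made up for illustration, and the sketch does not attempt any header-overlapping tricks.

```c
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: bytes of mandatory framing in a 32-bit ELF
   executable with a single program header and no section headers.
   Assumes the structures are laid out back to back, not overlapped. */
size_t elf32_min_overhead(void)
{
    return sizeof(Elf32_Ehdr)   /* 52-byte file header    */
         + sizeof(Elf32_Phdr);  /* 32-byte program header */
}
```

That comes to 84 bytes before a single instruction or character of the string is counted, so the code and data are the minority of any file in this size range.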

29 bytes ! Beat that !!! (Score:5, Interesting)

by Anonymous Coward writes: on Wednesday March 17, 2010 @12:09AM (#31504818)

c:\ xxx>debug
-a
mov dx, 100
mov cx, 000D
mov bx, 1
mov ah, 40
int 21
mov ah, 4C
int 21
-f 111 "Hello World!"
-a100
mov dx, 0111
-r cx :001D
-n c:\ xxx\ hello.com
-w
-q
c:\ xxx>hello.com
Hello World!
c:\ xxx>dir hello.com
03/18/2011 11:29 AM 29 HELLO.COM
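For anyone checking the arithmetic in the listing above: the seven instructions assemble to 17 (11h) bytes, and DOS loads a .COM image at offset 100h, so the string starts at 111h and the file is 17 + 12 = 29 bytes. Note that CX is 0Dh (13), one more than the 12 characters stored. The sketch below rebuilds the same image in C; the opcodes are hand-encoded and build_com is a hypothetical helper name, not a DOS or DEBUG facility.

```c
#include <stddef.h>
#include <string.h>

/* Machine code for the seven instructions entered in DEBUG (17 bytes). */
static const unsigned char com_code[] = {
    0xBA, 0x11, 0x01,   /* mov dx, 0111h  ; offset of the string     */
    0xB9, 0x0D, 0x00,   /* mov cx, 000Dh  ; number of bytes to write */
    0xBB, 0x01, 0x00,   /* mov bx, 0001h  ; handle 1 = stdout        */
    0xB4, 0x40,         /* mov ah, 40h    ; DOS "write to handle"    */
    0xCD, 0x21,         /* int 21h                                   */
    0xB4, 0x4C,         /* mov ah, 4Ch    ; DOS "terminate"          */
    0xCD, 0x21          /* int 21h                                   */
};

/* Hypothetical helper: copy the code followed by msg into out.
   Returns the image size in bytes, or 0 if it does not fit in cap. */
size_t build_com(unsigned char *out, size_t cap, const char *msg)
{
    size_t msg_len = strlen(msg);
    size_t total = sizeof com_code + msg_len;

    if (total > cap)
        return 0;
    memcpy(out, com_code, sizeof com_code);
    memcpy(out + sizeof com_code, msg, msg_len);
    return total;
}
```

Writing the returned bytes to a file named hello.com should give the same 29-byte program, and dropping the exclamation mark gives 28.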

Did similar back in MS-DOS 2.11 (Score:5, Interesting)

by Brett Johnson ( 649584 ) writes: on Wednesday March 17, 2010 @01:06AM (#31505056)

Back in the early 1980s, I was doing development on MS-DOS 2.11 - the first real working version of MS-DOS that resembled Xenix more than CP/M.
I was using a combination of Lattice C and assembly language to do my day job. But I was upset about the libc bloat that Lattice C would drag into the program. Over the Christmas break, I sat down and wrote a tiny version of libc, with the 60% of the calls I actually used. Most of them were either thin wrappers on top of MS-DOS Int21 calls, assembly language implementations (the string functions), or reduced functionality (printf didn't handle strange alignments, floats or doubles), and custom startup/exit code. I also structured the library so that the linker would only link in functions that were actually used. For simple executables, I saw the on-disk file size drop from 10KB-20KB down to 400-600 bytes. Another thing that reduced on-disk file size was to create .com programs, rather than .exe programs.
I was also writing the handful of unix commands that I couldn't do without (ls, cat, cut, paste, grep, fgrep, etc). Since I was implementing dozens of Unix commands, each statically linked to libc, it was very important to reduce the over-all size of each executable. Most of the smaller trivial commands were less than 1KB in size. I think the largest was 4KB. I also had an emacs clone* that was 36KB when compiled and linked against my tiny lib.
For the longest time, I carried around a bootable MS-DOS 2.11 floppy, with my dozens of Unix commands, an emacs-like editor, Lattice C compiler, tiny libc, and some core MS-DOS programs. It allowed me to have my entire development environment on a floppy that I could stick in anyone's machine and make it usable.
* We had a source license for Mince, orphaned by Mark of the Unicorn, a tiny emacs-clone that ran on CP/M, MS-DOS, and Unix. We had enhanced it significantly.
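As an illustration of the "reduced functionality" idea, here is a sketch of a formatter that handles only %s, %d and %%, with no widths, alignment, floats or doubles. It is a guess at the general shape of such a routine, not the Lattice C replacement described above; tiny_snprintf and put_char are hypothetical names.

```c
#include <stdarg.h>
#include <stddef.h>

/* Store one character if it fits; always count it (snprintf-style). */
static void put_char(char *dst, size_t cap, size_t *n, char c)
{
    if (*n + 1 < cap)
        dst[*n] = c;
    ++*n;
}

/* Hypothetical reduced printf: understands only %s, %d and %%.
   Returns the length the full output would have had. */
int tiny_snprintf(char *dst, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t n = 0;

    va_start(ap, fmt);
    for (; *fmt; ++fmt) {
        if (*fmt != '%') {
            put_char(dst, cap, &n, *fmt);
            continue;
        }
        if (*++fmt == '\0')
            break;                          /* lone '%' at end of format */
        if (*fmt == 's') {
            const char *s = va_arg(ap, const char *);
            while (*s)
                put_char(dst, cap, &n, *s++);
        } else if (*fmt == 'd') {
            int v = va_arg(ap, int);
            /* Negate in unsigned arithmetic so INT_MIN is handled. */
            unsigned u = v < 0 ? 0u - (unsigned)v : (unsigned)v;
            char digits[16];
            int i = 0;

            if (v < 0)
                put_char(dst, cap, &n, '-');
            do {
                digits[i++] = (char)('0' + u % 10);
                u /= 10;
            } while (u);
            while (i > 0)
                put_char(dst, cap, &n, digits[--i]);
        } else {
            /* "%%" prints '%'; an unknown specifier prints its letter. */
            put_char(dst, cap, &n, *fmt);
        }
    }
    va_end(ap);
    if (cap > 0)
        dst[n < cap ? n : cap - 1] = '\0';
    return (int)n;
}
```

Leaving out field widths and floating point is where most of the savings come from, since those paths usually pull in the bulk of a full printf.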

Re:Damn kids (Score:3, Interesting)

by mandelbr0t ( 1015855 ) writes: on Wednesday March 17, 2010 @01:08AM (#31505068) Journal

mov bx, 0xB000
mov es, bx
xor di, di
mov si, OFFSET msg
mov cx, LEN
stosb
.data
msg db 'Hello, World!', 13, 10, $
LEN equ 15

I wasn't blessed with a color card. And I'm sure that's not actually any real dialect of assembly, but you get the picture.

Re:11k Is Too Big? (Score:4, Interesting)

by santax ( 1541065 ) writes: on Wednesday March 17, 2010 @01:45AM (#31505196)

Hmmz, I was hoping my post was without any judgement about what is 'better' and more '133t' coding. Sorry you think otherwise :(

Re:BTDT (Score:5, Interesting)

by h4rr4r ( 612664 ) writes: on Wednesday March 17, 2010 @01:57AM (#31505236)

Some people like their code to run on OSes for grownups.

Mine is simpler (Score:2, Interesting)

by assert(0) ( 913801 ) writes: on Wednesday March 17, 2010 @02:38AM (#31505380) Homepage

/* -*- coding: utf-8-unix -*- */
#include <stdio.h>
int main(int O, char **o)
{
  int l4, l0, l, I, lO[]= { 444,131131,13031,12721,17871,20202,1111,
                            20102,18781,666,85558,66066,2222,0 };
  for(l4=0;l4<14;++l4){
    for((l=l0=lO[l4])&&(l0=-7);
        l>4&&(I=2-((l|l>>O)&O));l=l&O?l+(l<<O)+O:l>>I,l0+=I);{
      putchar(10+l0);
    }
  }
  return 0;
}

Re:BTDT (Score:4, Interesting)

by Anne Thwacks ( 531696 ) writes: on Wednesday March 17, 2010 @04:49AM (#31505828)

Not permanently running processes, libraries do not exist as an independent process, but are used by other processes. Regardless of the number of processes executing the code, only a single copy is ever loaded in memory - the entry points are made available via a table, and if anyone loads a copy, everyone has access to that copy (obviously with their own memory for variables). This is easy, because code and data sit in separate memory spaces logically, even though they don't do so physically. (And the code pages have the execute bit set, data doesn't - yes 20 years before Windows had this feature!)
"Single instance only" applied to all code - applications, libraries and OS. We often used to have 16 users on an 11/70 with 1MW of memory (ie 2MB) - all running the same program, so only one copy was resident. (or one of us was running the C or Fortran compilers :-)
isn't that how the BSDs do it today?
In RSX/11, programs could be "installed" so that they made their location on disk known to the OS, so when you ran a program, it was not necessary to search the file system for it. The location on disk, and offset to the entry point, was already known. A program could have multiple entry points (like grep, egrep etc), and libraries just used this mechanism. The dynamic linker stored the program (library) name and index into a table of entry points. I think early implementations statically linked the code to hack this stuff, before it became part of the OS.
If the search for the program you asked for found nothing, then the directories were searched. It was laziness of the users that meant the default strategy was used for most applications. I think VMS continued to support the install option, but I can't remember. I don't think Un*x ever did.
Programs only loaded the pages that were in use, and pages not in use were eventually swapped out. So huge programs did not take forever to load - you loaded the first page, and jumped to it - then loaded whichever pages execution went to - so you did not need to spend years designing overlay strategies! This was possible because pages could load anywhere - the PDP11 supported "position independent code" (All modern 16 and 32 bit processors could still do this).
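The "name plus index into a table of entry points" arrangement described above can be sketched in C with a table of function pointers. This is only an illustration of the idea: the struct layout, the names, and the two stand-in entry points are all hypothetical and do not reflect RSX-11's actual data structures.

```c
#include <stddef.h>
#include <string.h>

typedef int (*entry_fn)(const char *arg);

/* Hypothetical record for one installed image: a name plus its entry points. */
struct installed_image {
    const char     *name;
    size_t          n_entries;
    const entry_fn *entries;
};

/* Two stand-in entry points sharing one image, in the spirit of grep/egrep. */
static int count_chars(const char *arg)
{
    return (int)strlen(arg);
}

static int count_spaces(const char *arg)
{
    int n = 0;
    for (; *arg; ++arg)
        n += (*arg == ' ');
    return n;
}

static const entry_fn demo_entries[] = { count_chars, count_spaces };

const struct installed_image demo_image = { "DEMO", 2, demo_entries };

/* Dispatch by index, as a caller holding (name, index) would.
   Returns -1 when the index names no entry point. */
int call_entry(const struct installed_image *img, size_t index, const char *arg)
{
    if (index >= img->n_entries)
        return -1;
    return img->entries[index](arg);
}
```

Because callers only hold an index, the code behind each slot can live in a single shared copy and be replaced without relinking them.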
Don't you young people know anything? I know this, and I didn't even do computer science in college!
Get off my lawn.

Re:BTDT (Score:3, Interesting)

by ta bu shi da yu ( 687699 ) writes: on Wednesday March 17, 2010 @05:01AM (#31505870) Homepage

You know that this is a repost of a 2002 slashdot article [slashdot.org]?

Re:BTDT (Score:4, Interesting)

by MichaelSmith ( 789609 ) writes: on Wednesday March 17, 2010 @05:27AM (#31506014) Homepage Journal

I think VMS continued to support the install option, but I can't remember. I don't think Un*x ever did.
IIRC install in VMS was to register a privileged library with the OS. A library like that could do stuff the calling process couldn't do. Most likely it was install image.obj/priv=sysprv,setprv and so on.
My experience with RSX was with a traffic signal application called SCATS. I once interviewed for a job in a hospital where they supported ~60 users on a single machine, probably an 11/84.
Our SCATS systems had up to 16 DZ11 MUX cards for 128 serial lines. I have never seen a system which could handle that many interrupts and run so cool in the sense that it was always responsive regardless of load and it would chug away for years without showing any signs of stress.
BSD is as close as you would get to that with modern systems. Maybe QNX though I haven't worked with that OS.

Re:Missing the point (Score:5, Interesting)

by dzfoo ( 772245 ) writes: on Wednesday March 17, 2010 @06:33AM (#31506362)

After reading the linked article, I felt underwhelmed. Then I read the second article referenced in the summary:
http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html [muppetlabs.com]
Now, that was interesting!
The strange thing is that the summary seems to imply that both articles are related, which they are most definitely not. The first one seems to be written by a naive noob, who just discovered a nifty trick in gcc. The second one is written by a real Wizard, who shows you how to conjure up some arcane magic to make ELF your bitch.
-dZ.

Re:11k Is Too Big? (Score:4, Interesting)

by SharpFang ( 651121 ) writes: on Wednesday March 17, 2010 @07:01AM (#31506458) Homepage Journal

...besides, high-level programmers often underestimate just how big a sector embedded programming is. The $IDIOTS_PET_LANGUAGE is for a PC. Now get me more RAM and better CPU for all the devices running embedded software, that are in my sight range as I look around:
- my cell phone.
- 6 different monitors (OSD doesn't happen magically. Something remembers the settings...)
- a videoserver
- 2 cheap switches
- a regulated power supply
- a heat-controlled soldering iron
- a regular phone
- 3 PC keyboards (hey, neither PS2 nor USB protocols happen by themselves)
- 3 computer mice (optical, meaning pretty advanced image analysis)
- my wristwatch
- a battery charger
- a USB hub
- a security motion sensor
- an MP3 player
- a webcam
- a multimeter
- a car alarm remote
- a pendrive.
These were all programmed either in VHDL, Assembler, or C. The phone has some J2ME code too. Think of upgrading each of these devices so much that its firmware could be rewritten in, say, Perl. Or C#.
Also, think about how much embedded programming is in every PC. Each device controller has its own firmware... my bet is any average house contains more embedded programs (in embedded devices) than PC applications on the "family PC" and stored on media.
High-level programming languages are nice and have their place, but considering embedded "a niche not worthy of attention" is a bad mistake. The proportions between amounts of server:desktop:embedded software are much closer to 1:1:1 than most "high-level" programmers are willing to admit.

Re:BTDT (Score:4, Interesting)

by argent ( 18001 ) writes: <peter AT slashdo ... taronga DOT com> on Wednesday March 17, 2010 @07:36AM (#31506646) Homepage Journal

Was that in RSX11-M?
Version 6 UNIX. I didn't abuse a.out as badly as this example abuses elf, though.
Really, with a.out, it wasn't abuse. That format LIKED these kinds of games. When I was hacking in Forth I wrote a "snapshot" word that did something like
: snapshot fork dup 0= if drop abort then waitpid -1 = if 0 else " mv core snap.out; patch-to-executable snap.out" system 1 then ;

Where patch-to-executable took advantage of the fact that an a.out was basically a core dump with some extra segments... and you could leave those segments off if you needed to. :)
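For readers who don't speak Forth, the control flow of that word maps onto C roughly as follows. This is a sketch of the fork-and-abort half only, assuming a POSIX system; it leaves out the "mv core" and patch-to-executable steps, and whether a core file is actually written depends on the system's core dump settings.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a copy of the running process and have the copy abort, so the
   kernel can dump its memory image. Returns 1 if the child was killed
   by SIGABRT, 0 if fork or waitpid failed or the child ended otherwise. */
int snapshot(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return 0;               /* fork failed */
    if (pid == 0)
        abort();                /* child: die with SIGABRT */
    if (waitpid(pid, &status, 0) == -1)
        return 0;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT;
}
```

The parent keeps running untouched, which is what makes this usable as a snapshot rather than a crash.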
I did some nasty stuff on RSX-11, though. Portable file and terminal I/O was a pain in the butt, because text files were variant record files with each line having a count and a length and a record/line type field and IIRC occasional block alignment issues, so to read and write text files in Forth I had a FORTRAN main that called Forth through an assembly glue routine, then called back to FORTRAN for textfile I/O. Also got tired of FORTRAN formatted I/O so I wrote a version of sprintf for my RATFOR code that used assembly glue to implement varargs in FORTRAN. Ah, the good old days...

Re:Did similar back in MS-DOS 2.11 (Score:3, Interesting)

by AceJohnny ( 253840 ) writes: on Wednesday March 17, 2010 @08:08AM (#31506846) Journal

I've always wondered: what was the difference, in DOS, between a .com and a .exe?

How about 28 bytes?! (Score:1, Interesting)

by Anonymous Coward writes: on Wednesday March 17, 2010 @08:51AM (#31507152)

c:\ xxx>debug
-a
mov dx, 100
mov cx, 000D
mov bx, 1
mov ah, 40
int 21
mov ah, 4C
int 21
-f 111 "Hello World"
-a100
mov dx, 0111
-r cx :001C
-n c:\ xxx\ hello.com
-w
-q
c:\ xxx>hello.com
Hello World
c:\ xxx>dir hello.com
03/18/2011 11:29 AM 28 HELLO.COM
What do I win?

It won't remove them unless you tell it to (Score:3, Interesting)

by Chemisor ( 97276 ) writes: on Wednesday March 17, 2010 @09:25AM (#31507466)

You have to explicitly enable function-level linking with gcc. Compile your source files with -ffunction-sections -fdata-sections, and then pass the -gc-sections flag to ld (-Wl,-gc-sections if linking with gcc). This puts every function into its own .text.<function> section and allows the linker to prune the ones that are not referenced. The remaining ones are coalesced into a single .text section.
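A small test case for trying this out, assuming GCC with GNU ld; the file and function names are made up. The GNU ld manual spells the linker option --gc-sections.

```c
/* sections_demo.c (hypothetical file name)
 *
 * Suggested build, assuming GCC and GNU ld, with a main() elsewhere
 * that calls only used_twice():
 *   gcc -Os -ffunction-sections -fdata-sections -c sections_demo.c
 *   gcc -Wl,--gc-sections -o demo main.o sections_demo.o
 *
 * With -ffunction-sections each function below lands in its own
 * section (.text.used_twice, .text.never_called), so the linker can
 * drop the one that nothing references.
 */

int used_twice(int x)
{
    return 2 * x;
}

int never_called(int x)     /* candidate for garbage collection */
{
    return x * x * x;
}
```

Comparing the output of nm on the final binary with and without the flags is a quick way to see whether never_called survived.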

Devices (Score:2, Interesting)

by Anonymous Coward writes: on Wednesday March 17, 2010 @09:52AM (#31507746)

Here we still program in C (I don't, but others here do) because if we wrote in something like .NET, Ruby or Python the executable would be so much bigger and the binaries wouldn't fit on the device hardware. So sometimes having something 'closer to the iron' is better.
Then again, when telling a device how to servo, string interpretation is seldom of high concern.

Re:11k Is Too Big? (Score:1, Interesting)

by Anonymous Coward writes: on Wednesday March 17, 2010 @10:28AM (#31508216)

Actually, embedded software probably makes up the VAST majority of the entire software market. Microcontrollers are everywhere, in your TV, your microwave, dozens in your car, your watch, your phone, your electric razor, everywhere.

Re:BTDT (Score:3, Interesting)

by tixxit ( 1107127 ) writes: on Wednesday March 17, 2010 @11:19AM (#31508952)

Not the GP, but I learned this fact when learning about assembly and trying to figure out why we used _start and not main. I also learned all about the C convention of prepending functions with underscores and a lot of other jazz. I even did some of the type of stuff the author did (abusing the ELF file format to shrink a simple program's file size). However, a lot of folk don't learn asm, so I think this article would be pretty cool. It is also significantly better written and more cohesive than most of the crap I read.
