Effect of Using 64-bit Pointers? 164
An anonymous reader queries: "Most 64-bit processors provide a 32-bit mode for compatibility, but 64-bit pointers are becoming essential as systems move beyond 4GB of RAM. Also, the large virtual address space is very useful for several reasons - allowing large files to be memory-mapped, and allowing pages of memory to be remapped without ever requiring the virtual address space to be defragmented. However, 64-bit pointers take up twice as much memory, which immediately affects memory footprint. This is especially an issue on embedded platforms where RAM is at a premium, but even on systems where RAM is plentiful and cheap the extra memory footprint reduces cache performance. Have Slashdot readers done any research into the actual effect of using 64-bit pointers in a 'typical' application? What proportion of a real program's data is actually pointers?"
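One way to get a rough feel for the answer is simply to measure a typical node in your own code. A minimal C sketch (the struct below is a made-up but representative binary-tree node, not taken from any particular program):

#include <stdio.h>

struct node {
    struct node *left;
    struct node *right;
    struct node *parent;
    int          key;
    int          value;
};

int main(void) {
    /* Typically 20 bytes (60% pointers) when built as 32-bit code,
       32 bytes (75% pointers) when built as 64-bit code. */
    printf("node: %zu bytes, %zu of them pointers\n",
           sizeof(struct node), 3 * sizeof(struct node *));
    return 0;
}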
easy... (Score:5, Funny)
What proportion of a real program's data is actually pointers?
none whatsoever.
oh... i use java.
Re:easy... (Score:5, Funny)
Re:easy... (Score:5, Insightful)
However, can you think of any system where you had objects and sets of data that weren't (at least underneath) pointers to memory?
And as to the original subject, one poster already said it best: if you really need 64-bit pointers, you will probably have the memory to spare, no? Anyway, it will only be a problem if the pointers are big in comparison to what they're pointing to, in which case you should probably rethink what you're doing if you care a squat about memory footprint. Bringing embedded devices into the discussion at this point is totally pointless, but of course it sounds cool and catches a Slashdot editor's eye.
Bleh, I'm no expert anyway.
Re:easy... (Score:2)
The second part I agree with..
So there you have it.
Re:easy... (Score:2)
No, objects and references are completely different things.
Objects can vary in size, always live on the heap, and are always instances of a concrete (non-abstract) class.
References always have the same size, can live on the heap or on the stack, and can have any type (class or interface).
References are the things that point to objects. Every time you deal with an object you do it by way of a reference to the object. But that doesn't mean that objects and references are the same thing.
Re:easy... (Score:2)
Strings are necessarily at most 4GB in length. This is part of the definition of the language. Therefore there are at least *some* objects which are limited to a 4GB size.
Also, integers are exactly 32 bits in Java, and array indices are signed, so all arrays are necessarily limited to (about) 2 billion entries. Though, of course, each entry may be more than one byte in size.
Re:easy... (Score:2)
Are you sure the total (byte-)size of an array can exceed 4GB? As I recall, both the reference and returnAddress types are Category 1 (32-bit) in the VM specification, which implies 4GB is the maximum size of both data and bytecode.
Re:easy... (Score:5, Informative)
I run very large simulations on various platforms, and some of my simulations have to be run on a 64-bit machine because of the memory requirements. Sun's Java forums have several posts asking about the maximum heap (maximum accessible memory) on various platforms, and you can find more exact numbers for specific platforms and operating systems there.
An object is an object, not a pointer. However, objects are accessed through a reference, which in implementation, is typically a pointer.
Re:easy... (Score:2)
Re:easy... (Score:2)
Re:easy... (Score:2)
Re:easy... (Score:5, Insightful)
If you'd pay proper attention to Sun's marketing machine, you'd remember that Java uses a just-in-time compiler. What does a compiler do? It turns all of your "object-oriented is the only valid programming paradigm" source code into a big bucket of CPU-specific opcodes, numbers and *pointers*.
In fact, it will probably have more pointers than the corresponding C or C++ program would have, due to the plethora of tiny objects you're encouraged to spawn. Naturally, the pointer size would match the CPU architecture on which the program is being run and would consume a corresponding number of cache bytes.
Re:easy... (Score:3, Informative)
When you create an object in Java, you are, in a sense, creating a pointer. As a matter of fact it's easy to make a linked list or a binary tree with Java, the same way you do in C. Just because it's not explicitly called a pointer doesn't mean it isn't used.
Ever heard of a NullPointerException [sun.com]?
"Java doesn't have pointers" is a hype phrase still left over from the Dot Bomb era...
Re:easy... (Score:2)
Embedded 64-Bit (Score:4, Insightful)
Is this really a problem in the embedded space?
Re:Embedded 64-Bit (Score:2)
Re:Embedded 64-Bit (Score:5, Insightful)
Bandwidth from memory to cache will also be used by these larger pointers.
OTOH, other than disk controller caches (?), what kind of embedded systems need more than 4GB online simultaneously?
Re:Embedded 64-Bit (Score:5, Informative)
There's a lot of modern medical equipment which can definitely use the 4GB. MRI machines, CT scanners, ultrasound machines ("sonographs" if you prefer the term) and so on do tend to chew up memory. Particularly the first two, because you often need to hold whole voxel sets in memory while you compute a bunch of cross-sections at odd angles.
Re:Embedded 64-Bit (Score:2, Interesting)
Some CAD programs used in mechanical engineering (CATIA V5, for example) could use that much. Loading a whole car engine into one of these programs will exceed 4 GB pretty quickly.
Re:Embedded 64-Bit (Score:4, Interesting)
So while you may have a caching problem, I think it's going to be because of accessing more data rather than the extra 4 bytes on some pointers.
Now if you're using disk-based data structures, you'd better be using 64-bit. I could make an exception if you used a 32-bit number to address the cluster, then a 16-bit number to access the actual data in the cluster, if required. A good DB server would do well to use 32-bit cluster numbers to save index size, then scan the loaded cluster for the record. AFAIK, no one has been clever enough to do this, but I'm not privy to the internal structures of a lot of DBMSs. And this would matter a lot, because you could fit much more of the index into memory and have much less data to read from the drive. Throwing away CPU cycles and memory for more compact disk data is a common practice.
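A rough sketch of what such a compact disk reference could look like (the cluster size and field names are invented, not from any real DBMS): a 32-bit cluster number plus a 16-bit offset covers 2^48 bytes of file while costing only 6 bytes per reference instead of 8.

#include <stdint.h>

#define CLUSTER_SIZE 65536ULL        /* 64KB clusters (assumed) */

struct record_ref {
    uint32_t cluster;                /* which cluster holds the record    */
    uint16_t offset;                 /* byte offset within that cluster   */
};

/* Convert the compact reference into an absolute byte offset in the file. */
static inline uint64_t ref_to_offset(struct record_ref r) {
    return (uint64_t)r.cluster * CLUSTER_SIZE + r.offset;
}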
Re:Embedded 64-Bit (Score:5, Insightful)
Embedded devices come in all sorts of varieties from 4-bit to 64-bit, and will do for the foreseeable future. When you're producing X million chips, the software cost is amortised to basically nothing and the hardware cost becomes the primary concern, so there is no chance that lower-spec chips will ever go away.
So you're not going to be forced to use a 64-bit chip in your design, just because the chip company has stopped selling the lower spec ones. In the PC business this does happen, because there's no demand for older, lower-spec chips. In the embedded market though, the demand is there and will continue to be there, so the situation has not and will not arise.
If your target application needs 64-bit processing, you choose a device that does 64-bit processing, and you choose RAM size to suit. If you don't need it, you don't choose it. Simple.
Someone elsewhere had some questions about internal registers/internal RAM. Well as with all processors, some give you enough registers and some don't. Again, the engineer just has to pick the processor that gives the capabilities they want.
Grab.
Embedded platforms?!? (Score:4, Interesting)
Re:Embedded platforms?!? (Score:2)
Or maybe what I'm used to calling an "embedded" device isn't the same as the submitter's...
=Smidge=
Re:Embedded platforms?!? (Score:2, Interesting)
Re:Embedded platforms?!? (Score:2, Insightful)
Re:Embedded platforms?!? (Score:2)
Of course, we won't be going to a 64 bit chip in the near future, if ever.
Re:Embedded platforms?!? (Score:5, Insightful)
By the way, the limit came from the physical slots - eight of them, with a 2GB-per-DIMM limit; increase either of those and guess what.
Now each "process" on our box could only address 4GB of that memory, but that was a completely different question (and in fact limited by the libraries that were used - again, a different story).
I remember these conversations when the 32-bit world came around - "what do you mean I have to feed 4-byte addresses into the processor?" The end result is that the code is a little larger and a little slower - and Moore's law marches on and we don't even notice.
Re:Embedded platforms?!? (Score:4, Interesting)
Re:Embedded platforms?!? (Score:2)
Re:Embedded platforms?!? (Score:5, Informative)
Re:Embedded platforms?!? (Score:2)
Re:Embedded platforms?!? (Score:5, Informative)
If you wish to use memory mapped IO to your file system, which has some good technical properties, you need a pointer with an address range *at least* as large as the largest possible file you might need to access, and preferably as large as the largest file system you intend to mount.
Addressibility and physical storage are somewhat orthogonal. (In theory, there is no difference between theory and practice, in practice there is.)
On a machine with 10GB of memory, there is no reason for a process to use 64-bit pointers if the process doesn't require more than 32 bits of addressability. If you look at Apache in the standard prefork model, every request is managed by a different process. I doubt you need 64-bit pointers for *each* PHP instance, regardless of how much physical memory the machine contains.
On the other hand, you might be doing some kind of video stream manipulation on a 10GB file using a machine with only 1GB of physical RAM. You would require 64-bit addressability for this task if you chose the memory-mapped IO model.
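A minimal sketch of that scenario in C ("video.raw" is a placeholder name, and the sketch assumes a 64-bit build): mapping the whole file needs ~10GB of contiguous virtual address space, which a 32-bit process simply does not have, even though far less physical RAM is ever used.

#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("video.raw", O_RDONLY);
    if (fd < 0) return 1;

    struct stat st;
    if (fstat(fd, &st) < 0) return 1;

    /* Map the entire file: this consumes address space, not physical RAM. */
    uint8_t *frames = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (frames == MAP_FAILED) return 1;

    /* Touch a byte near the end; the kernel pages in only what is accessed. */
    volatile uint8_t last = frames[st.st_size - 1];
    (void)last;

    munmap(frames, (size_t)st.st_size);
    close(fd);
    return 0;
}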
So yes, you are retarded, but it could be cured by thinking before you type (the post does mention memory mapped IO). There: ten simple words of advice that should apply to 2^33 members of the slashdot community.
Re:Embedded platforms?!? (Score:2)
Re:Embedded platforms?!? (Score:2)
Re:Embedded platforms?!? (Score:2)
Are you trying to put this site out of business?
Re:Embedded platforms?!? (Score:3, Interesting)
Re:Embedded platforms?!? (Score:3, Informative)
I believe that the 64-bit capable MIPS architecture found its biggest success in the embedded processor market. From the Wikipedia entry [wikipedia.org]:
Re:Embedded platforms?!? (Score:2)
Re:Embedded platforms?!? (Score:2)
That being said, yes, the PS2 has an absolutely beautiful processor setup. Inter-processor bandwidth galore and extremely custom caches (think DMA on steroids). The overhead of doing extra-bit calculations (e.g. SIMD instructions) disappears entirely behind the more optimized architecture.
Re:Embedded platforms?!? (Score:2)
Re:Embedded platforms?!? (Score:2)
Don't use 64-bit pointers on such systems. (Score:5, Insightful)
Huh? On systems where RAM is at a premium, I don't see the point of using or having 64-bit pointers.
Re:Don't use 64-bit pointers on such systems. (Score:5, Informative)
The poster named one point: mapping large files.
Using mmap() for certain kinds of I/O is very, very useful in performance-sensitive applications. Using POSIX I/O (i.e. read(), write() and their relatives) means that your data must go through memory twice: once from disk into the buffer/page cache and then once again into userland. Memory-mapped I/O effectively unifies the two, saving precious memory and memory bandwidth.
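A small sketch of the two paths (error handling is omitted and the mapping length is a placeholder):

#include <sys/mman.h>
#include <unistd.h>

enum { LEN = 1 << 20 };

/* read(): the kernel copies from the page cache into our private buffer,
   so the data occupies memory twice and crosses the memory bus twice. */
void read_path(int fd, char *dst) {
    read(fd, dst, LEN);
}

/* mmap(): the returned pointer refers to the page-cache pages themselves,
   so no second copy of the data is made. */
const char *mmap_path(int fd) {
    return mmap(NULL, LEN, PROT_READ, MAP_PRIVATE, fd, 0);
}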
Re:Don't use 64-bit pointers on such systems. (Score:2)
The same considerations need to be applied to mmap a file, so there should be no difference.
In other words, read() and write() with page-alignment constraints should be the same as mmap. The difference is that re-using the same buffer may require an unmap.
With
Re:Don't use 64-bit pointers on such systems. (Score:2)
Because the app doesn't share the data with the OS, so if the app alters the data the OS needs to have set up COW so that the data it sees is the same. And it is very rare for applications to use page-aligned buffers to read or write; it is also very common to chang
Seriously OT by now... (Score:2)
Still, the easiest way to handle this is to always mmap() files, and read/write will either (a) be replaced b
Trade-offs (Score:4, Interesting)
Re:Trade-offs (Score:2)
Re:Trade-offs (Score:2)
2) If the 32-bit register is also the low 32 bits of the 64-bit register, then it's just 2 loads, not 4 instructions... granted, that's twice as many loads as using 64-bit pointers, but you have a greater chance of all the data fitting into cache, so it might actually be faster.
Latency (Score:3, Interesting)
Re:Latency (Score:3, Insightful)
Re:The compact memory model (Score:2)
No-one:
Re:The compact memory model (Score:2)
64 bit embedded processors? (Score:3, Insightful)
Re:64 bit embedded processors? (Score:2)
When you think "embedded", you're probably thinking of a smartcard, a pacemaker, a digital television or a fax machine. It may interest you to know that an MRI scanner is also an embedded system.
Implications of 64 bit pointers for interpreters (Score:5, Insightful)
If we are already using 64 bits for our pointers, a virtual machine has the potential to exploit the pointer's larger footprint for other immediate values. I'm not as crazy about using the MSB of the pointer to indicate an immediate as Ian Bicking appears to be; I'd recommend using the LSB, since it's easier to bias every object to an even address than to halve the potential addressable space.
Then again, if the potential address space is 2 ** 64, I suppose it's not such a sacrifice.
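A minimal sketch of LSB tagging in C, assuming every heap object is at least 2-byte aligned so a genuine pointer always has a zero low bit (the names are invented):

#include <assert.h>
#include <stdint.h>

typedef uintptr_t value_t;   /* either a tagged small integer or an object pointer */

static inline value_t  box_int(intptr_t n)   { return ((uintptr_t)n << 1) | 1; }
static inline intptr_t unbox_int(value_t v)  { return (intptr_t)v >> 1; }
static inline int      is_int(value_t v)     { return (int)(v & 1); }

static inline value_t  box_ptr(void *p)      { assert(((uintptr_t)p & 1) == 0);
                                               return (uintptr_t)p; }
static inline void    *unbox_ptr(value_t v)  { return (void *)v; }

The cost is one bit of integer range per tagged value, which, as noted above, is not much of a sacrifice out of 2 ** 64.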
Re:Implications of 64 bit pointers for interpreter (Score:2)
AMD (you know, the guys who made x86-64) are NOT fans of these kinds of ideas. If you scribble in undefined places in the pointer, the Opteron/Athlon64 will throw an exception. Pointers in x86-64 are sign-extended, so it's not trivial to hide stuff in the upper bits and then
Re:Implications of 64 bit pointers for interpreter (Score:2)
Probably not as big a deal as you think. (Score:5, Interesting)
Because of these existing data alignment issues, going from 32-bit to 64-bit pointers may have absolutely no impact on a program's memory usage and cache performance. It is highly likely you're already using 64-bit alignment when you enable the compiler's optimizations.
Unless you're building massive linked lists of stuff in a scientific / simulation environment, this is probably not worth worrying about. The volume and layout of your actual data will still be the biggest consumer of space - and it's not like you won't be able to attach more physical memory to your new system than to the old one.
If it does affect you... you probably already know what you're doing, or you've been making very bad assumptions about the size of your variable types.
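For instance (a rough illustration; the exact numbers depend on the ABI - MSVC and most 64-bit ABIs align double to 8 bytes, while the 32-bit System V ABI only requires 4):

#include <stdio.h>

struct sample {
    double when;   /* 8 bytes; forces 8-byte alignment of the whole struct        */
    void  *next;   /* 4 bytes + 4 bytes of trailing padding on ILP32, 8 on LP64   */
};

int main(void) {
    /* Can print 16 for both the 32-bit and the 64-bit build: the padding the
       32-bit build already carries absorbs the wider pointer. */
    printf("sizeof(struct sample) = %zu\n", sizeof(struct sample));
    return 0;
}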
Re:Probably not as big a deal as you think. (Score:2)
Re:Probably not as big a deal as you think. (Score:2)
Re:Probably not as big a deal as you think. (Score:3, Informative)
And... the pointers have to be loaded. It will take more address bits in the instructions to build constants. More cache used.
It is NOT highly likely that 64-bit alignment is done when optimizing. In fact, that's just wrong.
Yes, cache performance suffers.
Re:Probably not as big a deal as you think. (Score:3, Interesting)
Yes, x86 does not require alignment for the vast majority of data accesses, with pretty much the sole exception being SIMD instructions. And yes, unaligned data will run psychotically slower than aligned data, which is why the compiler aligns it. Look into your MS VC++ optimization settings and see whether it's using 4-byte or 8-byte alignment of structures by default. My goodness, it's 8-byte alignment, but why, you ask? Because doubles need 8-byte alignment.
Re:Probably not as big a deal as you think. (Score:2)
First you gush about having over 4 GB of RAM (Score:3, Insightful)
Re:First you gush about having over 4 GB of RAM (Score:2)
On the other hand, 64-bit pointers make certain tradeoffs less desirable - for example, if you're passing around pointers to structs that are larger than 32 bits but smaller than 64, it's now more efficient to pass by value. That's a pretty borderline case, though...
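A hypothetical example of that borderline case: a 6-byte struct, where on an LP64 target the pointer is already bigger than the thing it points to.

#include <stdint.h>

struct rgb16 { uint16_t r, g, b; };   /* 6 bytes of data */

/* By value: the 6 bytes travel directly (typically in a register). */
uint32_t luma_by_value(struct rgb16 c)      { return (uint32_t)c.r + c.g + c.b; }

/* By pointer: an 8-byte pointer is passed, plus a dereference at the far end. */
uint32_t luma_by_ptr(const struct rgb16 *c) { return (uint32_t)c->r + c->g + c->b; }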
Cache effects (Score:2)
Re:Cache effects (Score:2)
As I mentioned previously, this can be more than offset by the cost savings you get in using memory-mapped I/O. Using standard POSIX I/O, your data hits memory twice [slashdot.org].
Oh, and 64-bit CPUs tend to have larger cache lines to cope.
Re:Cache effects (Score:2)
Forget about I/O; I'm talking about moving code from RAM to Level 1 cache.
Re:Cache effects (Score:2)
...and I'm saying that I/O can easily dominate cache. This is especially true when you consider that copying a few disk pages from one physical memory location to another could easily trash the contents of your L1 cache.
Re:Cache effects (Score:2)
Re:Cache effects (Score:2)
Fair enough. I hack a certain high-performance database server for a living. I/O often dominates our applications, so we really care about memory-mapped I/O. As a result, we often find ourselves scrounging address space on "large" databases. Maybe our domain is more sensitive about it than yours is.
Re:Cache effects (Score:2)
Re:Cache effects (Score:2)
Re:Cache effects (Score:2)
Re:really !? (Score:2)
Sure. The comparison is between 32 bit and 64 bit words, including but not necessarily limited to addressing.
paging has nothing to do with caching.
Have you ever heard the term "paging to disk"? Do you know what it means?
sparc64 (Score:4, Informative)
Re:sparc64 (Score:4, Insightful)
Re:sparc64 (Score:2)
Doubtful, his name is keesh after all... He probably forgot. Oh well.
Alpha (Score:3, Insightful)
I concur with your findings. Back in the day, when I was experiencing a little discomfort with the speed of my Pentium 90 running Linux, I decided to buy a 266 MHz Digital Alpha system. Both systems were configured with 64 MB, and both ran Red Hat 5.2.
Although the Alpha system was obviously superior in number crunching, I noticed it ran out of physical memory on a regular basis where my P90 would still be happy. Part of the matter is that Alpha binaries tended to be much larger, as was the kernel. But I'm also
Re:Alpha (Score:2)
IA64 programming (Score:5, Informative)
It's baaack (Score:2)
2 comments (Score:3, Interesting)
Second, here is a trick I have seen - it seems a bit strange, but it works well if you encapsulate your data well. Keep in mind that objects are generally aligned to an 8-byte boundary (if they are malloc'ed), which means the low 3 bits of a pointer are not used at all. If your objects have, say, 64 bytes of data in them (possibly after a bit of padding) and are allocated on 64-byte boundaries, then you have 6 unused bits. Just store your pointers as 32-bit words, shifted over by 6 bits. When you want to dereference one, your get-the-pointer accessor function just shifts it back and gives you a 64-bit pointer.
Now you have an effective address space of 256GB and your data size has not grown at all. Maybe you have taken a hit in performance but until you benchmark you never know...
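A sketch of that accessor pair in C (it assumes the nodes come from an allocator that really does hand out 64-byte-aligned slots, e.g. posix_memalign, and that the heap sits below 2^38):

#include <stdint.h>

typedef uint32_t cptr;   /* compressed pointer: a 64-byte-aligned address >> 6 */

static inline cptr  compress(void *p)  { return (cptr)((uintptr_t)p >> 6); }
static inline void *decompress(cptr c) { return (void *)((uintptr_t)c << 6); }

struct node {
    cptr next;            /* 4 bytes instead of 8 */
    char payload[60];     /* pads the node out to a full 64-byte slot */
};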
And segments...? (Score:2, Informative)
I don't assume any CPU in particular, just the principle of segments.
Answer: yes (Score:3, Interesting)
Thus, you can expect Java heaps to expand by about 50% when moving from 32-bit to 64-bit pointers. What effect this has on your program's performance depends on the relation between the program's resident sets and the machine's cache. For instance, if your program has a resident set of 200KB on a machine with a 256KB cache, then the extra 50% will blow the cache and kill your performance. If the resident set were 150KB, the performance impact would probably be minimal.
Disclaimer: I was doing this as a pet project in my spare time, so take these numbers with a grain of salt.
64 bit pointers on embedded platforms? (Score:3, Interesting)
Minor increase in memory use (Score:3, Informative)
The 64-bit version used about 15% more memory than the 32-bit version. But it was also 20% faster. That still puzzles me, because the server does not perform any 64-bit operations.
Re:Minor increase in memory use (Score:2)
I have another theory: the compiler could have assumed 64-bit operations were safe and done them behind my back when copying data. Or maybe the heap manager is vastly more efficient in 64-bit mode than in 32-bit.
I remember the Watcom compiler doing similar tricks in 16-bit mode, inlining structure copies by setting up the SI and DI registers and doing 32-bit moves i
Why not use 16 bit code, then? (Score:5, Informative)
Seriously, it is faster. I've been writing in assembly for years, and unless I need a 32-bit pointer, I generally don't use one.
If you're so concerned about performance that you are analysing pointer size, you might as well code in assembly. Yes, 64-bit pointers have a bigger footprint, but we experienced the same problem when we went to Unicode strings, 32-bit code, etc...
My advice is this: let the compiler deal with it. Unless you are willing to crank out a lot of hand-coded assembly or are interfacing with hardware, the 32/64-bit pointer question is pretty much moot. As it is, you can't control what the compiler does with your code. A good optimizing compiler will turn this:
for (int x = 0; x < 256; x++) buffer[x] = 0;
Into something like this:
mov cx,64          ; 64 dwords = 256 bytes
mov eax,0
mov di,buffer      ; stosd writes to ES:DI
cld
rep stosd
Instead of the literal translations of the old compilers:
mov si,buffer
mov bx,0           ; this is the x variable
forlabel@10001:
mov byte [bx + si],0
mov ax,1
add ax,bx
xchg bx,ax
cmp bx,256
jl forlabel@10001
The former takes 68 instruction cycles; the latter takes (6 * 256 + 2) = 1538!
The aforementioned issues have a much bigger impact on performance than pointer size. Given that the memory bus is at least 64 bits wide on anything newer than a Pentium, you won't incur a clock cycle penalty for using 64-bit pointers.
The only thing that I would suggest is to watch where you place pointers in structures. For example, when building a linked list, you would want to do something like this:
class link {
    link * ptrforward;    // most-frequently-followed pointer first
    link * ptrbackward;
    link * ptrdata;
};
rather than:
class link {
    link * ptrdata;
    link * ptrbackward;
    link * ptrforward;    // forward pointer lands outside the first 64-bit fetch
};
Because the processor pulls 64 bits per address accessed, the former structure would have the forward pointer in cache regardless of the pointer size. With the second structure, traversing a list in the forward direction would result in a cache miss on every node visited, regardless of pointer size (This applies only to the x86...).
My experience has been that pointer size is only relevant on truly tiny systems - for example, 16-bit code which has to fit into a few kilobytes. Usually, as programs scale to work with larger datasets, the percentage of memory used for pointers decreases rapidly. You'll find that as data sizes increase, the practical uses for linked structures shrink; locating an element by binary search on a sorted array scales much better than a linear search traversing a linked list.
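For example, a sorted array needs no per-element pointers at all, and bsearch() finds an element in O(log n) comparisons instead of a pointer chase per node. A minimal sketch:

#include <stdlib.h>

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Returns a pointer to the matching element, or NULL if it is absent. */
int *find(int *sorted, size_t n, int key) {
    return bsearch(&key, sorted, n, sizeof *sorted, cmp_int);
}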
Re:Why not use 16 bit code, then? (Score:2)
Past the Pentium 2, the bus width went to 64 bits, so in this case, both pointers would be in cache if you were using a 32 bit system. If you're building for AMD's Opteron, you won't experience a performance hit when going 64 bit, because the Opteron's bus is 144 bits. However, the Itanium's bus is only 64 bits, so you might experience a
Dumb question (Score:2)
This is a dumb question. Do you really think that on a system with more than 4GB of memory, memory would be at such a premium that an additional four bytes per pointer would even be noticeable? Surely you jest!
Re:Who cares? (Score:4, Insightful)
Now where have we heard that before...
=Smidge=
...in their keyboard... (Score:2)
Re:Who cares? (Score:5, Informative)
Being able to mmap anything you want is something you just plain can't do on a 32-bit CPU. If you want to write programs that don't worry about address space limitations, you need 64-bit. Anything that simplifies programming is good, since programmer time is valuable.
Besides that, even if you have 1GB of RAM on i386, Linux needs highmem support to use it all. (It reserves 3GB of virtual address space for user space, and the kernel maps as much RAM as it can with the address space that's left over after mapping PCI and AGP space.) So 64-bit is useful even on good desktop machines right now. (Using highmem slows the kernel down, so it might not even be worth it to map the last ~100MiB if you have 1GiB installed.)
Stupid crap like highmem is exactly why we should be using 64-bit CPUs.
Video editing will become more widespread (Score:2, Insightful)
Now, this kind of stuff might be useful for...um...hard-core video editing...and really, really huge servers, but that's about it. The truth of the matter is that your everyday user just has no need to handle numbers of that size or data of those quantities.
What happens when "your everyday user" wants to perform "hard-core video editing" on footage she shot of her family with her miniDV camcorder?
Re:Who cares? (Score:2)
Here's the thing.
If you spin off threads, each thread gets a reserved chunk of address space for its stack. It shares code and data. The stack MUST be addressable by other threads, to allow proper thread semantics for data sharing.
If (as is typical) 1MB is reserved for each thread stack, 1000 threads will take up 1GB of address space, and 4000 threads will exhaust a 32-bit address space.
So, you have a fancy web server,
Re:Who cares? (Score:2)
Re:Who cares? (Score:2)
Re:Quick Quiz (Score:2, Informative)