The argument made here in reference to the paper is that ARC is a useful strategy when considering performance in relation to physical memory size. The throughput/latency trade-off is important for mobile applications, but as you mention, some implementations of a GC can perform well for latency (which is obviously crucial for a mobile app). The enormous performance penalty as the GC heap size approaches the amount of physical memory, however, is a major issue that cannot be easily worked around on a smalle
The only GC mechanism that requires double the memory that you use is a semispace compactor. A lot of modern GCs use this for the young generation (if the space fits in the cache, it's very cheap, especially if you use nontemporal loads / stores when relocating the objects. Some work at Sun Research a decade ago showed that you could do it entirely in hardware in the cache controller very cheaply). Most GCs use mark-and-compact on smaller regions than the entire heap. You're right that you get some cach
The term to search for in the research literature is barrier elision. The new and shiny optimisations in Swift are things that garbage collected language implementations have been doing for around 30 years. Finalisers are a pain to support, but you either need to support them or you need some other mechanism for preventing non-memory resource (e.g. file descriptor) leaks.
C# vs Swift (Score:5, Interesting)
From a language level, Swift has a more general type system than C# does, offers more advanced value types, protocol extensions, etc. Swift also has advantages in mobile use cases because ARC requires significantly less memory than garbage collected languages for a given workload.
I feel like this should be quoted any time a C# programmer comes along thinking they have the perfect language (unaware of what else is out there). C# is great, but it's not the greatest possible language.
Re:C# vs Swift (Score:5, Insightful)
I'm not convinced by Chris' argument here. GC is an abstract policy (objects go away after they become unreachable), ARC is a policy (GC for acyclic data structures, deterministic destruction when the object is no longer reachable) combined with a mechanism (per object refcounts, refcount manipulation on every update). There is a huge design space for mechanisms that implement the GC policy and they all trade throughput and latency in different ways. It would be entirely possible to implement the C# GC requirements using ARC combined with either a cycle detector or a full mark-and-sweep-like mechanism for collecting cycles. If you used a programming style without cyclic data structures then you'd end up with almost identical performance for both.
Most mainstream GC implementations favour throughput over latency. In ARC, you're doing (at least) an atomic op for every heap pointer assignment. In parallel code, this leads to false sharing (two threads updating references to point to the same object will contend on the reference count, even if they're only reading the object and could otherwise have it in the shared state in their caches). There is a small cost with each operation, but it's deterministic and it doesn't add much to latency (until you're in a bit of code that removes the last reference to a huge object graph and then has to pause while they're all collected - one of the key innovations of OpenStep was the autorelease pool, which meant that this kind of deallocation almost always happens in between runloop iterations). A number of other GC mechanisms are tuned for latency, to the extent that they can be used in hard realtime systems with a few constraints on data structure design (but fewer than if you're doing manual memory management).
This is, unfortunately, a common misconception regarding GC: that it implies a specific implementation choice. The first GC papers were published almost 60 years ago and it's been an active research area ever since, filling up the design space with vastly different approaches.
Garbage Collection (Score:5, Interesting)
There were experiments done in the 1970s for Lisp systems that showed that ARC was generally the slowest garbage collection algorithm, despite what C++ programmers think. You pay for every pointer move, rather than just once at GC. And the 1980s generational systems were even better.
I do not know Swift, but C compatibility is blown out of the water when a GC can actually move objects in memory, as it should.
Re: (Score:1)
For some real time needs, it is more important to have the "payment" spread out in a deterministic way rather than appearing all at once at a random point in time.
Re: (Score:2, Informative)
This just moves the problem, because now you need to know when to call GC before your memory floods (which once again, is more likely to happen in some real time environments)
Re: (Score:1)
If you can't figure out an appropriate time to call GC in a realtime system, then you and your compiler are not appropriate for real time systems. If your memory fills up because your real time constraints didn't give you time to clean it up, then potentially no memory management system would work and your system has a more fundamental problem. Real time just adds more constraints, and no magic will fix an overconstrained problem.
Re: (Score:3)
If so then use a real-time GC algorithm.
Re:Garbage Collection (Score:5, Informative)
If there were experiments for LISP in the 70s it'll have very little to say about what an optimising compiler does now. Indeed much of Swift's ARC doesn't involve any actual reference counting at run time, as static analysis has already determined when many objects can be deleted.
Re: (Score:2)
No they didn't. They had pretty much no optimisations at all. You have no idea how limited the resources were back then.
Re: (Score:3)
Who says C++ programmers think ARC is fastest? The fastest is automatic stack based memory management, what with being free and all, and that's what's probably used most of all in C++.
Re: (Score:2)
Re: (Score:2)
Good point, and C#/Java programmers rarely use that for objects.
As far as I know you can't in those languages. The optimizing JIT does escape analysis which attempts it where possible, but it's never going to be as effective as C++ in that regard.
Re: (Score:1)
There are more issues to it than that.
C++ is more efficient because it can use the stack more and can store primitive types in STL collections directly.
Languages like C# and Java always have to instantiate objects on the heap, and primitive types can't be stored in collection classes.
Then look at the amount of memory the runtime uses: it's far, far more than any C++ exe. For instance, run an empty C# Unity project and you've blown 50 MB of RAM already.
Re: (Score:2)
You pay for every pointer move ... when a GC can actually move objects in memory

That's not actually a common need.
Re: (Score:2)
I can't speak for the experiments you're referring to specifically, but I've seen studies make the mistake of comparing a 1:1 ratio of std::shared_ptr objects to garbage collected references in other languages. I hate these comparisons because they overlook the fact that garbage collected languages are so enamored with their own GC firepower that they insist on creating garbage all over the place. Outside of the base primitives, pretty much everything becomes an object that is individually tracked on the heap.
Re: (Score:2)
C++ takes a more general approach. RAII with smart pointers can and should be used to manage any resource, including memory, database connections, files, whatever. This is not normally the best approach for memory, and reference counts have problems when you get into multithreading and caching, since they force another memory reference, and possibly a memory write, somewhere other than the actual variable. Like most things in language design, it's a tradeoff, and most modern languages have chosen more s
Re: (Score:2)
The problem with GC is that it's inherently lazy.
Take straight C for example. You need to define a variable, initialize it, do whatever with it, and then free it, within the scope of the function in order to make efficient use of it. Or you use malloc.
In Perl, PHP, JavaScript, and most interpreted languages, you simply define the variable, some people remember to initialize it, do whatever with it, and then let it go out of scope for it to be garbage collected. If you do this frequently enough, like inside a tight loop, then the GC introduces latency.
In C++ you can specifically tell the C++ runtime to delete objects, and use C-style variables if you want the tighter control, or stick entirely with malloc if you want to use as little memory as possible.
The goal with GC should be determined by the nature of the device. A desktop system with a lot of memory will have no problem deferring garbage collection, but then you get sites like Twitter, which endlessly "grow" the DOM and never actually GC anything until the tab is refreshed. Before Chrome finally released a 64-bit version, one would only get about two days out of a Twitter tab before it would crash. Do this on mobile and it will crash hourly, because even though the mobile device may have 1 GB of memory and run in 64-bit mode, it never actually "stops" running things in the background; they are just paused, and only unloaded when memory is needed. A headless device that needs to run in a wiring closet without being reset for months or years needs to be able to detect when memory is failing to be freed, otherwise the device may stop working.
I have an example of this with a Startech IP-KVM which runs Linux, but because Startech doesn't release updates for the things they put their brand on after the warranty expires, this IP-KVM remains in a useless state (due to it running a version of VNC whose SSL part only works over Java) and needs to be power cycled by the remote PDU before it can be used. The device just runs out of memory from DoS-like activity, and it overwhelms the logging processes.
And that's sloppy programming. There is such a thing as GC-friendly coding conventions. A GC is not supposed to exist so programmers can go willy-nilly, "someone is going to clean my butt for me".
Re: (Score:1)
You make some good points, but it's worth noting that:
1. While you're completely correct in theory, if all of the mainstream implementations currently work in the assumed fashion then his point is still reasonably valid.
2. ARC is designed to minimise the refcount editing / false sharing. How well this goes in practice depends on the static analysis capabilities of the compiler; it will never be perfect but with a good compiler and a decent programmer it can probably be very good. It's certainly much better
Re:C# vs Swift (Score:4, Interesting)
1. While you're completely correct in theory, if all of the mainstream implementations currently work in the assumed fashion then his point is still reasonably valid.
The mainstream implementations optimise for a particular point in the tradeoff space because they're mainstream (i.e. intended to be used in that space). GCs designed for mobile and embedded systems work differently.
2. ARC is designed to minimise the refcount editing / false sharing
I've worked on the ARC optimisations a little bit. They're very primitive and the design means that you often can't elide the count manipulations. The optimisations are really there to simplify the front end, not to make the code better. When compiling Objective-C, clang is free to emit a retain for the object that's being stored and a release for the object that's being replaced. It doesn't need to try to be efficient, because the optimisers have knowledge of data and control flow that the front end lacks and so it makes sense to emit redundant operations in the front end and delete them in the middle. Swift does this subtly differently by having its own high-level IR that tracks dataflow, so the front end can feed cleaner IR into LLVM.
The design of ARC does nothing to reduce false sharing. Until 64-bit iOS, Apple was storing the refcount in a look-aside table (GNUstep put it in the object header about 20 years before Apple). This meant that you were acquiring a lock on a C++ map structure for each refcount manipulation, which made it incredibly expensive (roughly an order of magnitude more expensive than the GNUstep implementation).
Oh, and making the same optimisations work with C++ shared_ptr would be pretty straightforward, but it's mostly not needed because the refcount manipulations are inlined and simple arithmetic optimisations elide the redundant ones.
Re: (Score:2)
The design of ARC does nothing to reduce false sharing. Until 64-bit iOS, Apple was storing the refcount in a look-aside table (GNUstep put it in the object header about 20 years before Apple). This meant that you were acquiring a lock on a C++ map structure for each refcount manipulation, which made it incredibly expensive (roughly an order of magnitude more expensive than the GNUstep implementation).
No, they didn't. There was one byte for the refcount, with 1..127 meaning "real refcount" and 128..255 meaning "(refcount - 192) plus upper bits stored elsewhere". The look-aside table was only touched once the refcount exceeded 127; at that point the byte stored in the object would be 192, with the rest held elsewhere. The table would be touched again only if you increased or decreased the refcount by 64 in total. Very, very rare in practice.
Re: (Score:3)
Re: (Score:2)
Most of what you say is true but it is missing a huge aspect: memory usage.
GC implementations trade peak memory usage for processing efficiency. Given that computing cost is in MANY cases based on memory size (typical pricing of virtual machines, due to the number that can be packed in), the advantage shifts back to ARC and lower memory overhead (even if total CPU overhead is higher). Many GC systems require 2x the peak RAM that the application is actually using.
Any mark-and-sweep process is also likely to be brutal.
Re: (Score:2)
Re: (Score:2)