



Hardware Based XRender Slower than Software Rendering?
Neon Spiral Injector writes "Rasterman of Enlightenment fame has finally updated the news page of his personal site. It seems that the behind-the-scenes work on E is coming along. He is investigating rendering backends for Evas. The default backend is a software renderer written by Raster. Hoping to gain a little more speed, he ported it to the XRender extension, only to find that it became 20-50 times slower on his NVidia card. He has placed some sample code on the same news page for people to try, to see whether this also happens on other setups."
is this the man who said that "Windows has won"? (Score:3, Insightful)
That's a myth. (Score:2, Insightful)
What apps can I not run under Linux?
My browser works, most of my games work, Photoshop works, Microsoft Word works.
Do your research: Wine, TransGaming, CrossOver Office.
Keith IS being paid. (Score:3, Insightful)
Also, the only reason it's taking so long is that they won't fork. There are millions of developers whom Red Hat, SuSE, Lindows, etc. would love to pay to develop XRender. Do you think Keith Packard is the only developer in the world qualified to do this? No, he's not, and neither is Carl Worth, but until there is a fork, everything goes through this core group of developers who decide everything.
It's a management issue more than a lack of developers or a lack of money. Believe me, if TransGaming can raise money, XFree86 could raise about ten times that amount; Mandrake has 15,000 subscribers paying $60 a year or something.
This isn't about money, and it's not about a lack of programmers; it's about management. The developers argue and fight over stupid stuff on mailing lists, there are only two developers working on XRender, and they seem overworked because they are doing so many other projects.
It's more complicated than it seems.
Xwin is not an official fork; at least, I was told it wasn't a fork, more a threat of one. I'm wishing and hoping they DO fork and then somehow accept money so we can pay developers to write this very important code.
Re:Lessons from the ancient (Score:5, Insightful)
Unfortunately, the fast loaders' assumption that the CPU and the drive both ran at exactly the same speed was a cause of problems. The PAL version of the C64 ran at a different speed (a bit slower, I believe), making fast loaders either NTSC- or PAL-specific (although there may have been one or two that actually took the clock speed into consideration). The same fault meant that fast loaders sometimes didn't work with some variants of the drives (different CPUs, all supposedly 6502-compatible, but not necessarily so).
Additionally, because these fast loaders required exact timing, something had to be done about the VIC-II (interrupts from it would cause the 6510 in the C64 to lose its timing) - usually the screen was blanked (basically turning off the VIC-II), or at the very least sprites were turned off (sprites, by the way, while nice, were a PITA because they disrupted everything, including raster timing).
Commodore did screw things up... They had four (or was it six?) connectors on each end of the cable; they could have made it at least quasi-parallel rather than serial with handshaking. Unfortunately, they only hooked up two: CLK (the handshaking clock) and DATA (for the data bit). However, seeing as the 1541 used the same hardware mechanism as the 1540 (its predecessor for the VIC-20) and contained most of the same software (you could use a "user" command to change the speed for the VIC-20), they couldn't just go out and change the design. I almost get the feeling that they took the serial bus from the VIC-20 and put it in the C64, figuring they'd be able to use the 1540 drive. Then at the last minute they realized it wouldn't work, so they made the 1541, as well as a ROM upgrade to let the 1540 work with the C64.
While getting rid of the handshaking and transferring an extra bit over that line made sense then, with modern computers I wouldn't trust it. There are too many components from too many manufacturers, and I really like my MP3 and pr0n collections too much to lose them to one corrupted bit.
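To make the timing point concrete, here's a minimal C sketch (the helper names are made-up stand-ins for the direct CIA register accesses a real loader would use; this is not real 1541 loader code) of the basic fast-loader trick: once the transfer starts, the receiver pulls two bits per fixed delay off CLK and DATA with no per-bit handshake, which is exactly why a PAL/NTSC clock-speed difference breaks it:

```c
/* Illustrative sketch only, not real fast-loader code. */
#include <stdint.h>

extern int  read_clk_line(void);       /* hypothetical: state of the serial CLK line  */
extern int  read_data_line(void);      /* hypothetical: state of the serial DATA line */
extern void busy_wait_cycles(int n);   /* hypothetical: delay measured in CPU cycles  */

uint8_t receive_byte_fastload(void)
{
    uint8_t byte = 0;

    for (int pair = 0; pair < 4; pair++) {
        /* The delay is tuned for one specific CPU clock.  An NTSC C64 runs
         * at ~1.023 MHz and a PAL C64 at ~0.985 MHz, so a loader timed for
         * one machine samples the lines at the wrong moment on the other. */
        busy_wait_cycles(18);          /* the exact count is illustrative */
        byte = (uint8_t)((byte << 1) | (read_clk_line()  & 1));
        byte = (uint8_t)((byte << 1) | (read_data_line() & 1));
    }
    return byte;
}
```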
-- Joe
The results are not obviously broken (Score:5, Insightful)
A lot of people are questioning the results claimed by Rasterman; however, try downloading the thing and running it for yourself. I see the same trend that Rasterman claims when I do.
My system: Athlon 800, nVidia 2-GTS.
Drivers: nVidia driver, 1.0.4363 (Gentoo)
Kernel: 2.4.20-r6 (Gentoo)
X11: XFree86 4.3.0
I've checked, and:
The benchmark consists of repeatedly rendering an alpha-blended bitmap to the screen using the Render extension (both on- and off-screen) and imlib2. Various scaling modes are also tried.
When there's no scaling involved, the hardware Render extension wins; it's over twice as fast. That's only the first round of tests, though. The rest of the rounds all involve scaling (half- and double-size, various antialiasing modes), and for these imlib2 walks all over the Render extension; we're talking three and a half minutes versus six seconds in one of the rounds, and the rest are similar.
I'm not posting the exact figures, since the benchmark isn't scientific and exact numbers aren't the point; the trend is undeniable. Things like agpgart versus nVidia's internal AGP driver shouldn't account for a gap that wide.
Given that at least one round of the benchmark shows the Render extension winning, I'll take a stab at explaining the results: the hardware is probably performing the scaling operation each and every time, while imlib2 caches the result (or something like that). The results suggest that scaling the image once and then reverting to non-scaled blitting would improve at least some of the rounds. That's too easy, however: while it helps an application that knows it will repeatedly blit the same scaled bitmap, not all applications know this a priori.
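For reference, here's a minimal sketch of what the scaled rounds ask the server to do on every single blend, assuming an existing Display *dpy plus source and destination Pictures (the function name is mine, not the benchmark's):

```c
#include <X11/Xlib.h>
#include <X11/extensions/Xrender.h>

/* Composite 'src' onto 'dst' at half size with an Over (alpha) blend. */
void blit_half_scaled(Display *dpy, Picture src, Picture dst, int w, int h)
{
    /* Render transforms map destination coordinates back into the source,
     * so a diagonal of 2.0 samples the source at twice the rate and the
     * image comes out at half size. */
    XTransform shrink = {{
        { XDoubleToFixed(2.0), XDoubleToFixed(0.0), XDoubleToFixed(0.0) },
        { XDoubleToFixed(0.0), XDoubleToFixed(2.0), XDoubleToFixed(0.0) },
        { XDoubleToFixed(0.0), XDoubleToFixed(0.0), XDoubleToFixed(1.0) },
    }};

    XRenderSetPictureTransform(dpy, src, &shrink);
    XRenderSetPictureFilter(dpy, src, FilterBilinear, NULL, 0);

    XRenderComposite(dpy, PictOpOver, src, None, dst,
                     0, 0, 0, 0, 0, 0, w / 2, h / 2);
}
```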
- Andrew
I've experienced this myself. (Score:4, Insightful)
== Hardware ==
Vertex coordinates, texture coordinates, and primitive types are DMA'd to the video card. The video card finds the texture and loads all the information into its registers. It then executes triangle setup, then the triangle fill operation - twice (because it's drawing a quad).
== Software ==
Source texture is copied by the CPU to hardware memory, line by line.
Actual peak fill rate in software will be lower than in hardware - but if your code is structured correctly (textures in the right format, etc.), there's no setup. The hardware's latency loses out to the speed of your CPU's cache - the software copy is about as much work as just issuing the calls to the graphics card.
The trick is to *batch* your commands. Sending several hundred primitives to the hardware at once will blow software away - especially as the area to be filled increases. Well... most of the time; it really depends on what you're doing.
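A rough sketch of that batching idea in plain OpenGL 1.1-style C (the function names are mine): instead of one draw call per quad, pack the vertices into arrays and submit the whole lot with a single glDrawArrays() call, so the per-command setup cost is paid once for the batch.

```c
#include <GL/gl.h>

#define MAX_QUADS 1024

static GLfloat verts[MAX_QUADS * 4 * 2];   /* 4 corners per quad, x/y each */
static GLfloat texco[MAX_QUADS * 4 * 2];
static int     nquads = 0;

void flush_batch(void)
{
    if (nquads == 0)
        return;

    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glVertexPointer(2, GL_FLOAT, 0, verts);
    glTexCoordPointer(2, GL_FLOAT, 0, texco);

    /* One command covers the whole batch of quads. */
    glDrawArrays(GL_QUADS, 0, nquads * 4);

    nquads = 0;
}

void batch_quad(float x, float y, float w, float h)
{
    if (nquads == MAX_QUADS)
        flush_batch();

    GLfloat *v = &verts[nquads * 8];
    GLfloat *t = &texco[nquads * 8];

    /* Corners: bottom-left, bottom-right, top-right, top-left. */
    v[0] = x;     v[1] = y;
    v[2] = x + w; v[3] = y;
    v[4] = x + w; v[5] = y + h;
    v[6] = x;     v[7] = y + h;

    t[0] = 0; t[1] = 0;   t[2] = 1; t[3] = 0;
    t[4] = 1; t[5] = 1;   t[6] = 0; t[7] = 1;

    nquads++;
}
```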
Re:accelerated? (Score:2, Insightful)
I'm not so sure of that. Rasterman may have been the first person to write a window manager with quite that much eye candy -- but god, man, have you seen his code?
I can't speak for anything he's written within the last 18 months or so (maybe it's been longer now?), but the last time I looked at his C it was ugly, unportable sh*t.
That said, I'll be curious to see what the XRender folks have to say re these benchmarks.
Re:Putting the "wine" back in whining. (Score:3, Insightful)
Besides, we shouldn't be competing with MacOS or Windows. We don't need to clone those OSes or their desktops. We need to create our own desktop, one that is unique. Make it work; don't build it just to attract Windows and Mac users.
Re:2D acceleration using OpenGL? (Score:3, Insightful)
Isn't that what Blender does? They implement their GUI using OpenGL, drawing all the widgets themselves, so that the interface is the same on every platform they port to.
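As a minimal sketch of that approach (made-up widget and colours, not Blender's actual code): set up a pixel-aligned orthographic projection and draw the widget with ordinary OpenGL primitives.

```c
#include <GL/gl.h>

/* Draw a flat "button": a filled rectangle with a darker outline. */
void draw_button(int win_w, int win_h, int x, int y, int w, int h)
{
    /* Map OpenGL coordinates 1:1 onto window pixels. */
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrtho(0, win_w, 0, win_h, -1, 1);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();

    glColor3f(0.7f, 0.7f, 0.7f);       /* button face */
    glBegin(GL_QUADS);
    glVertex2i(x, y);
    glVertex2i(x + w, y);
    glVertex2i(x + w, y + h);
    glVertex2i(x, y + h);
    glEnd();

    glColor3f(0.2f, 0.2f, 0.2f);       /* border */
    glBegin(GL_LINE_LOOP);
    glVertex2i(x, y);
    glVertex2i(x + w, y);
    glVertex2i(x + w, y + h);
    glVertex2i(x, y + h);
    glEnd();
}
```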
Re:One word: (Score:5, Insightful)
It's great for putting what has normally been wasted horsepower toward 3D effects on 2D windows. Finally, eye candy that won't slow down your system!
It's wrong to use 3D functionality for 2D graphics (Score:0, Insightful)
3D means:
2D means:
Although 2D and 3D share some concepts, they are two entirely different things. Since today's software requires both windowed graphics and full-screen/windowed 3D graphics, graphics cards must have circuitry for both 2D and 3D. A 2D graphics implementation in hardware is trivial and very cheap these days.
Therefore, I find it inappropriate to use 3D graphics for 2D rendering. It will certainly not speed up drawing operations, because 3D requires more steps than 2D, even if the Z coordinate is always 0. Why should Linux use 3D for 2D operations?
Caching (Score:4, Insightful)
The first time the scale test is called, I render the image to an offscreen buffer with the correct transformation set. After that I just XRenderComposite to the screen from the offscreen buffer. The results (NVidia 4496 driver, RenderAccel=true, GeForce2 MX, Athlon XP 1800+) for one test are:
*** ROUND 2 ***
Test: Test Xrender doing 1/2 scaled Over blends - caching implementation
Time: 0.126 sec.
Test: Test Xrender doing 1/2 scaled Over blends - original implementation
Time: 6.993 sec.
Test: Test Imlib2 doing 1/2 scaled Over blends
Time: 0.191 sec.
That shows XRender taking about two-thirds the time of imlib2.
My guess is that imlib2 is probably caching something itself. This is supported by the fact that XRender is faster than imlib2 for the non-scaled composition in the original code.
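A minimal sketch of that caching approach, assuming an existing Display *dpy, a window, and source/destination Pictures; the function names and the ARGB32 offscreen format are my assumptions, not the poster's actual code:

```c
#include <X11/Xlib.h>
#include <X11/extensions/Xrender.h>

/* Scale 'src' (w x h) down to half size exactly once, into an offscreen
 * ARGB32 Picture, and return that Picture for later reuse. */
Picture make_scaled_cache(Display *dpy, Window win, Picture src, int w, int h)
{
    int sw = w / 2, sh = h / 2;

    /* Offscreen buffer; assumes the server offers a 32-bit ARGB format. */
    Pixmap pix = XCreatePixmap(dpy, win, sw, sh, 32);
    XRenderPictFormat *fmt = XRenderFindStandardFormat(dpy, PictStandardARGB32);
    Picture cache = XRenderCreatePicture(dpy, pix, fmt, 0, NULL);

    /* Pay the scaling cost once: transform the source and copy it in. */
    XTransform shrink = {{
        { XDoubleToFixed(2.0), 0, 0 },
        { 0, XDoubleToFixed(2.0), 0 },
        { 0, 0, XDoubleToFixed(1.0) },
    }};
    XRenderSetPictureTransform(dpy, src, &shrink);
    XRenderSetPictureFilter(dpy, src, FilterBilinear, NULL, 0);
    XRenderComposite(dpy, PictOpSrc, src, None, cache,
                     0, 0, 0, 0, 0, 0, sw, sh);

    /* NOTE: real code should keep 'pix' around and eventually free both
     * the pixmap and the picture. */
    return cache;
}

/* Every subsequent frame is then a plain, untransformed Over blend. */
void blit_cached(Display *dpy, Picture cache, Picture dst, int sw, int sh)
{
    XRenderComposite(dpy, PictOpOver, cache, None, dst,
                     0, 0, 0, 0, 0, 0, sw, sh);
}
```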
Re:accelerated? (Score:3, Insightful)
So no, I'm not talking about aesthetics.
That said, I disagree that writing cluttered code is at all analogous to having a cluttered room. Efficiency, in the case of code, isn't just how quickly someone can get a first copy written, and it's not just about how fast that copy runs on the developer's box. Code that's blindingly fast on one particular configuration but won't run at all on a different machine (or which loses its performance edge when moved too far) is far worse than code written with attention to algorithmic optimizations, or -- even better -- code which is written with cleanliness and maintainability in mind (to allow for simpler refactoring later, be it for performance or features -- not to mention the far greater advantage of straightforward debugging).
Let me propose an alternate analogy: cluttered code is akin to poorly written English -- think of an ill-structured essay or book, full not only of spelling and grammatical errors but of logical fallacies to boot. While a cluttered room or desk might be excusable as personal style (which is what you put it down to), cluttered English and cluttered code are both symptomatic of the poor thought processes that went into their making.