
Hardware Based XRender Slower than Software Rendering?

Posted by michael
from the unsolved-mysteries dept.
Neon Spiral Injector writes "Rasterman of Enlightenment fame has finally updated the news page of his personal site. It seems the behind-the-scenes work on E is coming along. He is investigating rendering backends for Evas. The default backend is a software renderer written by Raster. Trying to gain a little more speed, he ported it to the XRender extension, only to find that it became 20-50 times slower on his NVidia card. He has posted some sample code on the same news page for people to try, to see whether other setups show the same behavior."
  • by rxed (634882) on Saturday August 16, 2003 @12:38AM (#6710476)
    Is this the same person who, some time ago, said: "Windows has won. Face it. The market is not driven by a technically superior kernel, or an OS that avoids its crashes a few times a day. Users don't (mostly) care. They just reboot and get on with it. They want apps. If the apps they want and like aren't there, it's a lose-lose. Windows has the apps. Linux does not. Its life on the desktop is limited to nice areas (video production, though Mac is very strong and with a UNIX core now will probably end up ruling the roost). The only place you are likely to see Linux is the embedded space." The Slashdot article is also available here: http://slashdot.org/articles/02/07/20/1342205.shtml?tid=106
  • That's a myth. (Score:2, Insightful)

    by HanzoSan (251665) on Saturday August 16, 2003 @12:42AM (#6710483) Homepage Journal


    What apps can I not run under Linux?

    My browser works, most of my games work, Photoshop works, Microsoft Word works.

    Do your research: Wine, TransGaming, CrossOver Office.
  • by HanzoSan (251665) on Saturday August 16, 2003 @12:47AM (#6710499) Homepage Journal


    Also, the only reason it's taking so long is that they won't fork. There are millions of developers whom Red Hat, SuSE, Lindows, etc. would love to pay to develop XRender. You think Keith Packard is the only developer in the world qualified to do this? No, he's not, and neither is Carl Worth, but until there is a fork, everything goes through this core group of developers who decide everything.

    It's a management issue more so than a lack of developers or a lack of money. Believe me, if TransGaming can get money, XFree86 could get about ten times that amount; Mandrake has 15,000 subscribers paying $60 a year or something.

    This isn't about money, and it's not about a lack of programmers; it's about management. The developers argue and fight over stupid stuff on mailing lists, there are only two developers working on XRender, and those developers seem overworked because they are doing so many other projects.

    It's more complicated than it seems.

    Xwin is not an official fork, or at least I was told that it wasn't a fork; it was more a threat of a fork. I am wishing and hoping they DO fork and then accept money somehow so we can pay developers to write this very important code.

  • by The Vulture (248871) on Saturday August 16, 2003 @01:23AM (#6710607) Homepage
    The 1541 drive itself was actually quite fast, reading an entire sector in much less than a second (if you set the job code directly in the drive). It was the serial transfer that was super slow (as you stated).

    Unfortunately, the fast loaders' assumption that the CPU and the drive both ran at exactly the same speed was a cause of problems. The PAL version of the C64 ran at a different speed (a bit slower, I believe), thus making fast loaders either NTSC- or PAL-specific (although there may have been one or two that could actually take the clock speed into consideration). The same fault meant that fast loaders sometimes didn't work with some variants of the drives (different CPUs, all supposedly 6502-compatible, but not necessarily so).

    Additionally, because these fast loaders required exact timing, something had to be done about the VIC-II (interrupts from it would cause the 6510 in the C64 to lose its timing). Usually the screen was blanked (basically turning off the VIC-II), or at the least sprites were turned off (sprites, by the way, while nice, were a PITA because they disrupted everything, including raster timing).

    Commodore did screw things up... They had four (or was it six?) connectors on each end of the cable; they could have made it at least quasi-parallel rather than serial with handshaking. Unfortunately, they only hooked up two: CLK (handshaking clock) and DATA (for the data bit). However, seeing as the 1541 used the same hardware mechanism as the 1540 (its predecessor for the VIC-20) and contained most of the same software (you could use a "user" command to change the speed for the VIC-20), they couldn't just go out and change the design. I almost get the feeling that they took the serial bus from the VIC-20 and put it in the C64, figuring that they'd be able to use the 1540 drive. Then at the last minute they realized that it wouldn't work, so they made the 1541, as well as a ROM upgrade for the 1540 to work with the C64.

    While getting rid of the handshaking and transferring an extra bit over that line made sense then, with modern computers I wouldn't trust it. There are too many components from too many manufacturers, and I really like my MP3 and pr0n collections too much to lose them to one corrupted bit.

    -- Joe
  • by asnare (530666) on Saturday August 16, 2003 @01:47AM (#6710666)

    A lot of people are questioning the results claimed by Rasterman; however, try downloading the thing and running it for yourself. I see the same trend that Rasterman claims when I do it.

    My system: Athlon 800, nVidia GeForce2 GTS.
    Drivers: nVidia driver, 1.0.4363 (Gentoo)
    Kernel: 2.4.20-r6 (Gentoo)
    X11: XFree86 4.3.0

    I've checked and:

    1. agpgart is being used;
    2. XF86 option "RenderAccel" is on.

    The benchmark consists of rendering an alpha-blended bitmap to the screen repeatedly using the Render extension (on- and off-screen) and imlib2. Various scaling modes are also tried.

    When there's no scaling involved, the hardware Render extension wins; it's over twice as fast. That's only the first round of tests though. The rest of the rounds all involve scaling (half- and double-size, various antialiasing modes). For these, imlib2 walks all over the Render extension; we're talking three and a half minutes versus 6 seconds in one of the rounds; the rest are similar.

    I'm not posting the exact figures since the benchmark isn't scientific and worrying about exact numbers isn't the point; the trend is undeniable. Things like agpgart versus nVidia's internal AGP driver should not account for the wide gap.

    Given that at least one of the rounds in the benchmark shows the Render extension winning, I'm going to take a stab at explaining the results by suggesting that the hardware is probably performing the scaling operations each and every time, while imlib2 caches the results (or something). The results seem to suggest that scaling the thing once and then reverting to non-scaling blitting would improve at least some of the rounds; this is too easy, however, since while it helps the application that knows it's going to repeatedly blit the same scaled bitmap, not all applications know this a priori.

    - Andrew
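
    For anyone curious what the XRender half of such a loop looks like, here is a minimal stand-alone sketch -- not Rasterman's actual benchmark code; the window size, iteration count, and the half-transparent red source are invented for illustration. It just times repeated Over blends of an ARGB Picture onto a window (build with something like: cc bench.c -lX11 -lXrender):

        /* Time repeated XRenderComposite Over blends onto a window. */
        #include <stdio.h>
        #include <sys/time.h>
        #include <X11/Xlib.h>
        #include <X11/extensions/Xrender.h>

        static double now_sec(void)
        {
            struct timeval tv;
            gettimeofday(&tv, NULL);
            return tv.tv_sec + tv.tv_usec / 1e6;
        }

        int main(void)
        {
            Display *dpy = XOpenDisplay(NULL);
            if (!dpy) { fprintf(stderr, "no display\n"); return 1; }

            int scr = DefaultScreen(dpy);
            Window win = XCreateSimpleWindow(dpy, RootWindow(dpy, scr), 0, 0,
                                             640, 480, 0, 0, BlackPixel(dpy, scr));
            XMapWindow(dpy, win);
            XSync(dpy, False);

            /* A 256x256 ARGB source filled with half-transparent red, so that
             * PictOpOver actually has blending work to do. */
            XRenderPictFormat *argb = XRenderFindStandardFormat(dpy, PictStandardARGB32);
            XRenderPictFormat *vis  = XRenderFindVisualFormat(dpy, DefaultVisual(dpy, scr));
            Pixmap pix = XCreatePixmap(dpy, win, 256, 256, 32);
            Picture src = XRenderCreatePicture(dpy, pix, argb, 0, NULL);
            Picture dst = XRenderCreatePicture(dpy, win, vis, 0, NULL);
            XRenderColor red = { 0xffff, 0x0000, 0x0000, 0x8000 };
            XRenderFillRectangle(dpy, PictOpSrc, src, &red, 0, 0, 256, 256);

            double t0 = now_sec();
            for (int i = 0; i < 1000; i++)
                XRenderComposite(dpy, PictOpOver, src, None, dst,
                                 0, 0, 0, 0, 0, 0, 256, 256);
            XSync(dpy, False); /* wait until the server has really finished */
            printf("1000 Over blends: %.3f sec\n", now_sec() - t0);

            XCloseDisplay(dpy);
            return 0;
        }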

  • by Anonymous Coward on Saturday August 16, 2003 @02:22AM (#6710743)
    The problem is in *sending* the graphics commands to the hardware. If you're manually sending quads one at a time, then for 16x16 squares on screen I found it's faster to do it in software than on a GeForce 2 (that was what I had at the time -- this was a few years back). Think about it:

    == Hardware ==

    Vertex coordinates, texture coordinates, and primitive types are DMA'd to the video card. The video card finds the texture and loads all the information into its registers. It then executes triangle setup, then the triangle fill operation -- twice (because it's drawing a quad).

    == Software ==

    Source texture is copied by the CPU to hardware memory, line by line.

    Actual peak fill rate in software will be lower than in hardware -- but if your code is structured correctly (textures in the right format, etc.), there's no setup. The hardware latency loses out to the speed of your CPU's cache -- the software copy has the same complexity as making the calls to the graphics card. :)

    The trick is to *batch* your commands. Sending several hundred primitives to the hardware at the same time will blow software away -- especially as the area to be filled increases. Well... most of the time, but it really depends on what you're doing.
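
    To make the batching point concrete, here is a rough sketch in the OpenGL of the era -- the function names and the flat (x,y) vertex array layout are invented, and a current GL context is assumed:

        #include <GL/gl.h>

        /* Slow path: one tiny quad per glBegin/glEnd, so the driver and card
         * pay their setup cost over and over. */
        static void draw_quads_one_at_a_time(const float *xy, int nquads)
        {
            for (int i = 0; i < nquads; i++) {
                const float *q = xy + i * 8;   /* 4 vertices x (x,y) */
                glBegin(GL_QUADS);
                glVertex2f(q[0], q[1]);
                glVertex2f(q[2], q[3]);
                glVertex2f(q[4], q[5]);
                glVertex2f(q[6], q[7]);
                glEnd();
            }
        }

        /* Fast path: hand the whole batch over in one call so the hardware can
         * stream through it without per-primitive round trips. */
        static void draw_quads_batched(const float *xy, int nquads)
        {
            glEnableClientState(GL_VERTEX_ARRAY);
            glVertexPointer(2, GL_FLOAT, 0, xy);
            glDrawArrays(GL_QUADS, 0, nquads * 4);
            glDisableClientState(GL_VERTEX_ARRAY);
        }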
  • Re:accelerated? (Score:2, Insightful)

    by cduffy (652) <charles+slashdot@dyfis.net> on Saturday August 16, 2003 @02:56AM (#6710887)
    ...Rasterman is supposed to be some sort of expert in producing nice fast graphics on X so I'd say this is unlikely.

    I'm not so sure of that. Rasterman may have been the first person to write a window manager with quite that much eye candy -- but god, man, have you seen his code?

    I can't speak for anything he's written within the last 18 months or so (maybe it's been longer now?), but last time I looked at his C it was ugly, unportable sh*t.

    That said, I'll be curious to see what the XRender folks have to say re these benchmarks.
  • by MikeFM (12491) on Saturday August 16, 2003 @03:02AM (#6710908) Homepage Journal
    I still believe in the answer id always gave when asked when they'd release a game: "When it's done." I think that applies to open source even more.

    Besides, we shouldn't be competing with MacOS or Windows. We don't need to clone those OSes or desktops. We need to create our own desktop that is unique. Make it work; don't make it just to attract Windows and Mac users.
  • by harlows_monkeys (106428) on Saturday August 16, 2003 @03:17AM (#6710951) Homepage
    ...it would be interesting to see how feasible it is to do all the 2D rendering using OpenGL

    Isn't that what Blender does? They implement their GUI using OpenGL, drawing all the widgets themselves, so that their interface is the same on all platforms they are ported to.
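
    Roughly, drawing a 2D widget through OpenGL comes down to setting up a pixel-aligned orthographic projection and then drawing flat primitives. The sketch below is not Blender's code, just an illustration of the idea, and it assumes a GL context is already current:

        #include <GL/gl.h>

        /* Draw a flat grey "button" rectangle in a window of win_w x win_h pixels. */
        static void draw_flat_button(int win_w, int win_h)
        {
            /* One GL unit == one pixel, origin at the bottom-left corner. */
            glMatrixMode(GL_PROJECTION);
            glLoadIdentity();
            glOrtho(0.0, win_w, 0.0, win_h, -1.0, 1.0);
            glMatrixMode(GL_MODELVIEW);
            glLoadIdentity();

            glDisable(GL_DEPTH_TEST);   /* 2D: painter's order, no Z buffer */

            glColor3f(0.7f, 0.7f, 0.7f);
            glBegin(GL_QUADS);          /* button background at (20,20), 120x32 px */
            glVertex2i(20, 20);
            glVertex2i(140, 20);
            glVertex2i(140, 52);
            glVertex2i(20, 52);
            glEnd();
        }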

  • Re:One word: (Score:5, Insightful)

    by Talez (468021) on Saturday August 16, 2003 @08:43AM (#6711754)
    Also, Microsoft are getting in on the act. The new Desktop Composition Engine for Longhorn is based on the same type of compositing but using DirectX instead of OpenGL.

    It's great for putting what has normally been wasted horsepower to work on 3D effects for 2D windows. Finally, eye candy that won't slow down your system!
  • by master_p (608214) on Saturday August 16, 2003 @08:44AM (#6711758)

    3D means:

    • very fast (and hardcoded) matrix operations
    • very fast rendering
    • passing lists of vertices, textures and shading instructions to the 3d hardware
    • do something else while the 3d hardware processes its lists
    • update the screen
    • do it in a single-threaded context, most likely for a game

    2D means:

    • draw a line here, draw a line there
    • fill a shape here, fill a shape there
    • clip output
    • use brushes
    • use variable line sizes and line edge types
    • do it at the desired point in time
    • do it in a multithreaded context

    Although 2D and 3D share some concepts, they are two entirely different things. Since today's software requires both windowed graphics and full-screen/windowed 3D graphics, graphics cards must have circuits for both 2D and 3D. A 2D graphics hardware implementation is something very trivial and very cheap these days.

    Therefore, I find it inappropriate to use 3D graphics for 2D rendering. It will certainly not speed up drawing operations, because 3D requires more steps than 2D, even if the Z coordinate is always 0. Why should Linux use 3D for 2D operations?
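
    For reference, the items in the "2D means" list above map almost one-to-one onto single core X11 calls against a GC, with no vertex or matrix setup at all -- the coordinates and the helper below are made up for illustration:

        #include <X11/Xlib.h>

        static void draw_2d_bits(Display *dpy, Window win, GC gc)
        {
            /* "draw a line here, draw a line there" */
            XDrawLine(dpy, win, gc, 10, 10, 200, 10);

            /* "fill a shape here, fill a shape there" */
            XFillRectangle(dpy, win, gc, 10, 20, 100, 40);

            /* "clip output": restrict further drawing to one rectangle */
            XRectangle clip = { 10, 20, 100, 40 };
            XSetClipRectangles(dpy, gc, 0, 0, &clip, 1, Unsorted);

            /* "use variable line sizes and line edge types" */
            XSetLineAttributes(dpy, gc, 5, LineSolid, CapRound, JoinRound);
            XDrawLine(dpy, win, gc, 10, 80, 200, 80);
        }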

  • Caching (Score:4, Insightful)

    by BenjyD (316700) on Saturday August 16, 2003 @10:01AM (#6712043)
    Somebody mentioned below that imlib is probably caching the image, whereas XRender is doing the transformation every time. So I thought I'd try the same caching approach with XRender.

    The first time the scale test is called, I render the image to an offscreen buffer with the correct transformations set. After that I just XRenderComposite to the screen from the offscreen buffer. The results (NVidia 4496 driver, RenderAccel=true, GeForce2 MX, Athlon XP 1800+) for one test are:

    *** ROUND 2 ***

    Test: Test Xrender doing 1/2 scaled Over blends - caching implementation
    Time: 0.126 sec.

    Test: Test Xrender doing 1/2 scaled Over blends - original implementation
    Time: 6.993 sec.

    Test: Test Imlib2 doing 1/2 scaled Over blends
    Time: 0.191 sec.

    That shows XRender taking two-thirds the time of imlib2.

    My guess is that imlib is probably caching something. This is supported by the fact that Xrender is faster for the non-scaled composition in the original code.
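
    In outline, the caching trick looks something like the sketch below -- this is not BenjyD's actual change to the benchmark; the helper names, the half-size scale factor, and the bilinear filter choice are just illustrative assumptions:

        #include <X11/Xlib.h>
        #include <X11/extensions/Xrender.h>

        /* Build a pre-scaled (half-size) copy of 'src' once; later frames then
         * composite the cached Picture with no transform set, which is the
         * cheap path.  (Freeing of the Pixmap/Picture is omitted for brevity.) */
        static Picture make_scaled_cache(Display *dpy, Drawable parent,
                                         Picture src, int src_w, int src_h)
        {
            int dst_w = src_w / 2, dst_h = src_h / 2;

            XRenderPictFormat *argb = XRenderFindStandardFormat(dpy, PictStandardARGB32);
            Pixmap pix = XCreatePixmap(dpy, parent, dst_w, dst_h, 32);
            Picture cache = XRenderCreatePicture(dpy, pix, argb, 0, NULL);

            /* The source transform maps destination coordinates back into
             * source space, so shrinking to half size means scaling by 2. */
            XTransform shrink = {{
                { XDoubleToFixed(2.0), XDoubleToFixed(0.0), XDoubleToFixed(0.0) },
                { XDoubleToFixed(0.0), XDoubleToFixed(2.0), XDoubleToFixed(0.0) },
                { XDoubleToFixed(0.0), XDoubleToFixed(0.0), XDoubleToFixed(1.0) },
            }};
            XRenderSetPictureTransform(dpy, src, &shrink);
            XRenderSetPictureFilter(dpy, src, FilterBilinear, NULL, 0);

            /* Pay the scaling cost exactly once. */
            XRenderComposite(dpy, PictOpSrc, src, None, cache,
                             0, 0, 0, 0, 0, 0, dst_w, dst_h);
            return cache;
        }

        /* Per-frame path: a plain, unscaled Over blend from the cached Picture. */
        static void blit_cached(Display *dpy, Picture cache, Picture dst, int w, int h)
        {
            XRenderComposite(dpy, PictOpOver, cache, None, dst,
                             0, 0, 0, 0, 0, 0, w, h);
        }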
  • Re:accelerated? (Score:3, Insightful)

    by cduffy (652) <charles+slashdot@dyfis.net> on Saturday August 16, 2003 @11:14AM (#6712326)
    I mean unportable as in making assumptions about data types which are only valid on the architecture the developer is currently using. I mean ugly as in failing to initialize all his data. Even the very latest build of Evolution out of Debian unstable isn't capable of running under valgrind, and a friend of mine -- one of the MIPS kernel folks and an *extremely* skilled hacker -- who was at one point working on a portable version of E simply threw up his hands in disgust.

    So no, I'm not talking about aesthetics.

    That said, I disagree that writing cluttered code is at all analogous to having a cluttered room. Efficiency, in the case of code, isn't just how quickly someone can get a first copy written, and it's not just about how fast that copy runs on the developer's box. Code that's blindingly fast on one particular configuration but won't run at all on a different machine (or which loses its performance edge when moved too far) is far worse than code written with attention to algorithmic optimizations, or -- even better -- code which is written with cleanliness and maintainability in mind (to allow for simpler refactoring later, be it for performance or features -- not to mention the far greater advantage of straightforward debugging).

    Let me propose an alternate analogy: cluttered code is akin to poorly-written English -- think of an ill-structured essay or book, full not only of spelling and grammatical errors but of logical fallacies to boot. While a cluttered room or desk might be excusable as personal style -- which is what you put it down to -- cluttered English and cluttered code are both symptomatic of poor thought processes going into their making.
