


Hardware Based XRender Slower than Software Rendering? 297

Neon Spiral Injector writes "Rasterman of Enlightenment fame has finally updated the news page of his personal site. It seems that the behind-the-scenes work for E is coming along. He is investigating rendering backends for Evas. The default backend is a software renderer written by Raster. Trying to gain a little more speed, he ported it to the XRender extension, only to find that it became 20-50 times slower on his NVidia card. He has placed some sample code on the same news page for people to try, to see if this is also experienced on other setups."
  • by gloth ( 180149 ) on Saturday August 16, 2003 @12:18AM (#6710390)
    He didn't really get too far into that, but it would be interesting to see how feasible it is to do all the 2D rendering using OpenGL, encapsulated by some layer, like his Evas.

    Has anyone done that? Any interesting results? One would think that there's a lot of potential here...
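A hedged sketch of the idea, assuming a GLUT-provided context (illustrative only, not Evas code): set up a pixel-aligned orthographic projection so the 3D engine can draw "2D" rectangles in window coordinates.

```c
/* Sketch: using OpenGL as a 2D backend. Set up an orthographic
   projection matching window pixels and draw a "2D" rect as a quad.
   GLUT is used here purely for context setup; the window size and
   coordinates are arbitrary illustrative values. */
#include <GL/glut.h>

static void display(void)
{
    glClear(GL_COLOR_BUFFER_BIT);

    /* Pixel-aligned 2D coordinate system */
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrtho(0, 256, 256, 0, -1, 1);   /* y grows downward, like X11 */

    /* A flat filled rectangle, drawn by the 3D engine */
    glColor3f(0.2f, 0.4f, 0.8f);
    glBegin(GL_QUADS);
    glVertex2f(32, 32);
    glVertex2f(224, 32);
    glVertex2f(224, 224);
    glVertex2f(32, 224);
    glEnd();

    glutSwapBuffers();
}

int main(int argc, char **argv)
{
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB);
    glutInitWindowSize(256, 256);
    glutCreateWindow("2D over OpenGL");
    glutDisplayFunc(display);
    glutMainLoop();
    return 0;
}
```

Texturing a quad with a client-side image buffer (via glTexImage2D) would extend the same approach to blits and blends.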
  • accelerated? (Score:4, Interesting)

    by Spy Hunter ( 317220 ) on Saturday August 16, 2003 @12:23AM (#6710409) Journal
    Is XRender really accelerated? I thought that most Render operations were still unaccelerated on most video cards, and how and if they could be accelerated was still an open question. Maybe the real problem here is Render's software rendering code?
  • duh (Score:3, Interesting)

    by SHEENmaster ( 581283 ) <travis AT utk DOT edu> on Saturday August 16, 2003 @12:29AM (#6710437) Homepage Journal
    Graphics cards work quickly because they cut every corner that can possibly be cut. It makes sense that they would run ordinary software more slowly.

    I'm more interested in using them for specific calculations. Imagine if one of these things were accidentally imbued with the ability to factor gigantic numbers. The AGP slot is just an excuse to keep us from Beowulfing them over PCI-X.
  • by madmarcel ( 610409 ) on Saturday August 16, 2003 @12:40AM (#6710478)
    When I enabled that setting on my Linux box (Red Hat, latest version of X, and an NVidia GeForce 4200)
    I got weird glitches all over the screen, most notably in the window borders and wherever windows or menus overlapped other things on the screen. There was an increase in speed, however. As you might expect, I disabled it after about 15 minutes. Ugh. I'll have another look at it when it's been fixed :D
  • by Amit J. Patel ( 14049 ) <amitp@cs.stanford.edu> on Saturday August 16, 2003 @12:48AM (#6710504) Homepage Journal

    There has been some work on using graphics cards for computation [att.com]. The tough part is figuring out how to rephrase your algorithm in terms of what the GPU can handle. You'd expect matrix math [cs.sfu.ca] to work out but people have tried to implement more interesting algorithms too. :-)

    - Amit [stanford.edu]
  • by Empiric ( 675968 ) * on Saturday August 16, 2003 @01:00AM (#6710542)
    There's an example from back in the 80's that still probably serves as a good engineering reference for people working on hardware/software driver issues.

    In those days of yore (only in the computer industry can one refer to something 20 years ago as "yore"...) there was the Commodore 64. It retains its place as a pioneering home computer in that it offered very good (for the time) graphics and sound capability, and an amazing 64K of RAM, in an inexpensive unit. But then came its bastard son...

    The 1541 floppy disk drive. It became the storage option for home users once they became infuriated enough with the capabilities of cassette-tape backup to pony up for storage on a real medium. Unfortunately, the 1541 was slow. Unbelievably slow. Slow enough to think, just maybe, there were little dwarven people in your serial interface cable running your bits back and forth by hand.

    Now, a very unique attribute of the 1541 drive was that it had its own 6502 processor and firmware. Plausibly, having in effect a "disk-drive-coprocessor" would accelerate your data transfer. It did not. Not remotely. Running through a disassembly of the 6502 firmware revealed endless, meandering code to provide what would appear, on the surface, to be a pretty straightforward piece of functionality: send data bits over the data pin and handshake it over the handshake signal pin.

    As the market forces of installed base and demand for faster speed imposed themselves, solutions to the 1541 speed problem were found by third-party companies. Software was released which performed such functions as loading from disk and backing up floppies at speeds many, many times faster than the 1541's base hardware and firmware could offer.

    The top of this particular speed-enhancement heap was a nice strategy involving utilizing both the Commodore 64's and the 1541's processors, and the serial connection, optimally. Literally optimally. Assembly routines were written to run on both the 64 and the 1541 side to exactly synchronize the sending and receiving of bits on a clock-cycle by clock-cycle basis. Taking advantage of the fact that both 6502s were running at 1 MHz, the 1541's code would start blasting the data across the serial line to the corresponding 64 code, which would pull it off the serial bus within a 3-clock-cycle window (you could not write the two routines to be any more in sync than a couple of 6502 instructions). This method used no handshaking whatsoever for large blocks of data being sent from the drive to the computer, and so, in an added speed coup, the handshaking line was also used for data, doubling the effective speed.

    The 1541 still seems pertinent as an example of a computer function that one would probably think would best be done primarily on a software level (running on the Commodore 64), but was engineered instead to utilize a more-hardware approach (on the 1541), only to be rescued by better software to utilize the hardware (on both).

    There's probably still a few design lessons from the "ancient" 1541, for both the hardware and the software guys.
  • by Animats ( 122034 ) on Saturday August 16, 2003 @01:02AM (#6710554) Homepage
    That's technically viable, and I've worked with some widget toolkits for Windows that render everything through OpenGL. On modern graphics hardware, this has good performance. After all, the hardware can draw complex scenes at the full refresh rate; drawing some flat bitmaps through the 3D engine isn't too tough.

    One problem is that multi-window OpenGL doesn't work that well. Game-oriented graphics boards don't have good support for per-window unsynchronized buffer swapping, so you tend to get one window redraw per frame time under Windows. (How well does Linux do with this?) Try running a few OpenGL apps that don't stress the graphics hardware at the same time. Do they slow down?

    One of the neater ways to do graphics is to use Flash for 2D and OpenGL for 3D. Quite a number of games work that way internally. The Flash rendering engine typically isn't Macromedia's, but Macromedia authoring tools are used. This gives the user interface designers great power without having to program.

  • Well, yes (Score:3, Interesting)

    by reynaert ( 264437 ) on Saturday August 16, 2003 @01:07AM (#6710572)

    As far as I know, only the Matrox G400 card has good hardware Render acceleration. NVidia's support is still experimental and rather poor. Render itself is still considered experimental, and speed is not yet considered to be very important. Fully accelerated support is planned for XFree86 5.

  • by red floyd ( 220712 ) on Saturday August 16, 2003 @01:22AM (#6710605)
    The other classic example was the original PC-AT MFM controller.

    IIRC, they originally tried (slave mode -- the only available thing then) DMA, and in general, it was faster to pump the data out by hand.
  • by HanzoSan ( 251665 ) on Saturday August 16, 2003 @01:24AM (#6710609) Homepage Journal

    Interesting, but how can we fund them? They don't accept donations, and they don't have a way for someone like me, who doesn't have the skills to develop XRender, to pay the people who do.

    Having only 2 people on XRender is why it's taking so long.
  • by garyebickford ( 222422 ) <gar37bic@gma i l .com> on Saturday August 16, 2003 @01:27AM (#6710617)
    I worked on 2D & 3D libs a while back for a graphics company. Among the biggest problems at the time was that each different output device had its own feature set, implemented slightly differently. Every designer had their own ideas of what would be 'cool' in their graphics engine, which tended to follow the latest progress in the field.

    General-purpose graphics libraries such as ours ended up spending more time dealing with those cool features than the features saved. For example, if a plotter had a 2D perspective transform built in, was it better to do the 3D projection ourselves and just feed it untransformed vectors, or map the 3D in such a way as to allow the 2D processing of the plotter to help out? This might require pre-computing sample data.

    Also, since the plotter had 2D transforms, we had to do a lot more work, including reading the plotter's status and inverting the plotter's transform matrix to make sure that the resulting output didn't end up outside the plotter's viewport.

    A code analysis found that over 90% of the code and 90% of the processing time was spent preventing and dealing with input errors and handling compatibility issues.

    Nowadays, it's harder in many ways with a wide variety of hardware based texturing and other rendering - do we do the lighting model ourselves, or let the HW do it? It may depend on whether we're going for speed and 'looks' or photometric correctness.
  • Re:accelerated? (Score:3, Interesting)

    by saikatguha266 ( 688325 ) on Saturday August 16, 2003 @01:39AM (#6710642) Homepage

    The NVidia drivers say something about Render Acceleration, as someone already pointed out. However, there is definitely some glitch somewhere. I tried the benchmark with RenderAccel both turned off and on on my GeForce 3 with the 4496 drivers and perceived no significant difference in the tests except for test 1 (11s with no accel, 2.5s with accel, 0.62s for Imlib2). The rest of the tests sucked for the driver (11s, 215s, 183s, 356s for tests 2 to 5 -- both with and without RenderAccel -- as opposed to 0.21s, 4.5s, 2.7s, 5.8s for Imlib2).

    I use Xinerama with the secondary display on an ATI 98 Pro (Yay for college tuitions). One thing I did notice was that even in render-accelerated mode, if I drag the window to the middle, straddling the screen split, the images display on both sides (though ATI's side is scaled down, even at the same resolution, for some reason). However, if I use a GL application (glxgears, mplayer -vo gl2, etc.) then straddling the split gives only half a display, on the GeForce board. So in this case X is either not using XRender because of the NVidia drivers, or is picking the lower of the two cards' capabilities, or is doing something in the middle that causes the GeForce and ATI displays to differ.

    I wonder if there is any way to explicitly force X to use the hardware for XRender as you can do with GL.
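For reference, the client-side calls are the same whether or not the server accelerates them -- whether Render work lands in hardware is entirely the driver's decision. A minimal, illustrative sketch of an XRender composite (not the benchmark's actual code; assumes the server supports depth-32 ARGB):

```c
/* Minimal sketch: composite one Picture Over another with XRender.
   The client API gives no control over acceleration; that is up to
   the driver (e.g. NVidia's RenderAccel option). */
#include <stdio.h>
#include <X11/Xlib.h>
#include <X11/extensions/Xrender.h>

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    if (!dpy) { fprintf(stderr, "no display\n"); return 1; }

    int ev, err;
    if (!XRenderQueryExtension(dpy, &ev, &err)) {
        fprintf(stderr, "no RENDER extension\n");
        return 1;
    }

    Window win = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy),
                                     0, 0, 256, 256, 0, 0, 0);
    XRenderPictFormat *fmt =
        XRenderFindVisualFormat(dpy, DefaultVisual(dpy, DefaultScreen(dpy)));
    Picture dst = XRenderCreatePicture(dpy, win, fmt, 0, NULL);

    /* An ARGB source pixmap to blend Over the window */
    Pixmap pix = XCreatePixmap(dpy, win, 64, 64, 32);
    Picture src = XRenderCreatePicture(dpy, pix,
                      XRenderFindStandardFormat(dpy, PictStandardARGB32),
                      0, NULL);

    XMapWindow(dpy, win);
    XRenderComposite(dpy, PictOpOver, src, None, dst,
                     0, 0, 0, 0, 0, 0, 64, 64);
    XFlush(dpy);
    XCloseDisplay(dpy);
    return 0;
}
```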

  • by LightStruk ( 228264 ) on Saturday August 16, 2003 @01:40AM (#6710645)
    I ran Rasterman's benchmark and noticed something strange. For those of you who can't or won't try it yourself, the program runs six different tests, each of which uses a different scaling technique. Each of the six tests is run on three different backends: XRender onscreen, XRender offscreen, and Imlib2. Imlib2 is also written by Rasterman, and is part of Enlightenment.

    Here are the test scores from one of the rounds -

    *** ROUND 3 ***

    Test: Test Xrender doing 2* smooth scaled Over blends
    Time: 196.868 sec.

    Test: Test Xrender (offscreen) doing 2* smooth scaled Over blends
    Time: 196.347 sec.

    Test: Test Imlib2 doing 2* smooth scaled Over blends
    Time: 6.434 sec.

    Now for the strange thing. For the first backend, I watched as the program drew the Enlightenment logo thousands of times in the test window, as you would expect. The second test took about the same amount of time but drew offscreen, again as the test's name would indicate. However, the Imlib2 test didn't draw anything in the test window either.
    I got the impression (perhaps wrongly?) that Imlib2 would actually draw to the screen as well. Since it doesn't change the screen, I have no way of telling whether Imlib2 is doing any drawing at all.

    So, I'm digging into the benchmark's code... I'll let you guys know what I find.
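For comparison, here is a minimal sketch of the client-side Imlib2 path the benchmark times: the scaling and blending happen in software on the client, and only the finished pixels go to the X server. (Illustrative only; "logo.png" is a placeholder, not the benchmark's actual file.)

```c
/* Sketch: load an image with Imlib2 and 2x scale-blit it to a
   window. All the heavy lifting is client-side software. */
#include <stdio.h>
#include <X11/Xlib.h>
#include <Imlib2.h>

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    if (!dpy) { fprintf(stderr, "no display\n"); return 1; }

    int scr = DefaultScreen(dpy);
    Window win = XCreateSimpleWindow(dpy, RootWindow(dpy, scr),
                                     0, 0, 256, 256, 0, 0, 0);
    XMapWindow(dpy, win);

    imlib_context_set_display(dpy);
    imlib_context_set_visual(DefaultVisual(dpy, scr));
    imlib_context_set_colormap(DefaultColormap(dpy, scr));
    imlib_context_set_drawable(win);

    Imlib_Image im = imlib_load_image("logo.png");  /* placeholder path */
    if (!im) { fprintf(stderr, "no image\n"); return 1; }
    imlib_context_set_image(im);
    imlib_context_set_blend(1);

    /* 2x smooth-scaled Over blend, roughly what the benchmark times */
    int w = imlib_image_get_width();
    int h = imlib_image_get_height();
    imlib_render_image_on_drawable_at_size(0, 0, w * 2, h * 2);

    XFlush(dpy);
    imlib_free_image();
    XCloseDisplay(dpy);
    return 0;
}
```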
  • Re:Yawn (Score:2, Interesting)

    by OrangeTide ( 124937 ) on Saturday August 16, 2003 @01:48AM (#6710669) Homepage Journal
    A client/server setup is a superior way of designing a windowing environment.

    X11 uses unix sockets (or optionally slower, less secure TCP) and shared memory.

    Win32 uses shared memory and messaging.

    MacOS X .. I don't know for certain, I hope it uses Mach kernel messages.

    QNX Photon uses qnx kernel messages and shared memory.

    The real difference is the layer at which the windowing system exists. In the case of X11, MacOS X and Photon, the windowing system is just another process.

    In Win32 it's a kernel thread (as far as I know). But still, you're sending messages from one place to another and constructing windows based on them.

    Client/Server is the natural way to build a multi-application graphical environment.

    Of course there are "fake" environments which amount to an embedded video driver and some library to draw widgets (most DOS GUI apps are like this).
  • by penguin7of9 ( 697383 ) on Saturday August 16, 2003 @02:32AM (#6710807)
    XRender is a new extension, with only a reference implementation in XFree86. The point is to experiment with an API prior to freezing it. I know this may come as news to people who have grown up on Microsoft software, but real software developers first try out various ideas and only later start hacking for speed. It would be quite surprising, actually, if it were faster than a hand-tuned client-side software implementation.

    It will be a while until XRender beats client-side software implementations. Furthermore, you can't just take a client-side renderer and hack in XRender calls and expect it to run fast--code that works efficiently with a client-server window system like X11 needs to be written differently than something that moves around pixels locally.
  • by Anonymous Coward on Saturday August 16, 2003 @03:44AM (#6711026)
    On modern graphics hardware, this has good performance.

    Exactly, on modern hardware. If you have anything less than an ATI 9700, NVidia's "hottest and greatest", or the only third vendor's (really expensive, which is why I won't name them) OpenGL-supporting 3D cards, you are indeed screwed. Bigtime!

    To explain what I'm talking about here: on a 2D card you can often easily move a full screen per vblank (OK, not true for PCI cards if you're like me using 1600x1200x32 ~= 7.32MB/frame; at just 75Hz that's 549MB/second).

    That's obviously impossible, right? But what if you pushed e.g. MPEG video at 480x480 (plain VCD) in a non-RGB format -- some other format, let's call it YUV. Ah, I hear you say. Yes, those are data rates the card can't complain too much about. Furthermore, it has hardware support to stretch the image.

    But what about what we're talking about here? We're talking about plain RGB, 24/32-bit obviously. If using a 3D API you'd first have to create a quad, then *upload* the image you already have in memory to the card, and finally have the card "paint" that quad, stretched or not, onto the screen.

    Let me tell you, the number of games that do this and manage to bring the frame rate for just shitty 2D work (this *includes* the mouse cursor) down to less than 1 FPS (frame per second) has been enough to tell me:
    1. The world is full of idiots.
    2. The designers (of that software) are not really competent enough to make the decisions they do.
    3. Even if *you* happen to have an ATI 9800, are you willing to buy such a card for everyone who doesn't?

    Had these people stayed with a 2D API and only used its available 2D primitives, they'd have had an order of magnitude (if not more) higher frame rate on all hardware.

    In all, it could be fun to look at, running on a machine suitably equipped (meaning the "baddest" and most expensive money can buy) -- but to actually use on the machines we own? I don't think so...

    To even suggest using Flash for gfx IMO warrants a combination of web cameras, pieces of lead forced to a somewhat high velocity by chemical reactions, and lots of cheering people.
  • by BenjyD ( 316700 ) on Saturday August 16, 2003 @07:30AM (#6711577)
    The compiler can make some use of multimedia extensions, but it can't exploit them fully. To get the best performance often requires a non-trivial modification of the loop you're optimising, which you can really only get (at the moment) by writing hand optimised assembly.

    I've written MMX versions of algorithms (blending, intensity etc) that are 5 times faster than their C equivalent - I've yet to see that kind of improvement from GCC.
  • by anno1a ( 575426 ) <cyrax.b0rken@dk> on Saturday August 16, 2003 @09:19AM (#6711887) Homepage
    I have an Athlon 1400 and a GeForce2 Ultra... While the framerates more or less fit yours:
    1: 1871
    3: 630
    5: 372
    When you look at them it's obvious that they're not running simultaneously; each gets a little bit of gfx time, stopping and waiting for the other gears to stop moving. Utterly useless!
  • by unfortunateson ( 527551 ) on Saturday August 16, 2003 @10:36AM (#6712184) Journal
    Hmm... probably less relevant to this discussion, but the Apple ][ floppy driver had some other interesting de-optimizations:

    The way I was told the story, Apple was buying lower-quality components than those on more expensive drives, and to compensate, would do each disk operation (like a read) seven times, and vote on the result.

    Several patched drivers came out that merely read 5 or, if you were willing to risk data errors, 3 times. Greatly improved performance.

    Of course, no mention of Apple ][ disks would be complete without the mention that a blank floppy would cause some sort of infinite loop because the directory table couldn't be found. Hence:

    On a clear disk you can seek forever!
  • Re:One word: (Score:4, Interesting)

    by AntiOrganic ( 650691 ) on Saturday August 16, 2003 @11:04AM (#6712295) Homepage
    Many of the drawing functions in the GDI+ graphics library in Windows 2000/XP have already been offloaded to the GPU (ever try moving translucent windows around? It's smooth.). There's really not that much left to be done.

    There are also third-party utilities (Stardock WindowFX, etc.) that create all sorts of nifty transitions, shadows, blending, etc. that are handled by the graphics device.
