Hardware Based XRender Slower than Software Rendering? 297

Posted by michael on Saturday August 16, 2003 @12:13AM from the unsolved-mysteries dept.

Neon Spiral Injector writes "Rasterman of Enlightenment fame has finally updated the news page of his personal site. It seems that the behind the scenes work for E is coming along. He is investigating rendering backends for Evas. The default backend is a software renderer written by Raster. Trying to gain a little more speed he ported it to the XRender extension, only to find that it became 20-50 times slower on his NVidia card. He has placed some sample code on this same news page for people to try, and see if this is also experienced on other setups."

This discussion has been archived. No new comments can be posted.

are the drivers installed? (Score:3, Funny)

by efishta ( 644037 ) writes: on Saturday August 16, 2003 @12:17AM (#6710387)

last time I checked all graphix cards need drivers to enable their acceleration.

- - Re:Can you learn to spell? (Score:2, Funny)
    
    by efishta ( 644037 ) writes:
    
    needless to say, most English-speaking people know that the letter "x" is pronounced the same way as "cs" or "ks"... so I figured it would be redundant to type two letters when I could very easily do half the work and type one letter.
    
    Replying to this post completely negates what I was trying to accomplish with my "x" trick, but I thought you'd like to know.
- - Re:are the drivers installed? (Score:2)
    
    by Fishead ( 658061 ) writes:
    
    To discount this as not being possible is foolish. I am sure I am not the only /.er who has racked his brains trying to find a solution only to have someone (less skilled even) come over and point out the obvious solution. Yes this is an insult, and yes you wanna punch the smart alec in the mouth, but it doesn't mean he is wrong to suggest the solution.
2D acceleration using OpenGL? (Score:5, Interesting)

by gloth ( 180149 ) writes: on Saturday August 16, 2003 @12:18AM (#6710390)

He didn't really get too far into that, but it would be interesting to see how feasible it is to do all the 2D rendering using OpenGL, encapsulated by some layer, like his Evas.

Has anyone done that? Any interesting results? One would think that there's a lot of potential here...

- One word: (Score:4, Informative)
  
  by i_am_nitrogen ( 524475 ) writes: on Saturday August 16, 2003 @12:30AM (#6710441) Homepage Journal
  
  Irix.
  
  IrisGL or OpenGL (I think OpenGL is based on IrisGL, so Irix probably now uses OpenGL) is used extensively in Irix, for both 2D and 3D.
  
  - Re:One word: (Score:3, Informative)
    
    by Krach42 ( 227798 ) writes:
    
    You forgot about an even more common example... QUARTZ! Apple's OSX does all rendering through Quartz, (as PDFs) which is accelerated by OpenGL, and called QuartzExtreme.
    - Re:One word: (Score:5, Insightful)
      
      by Talez ( 468021 ) writes: on Saturday August 16, 2003 @08:43AM (#6711754)
      
      Also, Microsoft are getting in on the act. The new Desktop Composition Engine for Longhorn is based on the same type of compositing but using DirectX instead of OpenGL.
      
      It's great for using 3D effects on 2D windows for what has normally been wasted horsepower. Finally, eye candy that won't slow down your system!
      
      - Re:One word: (Score:4, Interesting)
        
        by AntiOrganic ( 650691 ) writes: on Saturday August 16, 2003 @11:04AM (#6712295) Homepage
        
        Many of the drawing functions in the GDI+ graphics library in Windows 2000/XP have already been offloaded to the GPU (ever try moving translucent windows around? It's smooth.). There's really not that much left to be done.
        
        There are also third-party utilities (Stardock WindowFX, etc.) that create all sorts of nifty transitions, shadows, blending, etc. that are handled by the graphics device.
        
    - Re:One word: (Score:4, Informative)
      
      by be-fan ( 61476 ) writes: on Saturday August 16, 2003 @01:23PM (#6712920)
      
      Arg. Not again. -5 "dis"informative. Quartz "Extreme" does not accelerate any actual rendering! According to Apple's very own SIGGRAPH presentation, all 2D rendering is still done in software. Only final window *compositing* (doing the alpha blend between all the windows) and window-level effects (like the genie effect) are done via OpenGL.
      
    - Quartz Extreme in a few words (Score:5, Informative)
      
      by The Ego ( 244645 ) writes: on Saturday August 16, 2003 @01:55PM (#6713070)
      
      Apple's OSX does all rendering through Quartz, (as PDFs) which is accelerated by OpenGL, and called QuartzExtreme.
      
      That's not accurate. Quartz is really made of two parts: Quartz 2D and the Quartz Compositor.
      
      The Quartz Compositor is responsible for compositing all the layers (desktop, windows, layers inside windows) on-screen. It offers Porter-Duff compositing, which was developed at Pixar more than 15 years ago. See this post [google.com] from Mike Paquette for details. Mr Paquette is one of the main developers of Quartz. Quartz Extreme is "simply" an OpenGL implementation of Porter-Duff compositing, and modern graphics cards offer the primitives needed to do that very efficiently.
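
As a rough illustration of what Porter-Duff compositing means in practice, here is a minimal Python sketch of the "over" operator, the operator named in the benchmark's "Over blends" tests. It assumes premultiplied RGBA values stored as floats in [0, 1]; the function name `over` and the tuple layout are choices made for this example and are not taken from Quartz or XRender.

```python
def over(src, dst):
    """Composite premultiplied RGBA pixel `src` over `dst`.

    Each pixel is an (r, g, b, a) tuple of floats in [0, 1], with the
    colour channels already multiplied by alpha (an assumption of this sketch).
    """
    inv_alpha = 1.0 - src[3]
    # Every channel, alpha included: result = src + dst * (1 - src_alpha)
    return tuple(s + d * inv_alpha for s, d in zip(src, dst))


# Half-transparent red laid over opaque blue.
half_red = (0.5, 0.0, 0.0, 0.5)
blue = (0.0, 0.0, 1.0, 1.0)
blended = over(half_red, blue)
```

A compositor applies this per pixel for every overlapping layer, which is why it maps well onto hardware blending units.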
      
      The Quartz 2D layer is what offers drawing primitives following the PostScript drawing model. The same drawing model is used with PDF (no surprise), Java2D and SVG (and Microsoft's GDI+ ?). This part is not HW accelerated. I am sure Apple is working on it, but it wouldn't surprise me if new HW will be required to make this possible. There is a strong incentive for card manufacturers to offer acceleration, since Longhorn is supposed to use GDI+ extensively. I doubt that such acceleration will fit in the traditional OpenGL/Direct3D rendering pipeline.
      
      The Apple JVM team implemented HW accelerated Java2D drawing in their 1.3.1 JVM. Their 1.4 JVM doesn't offer it (1.4.1 was a massive rewrite for them, 1.3.1 was more of a quick port to OS-X using some of their "old" carbon code). There were quite a few problems when HW acceleration was used. I hope they can and will wait for a system-wide Quartz-2D HW acceleration, it seems ludicrous to have the JVM team spend resources on an effort that will be wasted once Quartz2D is accelerated.
      
      See Apple Marketing page [apple.com], another post from Mike Paquette [google.com], and the presentation from Apple at SIGGRAPH about Quartz Extreme and OpenGL [opengl.org].
      
      If that post doesn't end-up rated +5 informative, I don't know what will ! :-)
      
  - - Re:One word: (Score:3, Informative)
      
      by flynn_nrg ( 266463 ) writes:
      
      The file manager, for example, used resizable icons. Moving a slider would make the icons bigger or smaller. Those were definitely vector graphics. I'm not 100% sure, but I'd bet those were opengl objects.
      
      About grandparent's comment, yes, SGI created IrisGL first, then moved onto OpenGL when they opened up the specs, and had a glue library for compatibility with old apps, called Igloo (IrisGL on OpenGL)
      Btw, I've tried rasterman's test on my ancient Riva TNT card and software rendering is indeed a lot
- Re:2D acceleration using OpenGL? (Score:5, Interesting)
  
  by Animats ( 122034 ) writes: on Saturday August 16, 2003 @01:02AM (#6710554) Homepage
  
  That's technically viable, and I've worked with some widget toolkits for Windows that render everything through OpenGL. On modern graphics hardware, this has good performance. After all, the hardware can draw complex scenes at the full refresh rate; drawing some flat bitmaps through the 3D engine isn't too tough.
  One problem is that multi-window OpenGL doesn't work that well. Game-oriented graphics boards don't have good support for per-window unsynchronized buffer swapping, so you tend to get one window redraw per frame time under Windows. (How well does Linux do with this?) Try running a few OpenGL apps that don't stress the graphics hardware at the same time. Do they slow down?
  One of the neater ways to do graphics is to use Flash for 2D and OpenGL for 3D. Quite a number of games work that way internally. The Flash rendering engine typically isn't Macromedia's, but Macromedia authoring tools are used. This gives the user interface designers great power without having to program.
  
  - Re:2D acceleration using OpenGL? (Score:5, Informative)
    
    by mvdwege ( 243851 ) writes: <mvdwege@mail.com> on Saturday August 16, 2003 @02:31AM (#6710802) Homepage Journal
    On using OpenGL in multiple windows....
    
    (How well does Linux do with this?) Try running a few OpenGL apps that don't stress the graphics hardware at the same time. Do they slow down?
    
    While my graphics hardware is not quite representative (the Matrox G450 is not known for great 3D performance), I ran two instances of glxgears.
    
    Short conclusion: MesaGL on Linux has the same problem. Long conclusion: the windows showed noticeable slowdowns, up to the point where animation was suspended in one window while the other ran, with the system switching the running window at seemingly random intervals.
    
    System specs:
    
    Athlon 1600XP
    
    MSI K7TPro2 Motherboard
    
    Matrox G450 AGP Graphics Card
    
    Linux kernel 2.6.0-test3
    
    XFree86 4.2.1 (Debian patchlevel 9)
    
    Hope this helps,
    
    Mart
    - Re:2D acceleration using OpenGL? (Score:5, Informative)
      
      by Rufus211 ( 221883 ) writes: <[rufus-slashdot] [at] [hackish.org]> on Saturday August 16, 2003 @03:13AM (#6710935) Homepage
      
      Must be your hardware. I have an Ath 2700 XP with an ATI 9800 running Debian with X 4.3
      
      Single glxgears: 3600
      3 glxgears: 1200
      5 glxgears: 700
      
      (All approx. numbers.) So basically it scales almost perfectly with the number of open windows.
      
      - Re:2D acceleration using OpenGL? (Score:3, Informative)
        
        by Mr Z ( 6791 ) writes:
        
        I have a dual Athlon MP 2600 running w/ an nVidia GeForce 4 MX440. Here's what I get for 1 through 4 glxgears:
        
        ~7400 frames / 5 seconds
        
        ~3000 frames / 5 seconds
        
        ~1500 frames / 5 seconds
        
        ~1300 frames / 5 seconds
        
        The fall-off is slightly more harsh than linear for 1 through 3, probably synchronization overhead. 4 seems to get faster in terms of total frame rate across all four instances. 2*3000 == 6000, 3*1500 = 4500, 4*1300 = 5200(!)
        --Joe
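
The aggregate-throughput arithmetic in the post above can be written out as a short Python sketch. The per-instance figures are copied from the post; the names `frames_per_instance` and `aggregate` are made up for this example.

```python
# Frames per 5 seconds reported per glxgears instance, keyed by the number
# of instances running at once (figures copied from the post above).
frames_per_instance = {1: 7400, 2: 3000, 3: 1500, 4: 1300}


def aggregate(frames_by_count):
    """Total frames drawn across all instances for each instance count."""
    return {n: n * frames for n, frames in frames_by_count.items()}


totals = aggregate(frames_per_instance)
baseline = totals[1]
for n, total in sorted(totals.items()):
    # Share of the single-instance throughput that survives with n instances.
    print(f"{n} instance(s): {total} frames total, {total / baseline:.0%} of baseline")
```

If the hardware were shared without any overhead, every total would stay close to the single-instance figure; the gap is a rough measure of what is lost to switching between windows.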
      - Re:2D acceleration using OpenGL? (Score:2, Interesting)
        
        by anno1a ( 575426 ) writes:
        
        I have an Athlon 1400 and a GeForce2Ultra... While the framerates more or less fits yours:
        1: 1871
        3: 630
        5: 372
        When you look at them it's obvious that they're not running simultaneously, but get a little bit of gfx-time each, stopping and waiting for the other gears to stop moving. Utterly useless!
      - Re:2D acceleration using OpenGL? (Score:5, Informative)
        
        by Animats ( 122034 ) writes: on Saturday August 16, 2003 @03:01PM (#6713362) Homepage
        
        So basically it scales almost perfectly with the number of open windows.
        Which means it's broken. All the windows should run at full speed until the graphics pipeline saturates.
        There are several problems. First, make sure that you're not running with "wait for VBLANK" off. There's a stupid overclocker mania for running the graphics system faster than the display can refresh. This leads to high, meaningless frame rates, and to lower system performance because the useless redraws are using up all the CPU time.
        Once you're past that, the issues are more fundamental.
        The real problem is that OpenGL is double-buffered, but most windowing systems don't understand double-buffering or frame-synchronous drawing very well. Even OpenGL has no notion of time. But this could be fixed.
        Usually, each app draws into the back buffer, then makes the OpenGL call to swap the buffers. This blocks the app (older NVidia drivers for Windows spin-locked, but I got them to fix that), but worse, it typically locks up the OpenGL subsystem until the frame ends and the buffer swap occurs. Implementations like that can only draw one window per frame time, obviously.
        What ought to happen is that a request for a buffer swap should schedule a buffer swap for the next frame cycle, block the app, then let other apps get in their draw time. At the end of the frame, when everybody is done drawing, all windows get buffer swapped, and all the apps stuck in the OpenGL buffer swap call then unblock simultaneously. That way, multiple OpenGL apps running in different windows all run at full frame rate, until the scene complexity hits the limits of the graphics hardware.
        Part of the problem is that X and OpenGL are such drastically different architectures that making them play well together is tough. X assumes a network-centric model; OpenGL assumes you're local. X expects a weak terminal; OpenGL needs good graphics acceleration. X is built around a windowing concept; OpenGL doesn't know about windows. X and OpenGL are defined by different organizations.
        Microsoft is pulling this together in the Windows world, but it's all done with Microsoft APIs, and, recently, undocumented hardware that favors those APIs.
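
To make the difference between the two swap policies described above concrete, here is a toy Python model. It is only a sketch under simplifying assumptions: every window takes the same fixed draw time, the display refreshes at a fixed rate, and the function names (`serialized_swap_fps`, `deferred_swap_fps`) are invented for this example rather than taken from any driver or API.

```python
import math


def serialized_swap_fps(refresh_hz, windows):
    """Swap call holds the GL subsystem until vblank: one window per frame."""
    return refresh_hz / windows


def deferred_swap_fps(refresh_hz, windows, draw_ms):
    """Swaps are queued; every window that finished drawing flips at vblank."""
    frame_ms = 1000.0 / refresh_hz
    total_draw_ms = windows * draw_ms
    # Whole refresh intervals needed before all windows have drawn once.
    frames_needed = max(1, math.ceil(total_draw_ms / frame_ms))
    return refresh_hz / frames_needed


# Four light windows on a 60 Hz display, 2 ms of drawing each.
print(serialized_swap_fps(60, 4))     # per-window rate drops with window count
print(deferred_swap_fps(60, 4, 2.0))  # per-window rate stays at the refresh rate
```

In this model the deferred policy only slows down once the combined draw time no longer fits in one refresh interval, which matches the "full speed until the pipeline saturates" behaviour the post argues for.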
        
    - Re: 2D acceleration using OpenGL? (Score:2)
      
      by Black Parrot ( 19622 ) writes:
      
      > While my graphics hardware is not quite representative (the Matrox G450 is not known for great 3D performance), I ran two instances of glxgears. Short conclusion: MesaGL on Linux has the same problem. Long conclusion: the windows showed noticeable slowdowns, up to the point where animation was suspended in one window while the other ran, with the system switching the running window at seemingly random intervals.
      
      That's interesting. I also have a Matrox G450 AGP card, but running two instances of gears
    - Re:2D acceleration using OpenGL? (Score:2)
      
      by noselasd ( 594905 ) writes:
      
      Err.. MesaGL is only a software renderer. Naturally it's slow. Well, it probably does accelerate some Matrox cards, but that's it. Running 2 glxgears here gives me about 750 fps each, while only one yields 1650. (GeForce4, nvidia drivers)
    - - oh yeah, har de har har (Score:2)
        
        by leonbrooks ( 8043 ) writes:
        
        -1 Weak joke?
  - Re:2D acceleration using OpenGL? (Score:2, Interesting)
    
    by Anonymous Coward writes:
    
    On modern graphics hardware, this has good performance.
    
    Exactly, on modern hardware. If you have anything less than ATI 9700, NV "hottest and greatest" or the only third (but really expensive, why I won't name them) vendor OpenGL-supporting 3D cards, you are indeed screwed. Bigtime!
    
    To explain what I'm talking about here: On a 2D card you can easily often move a full screen/vblank (OK, not true for PCI cards if you're like me using 1600x1200x32 ~= 7.32MB/frame, and at just 75Hz that's 549MB/second).
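
The figures in the paragraph above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name `framebuffer_bandwidth` is made up, and the post's "MB" is taken to mean MiB (2**20 bytes), which is the reading that reproduces its numbers.

```python
def framebuffer_bandwidth(width, height, bits_per_pixel, refresh_hz):
    """Return (MiB per frame, MiB per second) for full-screen redraws."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    mib_per_frame = bytes_per_frame / 2**20
    return mib_per_frame, mib_per_frame * refresh_hz


per_frame, per_second = framebuffer_bandwidth(1600, 1200, 32, 75)
print(f"{per_frame:.2f} MiB/frame, {per_second:.0f} MiB/s")
```

For comparison, plain 32-bit/33 MHz PCI peaks at roughly 133 MB/s, which is why the post rules out PCI cards for this case.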
    
    That's
  - Re:2D acceleration using OpenGL? (Score:2)
    
    by crawling_chaos ( 23007 ) writes:
    
    Tell me about it. The Neverwinter Nights toolset uses multiple OpenGL windows and it crashes -- a lot. Evidently it's stable enough if you're running the exact GeForce cards that the devs use, but anything else and it can be troublesome.
- Re:2D acceleration using OpenGL? (Score:4, Informative)
  
  by Rabid Penguin ( 17580 ) writes: on Saturday August 16, 2003 @01:16AM (#6710597) Homepage
  
  Yes, and yes. :-)
  
  The current version of Evas is actually the second iteration. The first version had a backend written for OpenGL, which performed quite well for large drawing areas, but was sluggish with many small areas (bad for window managers). The software engine easily outperformed it in those cases, and will be used for the resulting window manager's border drawing.
  
  For now, there is not an OpenGL engine in Evas, because of time constraints. E has a relatively small active development team atm, so it's difficult to say when someone will get around to adding the OpenGL engine. There should be one eventually, all nicely encapsulated except for a couple setup functions.
  
  - Re:2D acceleration using OpenGL? (Score:2)
    
    by Xerithane ( 13482 ) writes:
    
    You seem to be in the know, so I'll ask my offtopic question to you, hope you don't mind. Is it possible to run an OpenGL screen in the background (root window) and then have X windows on top of it? I mean, "Run smoothly on decent (not modern) hardware?"
- Re:2D acceleration using OpenGL? (Score:3, Insightful)
  
  by harlows_monkeys ( 106428 ) writes:
  
  ...it would be interesting to see how feasible it is to do all the 2D rendering using OpenGL
  Isn't that what Blender does? They implement their GUI using OpenGL, drawing all the widgets themselves, so that their interface is the same on all platforms they are ported to.
The damndest thing. (Score:5, Informative)

by Raven42rac ( 448205 ) * writes: on Saturday August 16, 2003 @12:20AM (#6710398)

I have used both ATI and NVIDIA (and 3dfx, and Matrox, but staying relevant). Generally the NVIDIA cards I have owned have been vastly outperformed by the ATI cards right off the bat, without tweakage. (This is under Linux, mind you.) Even with tweakage, in my experience, you rarely get the full potential from your card.

accelerated? (Score:4, Interesting)

by Spy Hunter ( 317220 ) writes: on Saturday August 16, 2003 @12:23AM (#6710409) Journal

Is XRender really accelerated? I thought that most Render operations were still unaccelerated on most video cards, and how and if they could be accelerated was still an open question. Maybe the real problem here is Render's software rendering code?

- Re:accelerated? (Score:3, Interesting)
  
  by saikatguha266 ( 688325 ) writes:
  
  The NVidia drivers say something about Render Acceleration as someone already pointed out. However, there is definitely some glitch somewhere. I tried the benchmark with RenderAccel both turned off and on on my GeForce 3 with the 4496 drivers and perceived no significant difference in the tests except for test 1. (11s for no accel, 2.5s for accel, 0.62 for imlib2). The rest of the tests sucked for the driver (11s, 215s, 183s, 356s for tests 2 to 5 -- both with and without render accel as opposed to 0.21s
  - Re:accelerated? (Score:2)
    
    by whereiswaldo ( 459052 ) writes:
    
    Obviously, something is wrong here. Hardware rendering should always be faster than software rendering, if the hardware is being used properly.
    
    In the stuff I've done, I'd guess a factor of 4 increase in speed at least.
- Re:accelerated? (Score:5, Informative)
  
  by Spy Hunter ( 317220 ) writes: on Saturday August 16, 2003 @02:41AM (#6710838) Journal
  Well I ran Rasterman's benchmark on my Radeon 9100/Athlon XP 2800 system, and here are the results (edited for lameness filter):
  
  *** ROUND 1 ***
  Test: Test Xrender doing non-scaled Over blends Time: 15.925 sec.
  Test: Test Xrender (offscreen) doing non-scaled Over blends Time: 15.927 sec.
  Test: Test Imlib2 doing non-scaled Over blends Time: 0.321 sec.
  *** ROUND 2 ***
  Test: Test Xrender doing 1/2 scaled Over blends Time: 7.125 sec.
  Test: Test Xrender (offscreen) doing 1/2 scaled Over blends Time: 7.134 sec.
  Test: Test Imlib2 doing 1/2 scaled Over blends Time: 0.133 sec.
  *** ROUND 3 ***
  Test: Test Xrender doing 2* smooth scaled Over blends Time: 131.495 sec.
  Test: Test Xrender (offscreen) doing 2* smooth scaled Over blends Time: 131.703 sec.
  Test: Test Imlib2 doing 2* smooth scaled Over blends Time: 2.487 sec.
  *** ROUND 4 ***
  Test: Test Xrender doing 2* nearest scaled Over blends Time: 113.890 sec.
  Test: Test Xrender (offscreen) doing 2* nearest scaled Over blends Time: 113.945 sec.
  Test: Test Imlib2 doing 2* nearest scaled Over blends Time: 1.778 sec.
  *** ROUND 6 ***
  Test: Test Xrender doing general nearest scaled Over blends Time: 197.817 sec.
  Test: Test Xrender (offscreen) doing general nearest scaled Over blends Time: 197.800 sec.
  Test: Test Imlib2 doing general nearest scaled Over blends Time: 5.171 sec.
  *** ROUND 7 ***
  Test: Test Xrender doing general smooth scaled Over blends Time: 268.509 sec.
  Test: Test Xrender (offscreen) doing general smooth scaled Over blends Time: 268.656 sec.
  Test: Test Imlib2 doing general smooth scaled Over blends Time: 7.507 sec.
  
  Obviously XRender is getting crushed here by Imlib2. There are a million reasons this might be happening, it's definitely worth looking into. In the best Slashdot tradition, here's some wild speculation about what might be causing the slowdown:
  
  Rasterman's code might be giving an unfair advantage to Imlib2. The Imlib2 results are never shown on the screen. However, XRender is tested both with display and without, so it seems like it should be fair.
  
  Rasterman's code might be using XRender in an inefficient way. I'm no X programming expert so I have no idea if what he's doing is the best way to do it, but Rasterman is supposed to be some sort of expert in producing nice fast graphics on X so I'd say this is unlikely.
  
  XRender might not be hardware accelerated for some reason, probably having to do with driver configuration or something. But geez, does the software rendering have to be that slow? Maybe they could learn something from Imlib2.
  
  The hotly debated "X protocol overhead" might be causing this slowdown. But given the magnitude of the slowdown, I think this is unlikely.
  
  Hopefully someone knowledgeable like Keith Packard himself will come and enlighten us with the true cause.
  - Re:accelerated? (Score:2, Insightful)
    
    by cduffy ( 652 ) writes:
    
    ...Rasterman is supposed to be some sort of expert in producing nice fast graphics on X so I'd say this is unlikely.
    
    I'm not so sure of that. Rasterman may have been the first person to write a window manager with quite that much eye candy -- but god, man, have you seen his code?
    
    I can't speak for anything he's written within the last 18 months or so (maybe it's been longer now?), but last time I looked at his C it was ugly, unportable sh*t.
    
    That said, I'll be curious to see what the XRender folks have to
    - Re:accelerated? (Score:2)
      
      by SmallFurryCreature ( 593017 ) writes:
      
      mmm, and since when does ugly and unportable say anything about speed? I presume here with ugly you mean lots of hacks, lousy variable names and inconsistent layout style stuff.
      None of this has anything to do with efficiency. You sound like those people who say a clean desk is better than a cluttered one. It isn't; it is just a reflection of style, not of work efficiency.
      Of course you may be using ugly as in badly coded, in which case I hang my head in shame.
      - Re:accelerated? (Score:3, Insightful)
        
        by cduffy ( 652 ) writes:
        
        I mean unportable as in making assumptions about data types which are only valid on the architecture the developer is currently using. I mean ugly as in failing to initialize all his data. Even the very latest build of Evolution out of Debian unstable isn't even capable of running under valgrind, and a friend of mine -- one of the MIPS kernel folks and an *extremely* skilled hacker -- who was at one point working on a portable version of E simply threw up his hands in disgust.
        
        So no, I'm not talking about a
- Re:accelerated? (Score:4, Informative)
  
  by Doug Neal ( 195160 ) writes: on Saturday August 16, 2003 @04:00AM (#6711083)
  
  The NVidia drivers provide experimental RENDER acceleration. I tried it out recently on my laptop and it's not called experimental for nothing - it's rather unstable. Every so often XFree86 would lock up. Mouse cursor freezing, nothing moving etc. The kernel would still be fine so I could ssh in and kill X, which would be running at 99% CPU usage.
  
  Certain things seemed to trigger it, e.g. loading up OpenOffice would guarantee a lock-up.
  
  So yes, hardware RENDER acceleration isn't really there at the moment. I expect this has something to do with the poor results the Rasterman got.
  
duh (Score:3, Interesting)

by SHEENmaster ( 581283 ) writes: <travis@u t k . edu> on Saturday August 16, 2003 @12:29AM (#6710437) Homepage Journal

graphics cards work quickly because they cut every corner that can possibly be cut. It makes sense that they would run computer software slower.

I'm more interested in using them for specific calculations. Imagine if one of these things was accidentally imbued with the ability to factor gigantic numbers. The AGP slot is just an excuse to keep us from beowulfing them over PCI-X

- Graphics cards and computation (Score:5, Interesting)
  
  by Amit J. Patel ( 14049 ) writes: <amitp@cs.stanford.edu> on Saturday August 16, 2003 @12:48AM (#6710504) Homepage Journal
  
  There has been some work on using graphics cards for computation [att.com]. The tough part is figuring out how to rephrase your algorithm in terms of what the GPU can handle. You'd expect matrix math [cs.sfu.ca] to work out but people have tried to implement more interesting algorithms too. :-)
  - Amit [stanford.edu]
  
  - Re:Graphics cards and computation (Score:2)
    
    by cybermace5 ( 446439 ) writes:
    
    Back in school, a lot of discussion was being thrown around about using video cards to process the RC5 crack. Only thing was the processor may not be all that much faster than a computer processor. It would have depended on how close the graphic optimizations were to the code-crack algorithm.
  - Re: Graphics cards and computation (Score:5, Informative)
    
    by Black Parrot ( 19622 ) writes: on Saturday August 16, 2003 @01:45AM (#6710662)
    
    > There has been some work on using graphics cards for computation. The tough part is figuring out how to rephrase your algorithm in terms of what the GPU can handle.
    
    Isn't there a lot of sloth involved in reading your results back as well?
    
    Meanwhile, users of GCC can exploit whatever multimedia SIMD instructions their processor supports by telling the compiler you want to use them. For x86 see this [gnu.org] and this [gnu.org]; for other architectures start here [gnu.org]. (Notice the GCC version in the URL; the supported options sometimes change between versions, so you should look in a version of the GCC Manual that matches what you're actually using.)
    
    I confess I haven't benchmarked these options, but in theory they should boost the performance of some kinds of number-crunching algorithms.
    
    BTW, Linuxers can find what multimedia extensions their CPU supports with cat /proc/cpuinfo, even from a user account. Look for multimedia support in the list at the end of the cpuinfo. Lots of those extensions only support integers or low-resolution fp numbers, but IIRC SSE2 should be good for high-precision FP operations. Use google to find out what your extensions are good for.
    
    And post us back if you do some benchmarking, or find some good ones on the Web.
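
The /proc/cpuinfo check mentioned above can be scripted. The following Python sketch parses cpuinfo text and reports which of a few SIMD extensions are listed. It assumes the x86 layout, where the extensions appear on a line starting with `flags`; the function name `simd_extensions` and the short `SIMD_FLAGS` list are choices made for this example.

```python
# A handful of x86 SIMD flag names to look for (deliberately not exhaustive).
SIMD_FLAGS = ("mmx", "3dnow", "sse", "sse2")


def simd_extensions(cpuinfo_text):
    """Return the SIMD_FLAGS entries found on the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            present = set(value.split())
            return [flag for flag in SIMD_FLAGS if flag in present]
    return []  # no 'flags' line, e.g. a non-x86 cpuinfo layout


# Made-up sample text in the x86 cpuinfo layout.
sample = "processor\t: 0\nflags\t\t: fpu vme mmx sse sse2\n"
found = simd_extensions(sample)
```

On a Linux box the real input would come from reading /proc/cpuinfo, for example `simd_extensions(open("/proc/cpuinfo").read())`.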
    
    - Re: Graphics cards and computation (Score:2)
      
      by Piquan ( 49943 ) writes:
      
      Meanwhile, users of GCC can exploit whatever multimedia SIMD instructions their processor supports
      Yeah, I've always wondered about that. About how well it exploits them, for example, and what GCC needs to recognize a SIMD-acceleratable bit of code. (I also wonder if it puts NaN and Inf traps around 3Dnow! instructions, or if compiling with 3Dnow! means you lose IEEE floating point.)
      Anybody have links to this kind of info?
    - Re: Graphics cards and computation (Score:3, Interesting)
      
      by BenjyD ( 316700 ) writes:
      
      The compiler can make some use of multimedia extensions, but it can't exploit them fully. To get the best performance often requires a non-trivial modification of the loop you're optimising, which you can really only get (at the moment) by writing hand optimised assembly.
      
      I've written MMX versions of algorithms (blending, intensity etc) that are 5 times faster than their C equivalent - I've yet to see that kind of improvement from GCC.
Not enough details (Score:5, Informative)

by bobtodd ( 189451 ) writes: on Saturday August 16, 2003 @12:31AM (#6710447)

Raster doesn't say whether he had 'Option "RenderAccel" "True"' enabled, which you must do on Nvidia cards if you want XRender acceleration.

Here is the entry from the driver README:

Option "RenderAccel" "boolean" Enable or disable hardware acceleration of the RENDER extension. THIS OPTION IS EXPERIMENTAL. ENABLE IT AT YOUR OWN RISK. There is no correctness test suite for the RENDER extension so NVIDIA can not verify that RENDER acceleration works correctly. Default: hardware acceleration of the RENDER extension is disabled.

Following that option, this one is noted:

Option "NoRenderExtension" "boolean" Disable the RENDER extension. Other than recompiling the X-server, XFree86 doesn't seem to have another way of disabling this. Fortunatly, we can control this from the driver so we export this option. This is useful in depth 8 where RENDER would normally steal most of the default colormap. Default: RENDER is offered when possible.

- Re:Not enough details (Score:4, Interesting)
  
  by madmarcel ( 610409 ) writes: on Saturday August 16, 2003 @12:40AM (#6710478)
  
  When I enabled that setting on my linux box (redhat , latest version of X and a nvidia geforce 4200)
  I got weird glitches all over the screen, most notably in the window borders and wherever windows or menus overlapped other things on the screen. There was an increase in speed however. As you might expect I disabled it after about 15 minutes. Ugh. I'll have another look at it when it's been fixed :D
  
  - Re:Not enough details (Score:4, Informative)
    
    by BenjyD ( 316700 ) writes: on Saturday August 16, 2003 @04:18AM (#6711136)
    
    It appears to be fixed in 4496, the latest version of the drivers. 4363 would crash every few minutes or so, but 4496 is very stable. Still slower than 3123 for 2D stuff though.
    
- Re:Not enough details (Score:2)
  
  by molarmass192 ( 608071 ) writes:
  
  Has nVIDIA worked the kinks out of this yet? I remember some bad mojo about this option with OpenOffice that makes me hesitant to re-enable it. I'm still on the 4363 release of the drivers, haven't installed the 4496 ones yet.
- I ran the benchmark with RenderAccel true (Score:5, Informative)
  
  by Sits ( 117492 ) writes: on Saturday August 16, 2003 @05:31AM (#6711314) Homepage Journal
  
  And the results were pretty much the same. Using Render was well over an order of magnitude slower on tests 2 - 7. I have a GeForce1 with the 1.0.4349 nvidia driver and haven't had the same trouble others have with this option, so I run with it on all the time.
  
  Here are the results for the interested:
  
  Available XRENDER filters:
  nearest
  bilinear
  fast
  good
  best
  Set up...
  *** ROUND 1 ***
  
  Test: Test Xrender doing non-scaled Over blends Time: 0.190 sec.
  
  Test: Test Xrender (offscreen) doing non-scaled Over blends Time: 0.303 sec.
  
  Test: Test Imlib2 doing non-scaled Over blends Time: 0.697 sec.
  
  *** ROUND 2 ***
  
  Test: Test Xrender doing 1/2 scaled Over blends Time: 10.347 sec.
  
  Test: Test Xrender (offscreen) doing 1/2 scaled Over blends Time: 10.231 sec.
  
  Test: Test Imlib2 doing 1/2 scaled Over blends Time: 0.315 sec.
  
  *** ROUND 3 ***
  
  Test: Test Xrender doing 2* smooth scaled Over blends Time: 207.028 sec.
  
  Test: Test Xrender (offscreen) doing 2* smooth scaled Over blends Time: 205.275 sec.
  
  Test: Test Imlib2 doing 2* smooth scaled Over blends Time: 5.695 sec.
  
  *** ROUND 4 ***
  
  Test: Test Xrender doing 2* nearest scaled Over blends Time: 164.460 sec.
  
  Test: Test Xrender (offscreen) doing 2* nearest scaled Over blends Time: 166.281 sec.
  
  Test: Test Imlib2 doing 2* nearest scaled Over blends Time: 4.119 sec.
  
  *** ROUND 6 ***
  
  Test: Test Xrender doing general nearest scaled Over blends Time: 313.187 sec.
  
  Test: Test Xrender (offscreen) doing general nearest scaled Over blends Time: 310.261 sec.
  
  Test: Test Imlib2 doing general nearest scaled Over blends Time: 11.444 sec.
  
  *** ROUND 7 ***
  
  Test: Test Xrender doing general smooth scaled Over blends Time: 477.511 sec.
  
  Test: Test Xrender (offscreen) doing general smooth scaled Over blends Time: 474.695 sec.
  
  Test: Test Imlib2 doing general smooth scaled Over blends Time: 17.290 sec.
  
  (reformatted to get past the lameness filter)
  
An important truth about X (Score:5, Funny)

by frovingslosh ( 582462 ) writes: on Saturday August 16, 2003 @12:35AM (#6710464)

It may be big and bloated, but at least it's slow.

- Re:An important truth about X (Score:3, Informative)
  
  by OrangeTide ( 124937 ) writes:
  
  X is small and fast (at least XFree86 [xfree.org] is). When you look at how much virtual memory it has mapped in (using 'ps', for example), you are also seeing the amount of memory mapped in for the video frame buffer. Have a 32Mb video card? Well at *least* 32Mb of your virtual address space isn't mapping into system ram, it's mapped into video ram.
  
  Also, with any application, the code space doesn't take system RAM in the same sense as data space does. Normally you map in pages of memory that point straight to the I/O
  - 3x for Banshee (Score:2)
    
    by leonbrooks ( 8043 ) writes:
    
    Have a 32Mb video card? Well at *least* 32Mb of your virtual address space isn't mapping into system ram, it's mapped into video ram.
    
    The 16MB Banshee EvilQueen sitting across the room maps three copies of its 16MB into main RAM (so 48MB total, plus maybe another 4MB for a busy X server); apparently each copy is mapped in a different way optimised for different ops.
  - - - Re:An important truth about X (Score:2)
        
        by FooBarWidget ( 556006 ) writes:
        
        Even Microsoft(r) Windows(tm) XP keeps sending expose events (or the Win32 equivalent of expose events) to an app when one of its windows is unobscured.
        People always praise how fast WinXP is. The X way is no different than the Win32 way. Yet somehow you still say X is slow?
        
        (I'll now wait to see what the hundreds of rabid anti-XFree zealots have to say)
is this the man who said that "Windows has won"? (Score:3, Insightful)

by rxed ( 634882 ) writes: on Saturday August 16, 2003 @12:38AM (#6710476)

Is this the same person who some time ago said that: "Windows has won. Face it. The market is not driven by a technically superior kernel, or an OS that avoids its crashes a few times a day. Users don't (mostly) care. They just reboot and get on with it. They want apps. If the apps they want and like aren't there, it's a lose-lose. Windows has the apps. Linux does not. Its life on the desktop is limited to nice areas (video production, though Mac is very strong and with a UNIX core now will probably end up ruling the roost). The only place you are likely to see Linux is the embedded space." Slashdot article is also available here: http://slashdot.org/articles/02/07/20/1342205.shtml?tid=106

- Thats a myth. (Score:2, Insightful)
  
  by HanzoSan ( 251665 ) writes:
  
  What Apps can I not run under Linux?
  
  My browser works, most of my games work, Photoshop works, Microsoft Word works.
  
  Do your research: Wine, TransGaming, CrossOver Office.
  - Re:Thats a myth. (Score:2)
    
    by Xerithane ( 13482 ) writes:
    
    Dumbass, he was quoting Rasterman in a previous Slashdot story. He wasn't saying that.
  - - - Re:Thats why theres Lindows with ClickNRun (Score:2)
        
        by arkanes ( 521690 ) writes:
        
        How do you know? You dont even use Linux.
        I don't? Damn. And here I thought I used it every day....
        Here's a news flash: People don't upgrade because they got a worm. People often don't upgrade at all! How about that. They've ALREADY spent the money on Windows, so it's a pretty steep uphill battle to get them to switch, especially since Lindows isn't any cheaper.
        Here's the best counterargument to what you're saying: Linux is dead on the desktop. Its market share is so small as to be a statistical abberat
- Re:is this the man who said that "Windows has won" (Score:2)
  
  by incom ( 570967 ) writes:
  
  Sounds like he has investments, or some other financial interests, in embedded linux. It really isn't realistic to say desktop linux is over at a time when it's never been so popular. Maybe "desktop linux profits" aren't so hot, but linux wasn't designed to make money with anyways. And maybe "desktop linux as #1 popular desktop" isn't seeming very likely either. But I see no reason why within the next few years we can't get a decent amount of applications, games, and hardware support for linux. I'm doing
  - Re:is this the man who said that "Windows has won" (Score:2)
    
    by ultrabot ( 200914 ) writes:
    
    Sounds like he has investments, or some other financial interests, in embedded linux.
    
    Indeed.
    
    Maybe "desktop linux profits" aren't so hot, but linux wasn't designed to make money with anyways.
    
    On the contrary, I allege that lots of money is going to be made on Linux desktops. Support, mass deployment, customization, life cycle extension... so the money will be made in corporate space, and that's the way it should be. That's where the money in Linux servers is made at the moment.
    
    Also, as an nvidia car
  - Re:is this the man who said that "Windows has won" (Score:3, Informative)
    
    by Simon Kongshoj ( 581494 ) writes:
    
    He said that over a year ago, however, when desktop Linux wasn't looking so hot. A large part of his point was that the desktop itself would be going away in the future, except as hackers' and enthusiasts' systems. In fact, he went on to state that if this is the case, Linux has a huge advantage over Windows, since Linux is not nearly as tied into the desktop as Windows is, and will have an easier time adapting to such a setting. So he ported his canvas library to run on embedded as well, without axing it f
- WRONG! Crashing was why I switched to Linux (Score:2)
  
  by spineboy ( 22918 ) writes:
  
  I got tired of Win 98 crashing several times a day - even after vanilla installs. I started looking for another solution back in 1999. Macs were too expensive for a student budget. BeOS was just about to be bought, when I found out about a FREE OS that supposedly didn't crash. I wound up using Linux for most stuff and kept 98 around to play GAMES!.. Now that games are available on Linux, I boot into Win98 about 1x a month for some odd/nostalgic reason.
  I'm happy I made the jump, but for most people t
- - Re:is this the man who said that "Windows has won" (Score:2)
    
    by ndogg ( 158021 ) writes:
    
    Not only does he want a good wm and some good libraries, he wants those libraries to be portable to embedded devices, especially since that's where he thinks part of the future for Linux is. The drawing library, Evas, has been ported to a number of devices.
Raster's on holiday (Score:5, Informative)

by Rabid Penguin ( 17580 ) writes: on Saturday August 16, 2003 @12:58AM (#6710532) Homepage

Normally, he would answer some questions or comments posted about something he has written, but he will be out of town for at least a few days.

I highly doubt he meant for this to get wide-spread exposure beyond developers of Enlightenment or X. Since it has, this is a good opportunity. I'll make this clear for anyone that didn't catch it, raster WANTS XRENDER TO BE FASTER! If there is a way to alter configuration or to recode the benchmark to do so, he wants to know about it.

Rather than posting questions about his configuration (which he can't answer right now), grab the benchmarks that he put up and get better results.

Now back to your regularly scheduled trolling...

Lessons from the ancient (Score:4, Interesting)

by Empiric ( 675968 ) * writes: on Saturday August 16, 2003 @01:00AM (#6710542)

There's an example from back in the 80's that still probably serves as a good engineering reference for people working on hardware/software driver issues.

In those days of yore (only in the computer industry can one refer to something 20 years ago as "yore"...) there was the Commodore 64. It retains its place as a pioneering home computer in that it offered very good (for the time) graphics and sound capability, and an amazing 64K of RAM, in an inexpensive unit. But then came its bastard son...

The 1541 floppy disk drive. It became the storage option for a home user once they became infuriated enough with the capabilities of cassette-tape backup to pony up for storage on a real medium. Unfortunately, the 1541 was slow. Unbelievably slow. Slow enough to think, just maybe, there were little dwarven people in your serial interface cable running your bits back and forth by hand.

Now, a very unique attribute of the 1541 drive was that it had its own 6502 processor and firmware. Plausibly, having in effect a "disk-drive-coprocessor" would accelerate your data transfer. It did not. Not remotely. Running through a disassembly of the 6502 firmware revealed endless, meandering code to provide what would appear, on the surface, to be a pretty straightforward piece of functionality: send data bits over the data pin and handshake it over the handshake signal pin.

As the market forces of installed base and demand for faster speed imposed themselves, solutions to the 1541 speed problem were found by third party companies. Software was released which performed such functions as loading from disk and backing up floppies at speeds that were many, many times faster than the 1541's base hardware and firmware could offer.

The top of this particular speed-enhancement heap was a nice strategy involving utilizing both the Commodore 64's and the 1541's processors, and the serial connection, optimally. Literally optimally. Assembly routines were written to run on both the 64 and the 1541 side to exactly synchronize the sending and receiving of bits on a clock-cycle by clock-cycle basis. Taking advantage of the fact both 6502's were running at 1 MHz, the 1541's code would start blasting the data across the serial line to the corresponding 64 code, which would pull it off the serial bus within a 3-clock-cycle window (you could not write the two routines to be any more in sync than a couple 6502 instructions). This method used no handshaking whatsoever for large blocks of data being sent from the drive to the computer, and so, in an added speed coup, the handshaking line was also used for data, doubling the effective speed.

The 1541 still seems pertinent as an example of a computer function that one would probably think would best be done primarily on a software level (running on the Commodore 64), but was engineered instead to utilize a more-hardware approach (on the 1541), only to be rescued by better software to utilize the hardware (on both).

There's probably still a few design lessons from the "ancient" 1541, for both the hardware and the software guys.
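
For anyone who wants to see the "use the handshake line for data too" trick in miniature, here is a toy model in Python. It is only a sketch: it is not 6502 code, it ignores the cycle-exact timing entirely, and the function names and the bit ordering (low bits first, even bits on the data line) are my own assumptions, not what any real fast loader did.

```python
def send_byte_one_line(byte):
    """Stock-style transfer: one bit per step on the data line (8 steps)."""
    return [(byte >> i) & 1 for i in range(8)]


def send_byte_two_lines(byte):
    """Fast-loader-style transfer: two bits per step (4 steps).

    Each step yields (handshake_line_bit, data_line_bit). The bit layout is
    an assumption made for this illustration.
    """
    steps = []
    for step in range(4):
        data_bit = (byte >> (2 * step)) & 1
        handshake_bit = (byte >> (2 * step + 1)) & 1
        steps.append((handshake_bit, data_bit))
    return steps


def receive_byte_two_lines(steps):
    """Reassemble a byte from four (handshake_line_bit, data_line_bit) pairs."""
    byte = 0
    for step, (handshake_bit, data_bit) in enumerate(steps):
        byte |= data_bit << (2 * step)
        byte |= handshake_bit << (2 * step + 1)
    return byte


if __name__ == "__main__":
    value = 0xA5
    print(len(send_byte_one_line(value)), "steps on one line")
    print(len(send_byte_two_lines(value)), "steps on two lines")
    print(hex(receive_byte_two_lines(send_byte_two_lines(value))))
```

Half the steps per byte is where the "doubling" comes from; the hard part in real life was keeping both CPUs in lockstep so that no handshake was needed at all.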

- Re:Lessons from the ancient (Score:2, Interesting)
  
  by red floyd ( 220712 ) writes:
  
  The other classic example was the original PC-AT MFM controller.
  
  IIRC, they originally tried (slave mode -- the only available thing then) DMA, and in general, it was faster to pump the data out by hand.
- Re:Lessons from the ancient (Score:5, Insightful)
  
  by The Vulture ( 248871 ) writes: on Saturday August 16, 2003 @01:23AM (#6710607) Homepage
  
  The 1541 drive itself was actually quite fast, reading an entire sector in much less than a second (if you set the job code directly in the drive). It was the serial transfer that was super slow (as you stated).
  
  Unfortunately, the fast loaders assuming that the CPU and the drive both ran at exactly the same speed was a cause for problems. The PAL version of the C64 ran at a different speed (a bit slower, I believe), thus making fast loaders either NTSC or PAL specific (although there may have been one or two that could actually take the clock speed into consideration). The same fault meant that fast loaders sometimes didn't work with some variants of the drives (different CPU's, all supposedly 6502 compatible, but not necessarily so).
  
  Additionally, because these fast loaders required exact timing, something had to be done with the VIC-II (interrupts from it would cause the 6510 in the C64 to lose its timing) - usually the screen was blanked (basically turning off the VIC-II), or at the least, turning off sprites (sprites by the way, while nice, were a PITA because they disrupted everything, including raster timing).
  
  Commodore did screw things up... They had four (or was it six?) connectors on each end of the cable, they could have made it at least quasi-parallel, rather than the serial with handshaking. Unfortunately, they only hooked up two, CLK (handshaking clock) and DATA (for the data bit). However, seeing as the 1541 was the same hardware mechanism as the 1540 (its predecessor for the VIC-20) and contained most of the same software (you could use a "user" command to change the speed for the VIC-20), they couldn't just go out and change the design. I almost get the feeling that they took the serial bus from the VIC-20, put it in the C64, figuring that they'd be able to use the 1540 drive. Then at the last minute, they realized that it wouldn't work and they made the 1541, as well as a ROM upgrade for the 1540 to work with the C64.
  
  While getting rid of the handshaking and transferring an extra bit over that line made sense then, with modern computers, I wouldn't trust it. There are too many components from too many manufacturers, and I really like my MP3 and pr0n collections too much to lose them to one bit being corrupted.
  
  -- Joe
  
- Re:Lessons from the ancient (Score:3, Interesting)
  
  by unfortunateson ( 527551 ) writes:
  
  Hmm... probably less relevant to this discussion, but the Apple ][ floppy driver had some other interesting de-optimizations:
  
  The way I was told the story, Apple was buying lower-quality components than those on more expensive drives, and to compensate, would do each disk operation (like a read) seven times, and vote on the result.
  
  Several patched drivers came out that merely read 5 or, if you were willing to risk data errors, 3 times. Greatly improved performance.
  
  Of course, no mention of Apple ][ disks w
- - Re:Lessons from the ancient (Score:2, Funny)
    
    by Alien Being ( 18488 ) writes:
    
    It's just like my uncle Fortranna Rosanadanna used to tell me. It's always something. Either your XRENDER is slow or you're stuck with a two bit disk drive.
    
    Back to you, Jane.
    
    -Rosanna
    - Re:Lessons from the ancient (Score:2)
      
      by Trogre ( 513942 ) writes:
      
      It's always something. Either your XRENDER is slow or you're stuck with a two bit disk drive.
      
      ... or stuck with a two-bit software vendor [windowsupdate.com].
Unfair comparison (Score:3, Informative)

by Anonymous Coward writes: on Saturday August 16, 2003 @01:01AM (#6710548)

The numbers being reported for this benchmark are at best questionable--yeah, like that's new. The imlib image is composed off-screen and then rendered at the last moment to the display. The Xrender, non-off screen, version has the penalty of having to update the physical display so frequently. If you make imlib2 render the image to the screen *every* draw, you end up getting results very similar to the Xrender on-screen display. Now, the fact that the Xrender off-screen display is so poor *is* a concern.

- Re:Unfair comparison (Score:2)
  
  by Yokaze ( 70883 ) writes:
  
  Yes, this might be unfair, but consider the first test.
  Simply blending the images and displaying them is faster than the off-screen variant and the imlib2 code.
  
  So, the possible penalty seems to be irrelevant.
  Also, the "overhead" of X seems to be irrelevant, too.
  
  I'd say that the scaling and filtering implemented in either X or the drivers is suboptimal.
nVidia Linux woes (Score:4, Informative)

by bleachboy ( 156070 ) writes: on Saturday August 16, 2003 @01:06AM (#6710570)
I have an nVidia GeForce2 Ultra, and recently upgraded my kernel to 2.5.75. It caused my X graphics to become unbelievably slow -- like 2400 baud modem slow when doing a directory listing or anything where text was scrolling. Downgrading to 2.4.21-ac4 (ac4 needed for some Adaptec drivers) and it was back to fast again. Further, my favorite 3D shooter was about 60 fps faster with the 2.4 kernel. The kernels were compiled identically, or at least as identically as you can get with 2.4 vs 2.5. Here are a few tips I can offer to the nVidia users out there:
- In case you don't know, nVidia provides official (but woefully non-GPL) drivers [nvidia.com]. They also have a message board [nvnews.net] which I found to be quite informative at times.
- Compile your kernel with MTRR support. It will speed things up a great deal.
- Compile your kernel without AGPGART support. The nVidia driver(s) are faster.
- If you want to try the nVidia driver with a 2.5 kernel, you'll need a patch [minion.de].
- If you have an nForce chipset, make sure to add "mem=nopentium" to your kernel boot parameters, or else your system will be incredibly unstable. Better yet, ditch your nForce chipset (I did) since the Linux support totally blows, at least for now. Give your old nForce chipset to your wife, girlfriend, mother, Windows box, or whatever.
- Re:nVidia Linux woes (Score:2, Redundant)
  
  by Zaffle ( 13798 ) * writes:
  
  * If you have an nForce chipset, make sure to add "mem=nopentium" to your kernel boot parameters, or else your system will be incredibly unstable. Better yet, ditch your nForce chipset (I did) since the Linux support totally blows, at least for now. Give your old nForce chipset to your wife, girlfriend, mother, Windows box, or whatever.
  
  Oh how I agree with that statement.
  Recently my motherboard died of bad caps, so decided to splash out on an nvidia motherboard. Damn thing had crap all linux support. I
- Re:nVidia Linux woes (Score:2)
  
  by crimsun ( 4771 ) writes:
  
  You shouldn't have to append "mem=nopentium" if you use Linux 2.4.19 or newer (all 2.6.0* as well). This is what the Nvidia driver engineers say, at least...
- Re:nVidia Linux woes (Score:2)
  
  by shellbeach ( 610559 ) writes:
  
  I have an nVidia GeForce2 Ultra, and recently upgraded my kernel to 2.5.75. It caused my X graphics to become unbelievably slow -- like 2400 baud modem slow when doing a directory listing or anything where text was scrolling.
  That's weird - I haven't noticed any lag in performance using my nVidia GF4 440 MX card under 2.5.73 ...
Well, yes (Score:3, Interesting)

by reynaert ( 264437 ) writes: on Saturday August 16, 2003 @01:07AM (#6710572)

As far as I know, only the Matrox G400 card has good hardware render acceleration. NVidia's support is still experimental and rather poor. Render is still considered experimental, and speed is not yet considered to be very important. Full accelerated support is planned for XFree86 5.

- similar experience (Score:2)
  
  by sporkboy ( 22212 ) writes:
  
  This reminds me of the experience of WindowFX, a 3d transparency/animation tool made by Stardock. They included hardware 'acceleration' as a settable option, but for most cards it was anything but an option, ran at 1fps.
  
  The exception being the G400, then the Radeon, and only very recently (on Windows) the GeForce. It's entirely an issue of how well the drivers are implemented, and since many of these 2d acceleration functions aren't widely used they're often overlooked in favor of the (traditionally) co
It takes time to talk to hardware (Score:4, Interesting)

by garyebickford ( 222422 ) writes: <gar37bic@g[ ]l.com ['mai' in gap]> on Saturday August 16, 2003 @01:27AM (#6710617)

I worked on 2D & 3D libs a while back for a graphics company. Among the biggest problems at the time was that each different output device had its own feature set, implemented slightly differently. Every designer had their own ideas of what would be 'cool' in their graphics engine, which tended to follow the latest progress in the field.

General purpose graphics libraries such as ours ended up spending more time dealing with the cool features than the features saved. For example, if a plotter had a 2D perspective transform built in, was it better to do the 3D projection ourselves and just feed it untransformed vectors, or map the 3D in such a way as to allow the 2D processing of the plotter to help out? This might require pre-computing sample data.

Also, since the plotter had 2D transforms, we had to do a lot more work, including reading the plotter's status and inverting the plotter's transform matrix to make sure that the resulting output didn't end up outside the plotter's viewport.

A code analysis found that over 90% of the code and 90% of the processing time was spent preventing and dealing with input errors and handling compatibility issues.

Nowadays, it's harder in many ways with a wide variety of hardware based texturing and other rendering - do we do the lighting model ourselves, or let the HW do it? It may depend on whether we're going for speed and 'looks' or photometric correctness.

Show of Hands (Score:2)

by sharkey ( 16670 ) writes:

Anybody else read that as "XBender"?
I actually downloaded and ran his benchmark (Score:3, Interesting)

by LightStruk ( 228264 ) writes: on Saturday August 16, 2003 @01:40AM (#6710645)

and I noticed something strange. For those of you who can't or won't try Rasterman's benchmark yourself, the program runs six different tests, each of which uses a different scaling technique. Each of the six tests is run on the three different test platforms: XRender onscreen, XRender offscreen, and Imlib2. Imlib2 is also written by Rasterman, and is part of Enlightenment.

Here are the test scores from one of the rounds -

*** ROUND 3 ***

Test: Test Xrender doing 2* smooth scaled Over blends
Time: 196.868 sec.

Test: Test Xrender (offscreen) doing 2* smooth scaled Over blends
Time: 196.347 sec.

Test: Test Imlib2 doing 2* smooth scaled Over blends
Time: 6.434 sec.

Now for the strange thing. For the first platform, I watched as the program drew the enlightenment logo thousands of times in the test window, as you would expect. For the second test, it took about the same amount of time, but drew offscreen, again, as the test's name would indicate. However, for the imlib2 test, it also didn't draw anything in the test window.
I got the impression (perhaps wrongly?) that Imlib2 would actually draw to the screen as well. Since it doesn't change the screen, I have no way of telling if imlib2 is doing any drawing at all.

So, I'm digging into the benchmark's code... I'll let you guys know what I find.

- - Re:I actually downloaded and ran his benchmark (Score:2, Informative)
    
    by saikatguha266 ( 688325 ) writes:
    
    Whoops. Mod me down on that last one. The image I described was the opaque image that is being used as a background, and the buffering threw off the printf vs. x sync I am guessing. On closer examination ... imlib does seem to work ... but doesn't display anything while it's doing stuff ... only the final image.
The results are not obviously broken (Score:5, Insightful)

by asnare ( 530666 ) writes: on Saturday August 16, 2003 @01:47AM (#6710666)
A lot of people are questioning the results claimed by Rasterman; however, try downloading the thing and running it for yourself. I see the same trend that Rasterman claims when I do it.

My system: Athlon 800, nVidia 2-GTS.
Drivers: nVidia driver, 1.0.4363 (Gentoo)
Kernel: 2.4.20-r6 (Gentoo)
X11: XFree86 4.3.0

I've checked and:
1. agpgart is being used;
2. XF86 option "RenderAccel" is on.
The benchmark consists of rendering an alphablended bitmap to the screen repeatedly using Render extension (on- and off-screen) and imlib2. Various scaling modes are also tried.

When there's no scaling involved, the hardware Render extension wins; it's over twice as fast. That's only the first round of tests though. The rest of the rounds all involve scaling (half- and double-size, various antialiasing modes). For these, imlib2 walks all over the Render extension; we're talking three and a half minutes versus 6 seconds in one of the rounds; the rest are similar.

I'm not posting the exact figures since the benchmark isn't scientific and worrying about exact numbers isn't the point; the trend is undeniable. Things like agpgart versus nVidia's internal AGP driver should not account for the wide gap.

Given that at least one of the rounds in the benchmark shows the Render extension winning, I'm going to take a stab at explaining the results by suggesting that the hardware is probably performing the scaling operations each and every time, while imlib2 caches the results (or something). The results seem to suggest that scaling the thing once and then reverting to non-scaling blitting would improve at least some of the rounds; this is too easy, however, since while it helps the application that knows it's going to repeatedly blit the same scaled bitmap, not all applications know this a priori.
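
To show what "scale once, then blit unscaled" would look like, here is a toy sketch in Python. It only illustrates the idea; it is not how imlib2 or the Render extension work internally, and the class name, the nearest-neighbour scaler and the tiny 2x2 "logo" are all made up for the example.

```python
def scale_nearest(pixels, new_w, new_h):
    """Nearest-neighbour scale of a 2D list of pixel values."""
    old_h, old_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


class ScaledImageCache:
    """Remembers scaled copies so repeated blits skip the scaling step."""

    def __init__(self):
        self._cache = {}
        self.scale_calls = 0  # counts how often real scaling work was done

    def get(self, name, pixels, new_w, new_h):
        key = (name, new_w, new_h)
        if key not in self._cache:
            self.scale_calls += 1
            self._cache[key] = scale_nearest(pixels, new_w, new_h)
        return self._cache[key]


logo = [[1, 2],
        [3, 4]]
cache = ScaledImageCache()
for _ in range(1000):
    scaled = cache.get("logo", logo, 4, 4)  # 2* scale, requested 1000 times

print("scaling performed", cache.scale_calls, "time(s)")
```

The catch is the one already mentioned: the cache only pays off when the same image is drawn at the same size over and over, and something has to decide when to throw cached copies away.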

- Andrew
- Re:The results are not obviously broken (Score:3, Informative)
  
  by Yokaze ( 70883 ) writes:
  
  > I'm going to take a stab at explaining the results by suggesting that the hardware is probably performing the scaling operations each and every time, while imlib2 caches the results (or something).
  
  Well, you have the means at hand to confirm it.
  
  A quick glance reveals, no, the result is not cached in the sense you probably assume.
  The Imlib2 scales and filters the image in each of the REPS iterations.
Render Bench (Score:5, Informative)

by AstroDrabb ( 534369 ) writes: on Saturday August 16, 2003 @01:50AM (#6710671)

I just ran the render bench from the link. The results are pretty amazing.

Available XRENDER filters: nearest bilinear fast good best
Set up...
--ROUND 1 --
Test: Test Xrender doing non-scaled Over blends
Time: 22.842 sec.
--
Test: Test Imlib2 doing non-scaled Over blends
Time: 0.501 sec.
--ROUND 2 --
Test: Test Xrender doing 1/2 scaled Over blends
Time: 11.438 sec.
--
Test: Test Imlib2 doing 1/2 scaled Over blends
Time: 0.188 sec.
--ROUND 3 --
Test: Test Xrender doing 2* smooth scaled Over blends
Time: 225.476 sec.
--
Test: Test Imlib2 doing 2* smooth scaled Over blends
Time: 3.963 sec.
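
To put those times in perspective, a few lines of Python turn them into slowdown factors. The numbers are copied from the three rounds above; the dictionary layout and the function name are just for illustration.

```python
# (Xrender seconds, Imlib2 seconds) for each round reported above.
results = {
    "non-scaled": (22.842, 0.501),
    "1/2 scaled": (11.438, 0.188),
    "2* smooth scaled": (225.476, 3.963),
}


def slowdown(xrender_sec, imlib2_sec):
    """How many times longer Xrender took than Imlib2."""
    return xrender_sec / imlib2_sec


ratios = {name: slowdown(x, i) for name, (x, i) in results.items()}

for name, ratio in ratios.items():
    print(f"{name}: Xrender took {ratio:.1f}x as long as Imlib2")
```

On this machine that works out to somewhere between roughly 45 and 61 times slower, depending on the round.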

I've experienced this myself. (Score:4, Insightful)

by Anonymous Coward writes: on Saturday August 16, 2003 @02:22AM (#6710743)

The problem is in *sending* the graphics commands to the hardware. If you're manually sending quads one at a time, I found that for 16x16 squares on screen, it's faster to do it in software than on a GeForce 2 (that was what I had at the time - this was a few years back). Think about it:

== Hardware ==

Vertex coordinates, texture coordinates and primitive types are DMA'd to the video card. The video card finds the texture and loads all the information into its registers. It then executes triangle setup, then the triangle fill operation - twice (because it's drawing a quad).

== Software ==

Source texture is copied by the CPU to hardware memory, line by line.

Actual peak fill rate in software will be lower than hardware - but if your code is structured correctly (textures in the right format, etc) - there's no setup. The hardware latency loses out to the speed of your CPU's cache - the software copy has the same complexity as making the calls to the graphics card. :)

The trick is to *batch* your commands. Sending several hundred primitives to the hardware at the same time will blow software away - especially as the area to be filled increases. Well.. most of the time, but it really depends on what you're doing.
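
Here is a toy cost model in Python to make the batching argument concrete. Every number in it (the per-submission setup cost and the per-pixel costs) is invented for illustration and was not measured on any card; only the shape of the result matters.

```python
def hardware_time(num_quads, pixels_per_quad, batch_size,
                  setup_cost=50.0, per_pixel=0.01):
    """Cost in arbitrary units: one fixed setup cost per submitted batch,
    plus a cheap per-pixel fill cost."""
    num_batches = -(-num_quads // batch_size)  # ceiling division
    return num_batches * setup_cost + num_quads * pixels_per_quad * per_pixel


def software_time(num_quads, pixels_per_quad, per_pixel=0.1):
    """Cost in arbitrary units: no setup, but a slower per-pixel copy."""
    return num_quads * pixels_per_quad * per_pixel


quads = 1000
small = 16 * 16  # the 16x16 squares mentioned above

one_at_a_time = hardware_time(quads, small, batch_size=1)
batched = hardware_time(quads, small, batch_size=500)
software = software_time(quads, small)

print("hardware, one quad per call:", one_at_a_time)
print("hardware, 500 quads per call:", batched)
print("software:", software)
```

With these made-up costs, software beats hardware when each small quad is sent on its own, and batched hardware beats both. Make the quads bigger and the fixed setup cost stops mattering, which matches the "especially as the area to be filled increases" part.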

the usual superficial analyses of X11 (Score:4, Interesting)

by penguin7of9 ( 697383 ) writes: on Saturday August 16, 2003 @02:32AM (#6710807)

XRender is a new extension with only a reference implementation in XFree86. The point is to experiment with an API prior to freezing it. I know this may come as news to people who have grown up on Microsoft software, but real software developers first try out various ideas and then later start hacking it for speed. It would be quite surprising, actually, if it were faster than a hand-tuned client-side software implementation.

It will be a while until XRender beats client-side software implementations. Furthermore, you can't just take a client-side renderer and hack in XRender calls and expect it to run fast--code that works efficiently with a client-server window system like X11 needs to be written differently than something that moves around pixels locally.

Nothing is wrong with XRender? (Score:2, Informative)

by Anonymous Coward writes:

I ran the benchmark on a VIA MVP3 motherboard with AMD K6-3+ 400 MHz CPU and GeForce2 MX 400 video card. With RenderAccel option enabled, the unscaled test runs two times faster with XRender, but when that option is set to "false" in XF86Config, the results are as follows:

Test: Test Xrender doing non-scaled Over blends
Time: 16.234 sec.

Test: Test Xrender (offscreen) doing non-scaled Over blends
Time: 16.108 sec.

Test: Test Imlib2 doing non-scaled Over blends
Time: 1.932 sec.

That was with hardware acceleratio
Works nice and fast for me (Score:5, Funny)

by Trogre ( 513942 ) writes: on Saturday August 16, 2003 @02:59AM (#6710898) Homepage

After installing imlib2, and running render_bench's 'make', it gives me the following:

cc -g -I/usr/X11R6/include `imlib2-config --cflags` -c main.c -o main.o
main.c: In function `xrender_surf_new':
main.c:67: `PictStandardARGB32' undeclared (first use in this function)
main.c:67: (Each undeclared identifier is reported only once
main.c:67: for each function it appears in.)
main.c:67: warning: assignment makes pointer from integer without a cast
main.c:69: `PictStandardRGB24' undeclared (first use in this function)
main.c:69: warning: assignment makes pointer from integer without a cast
main.c: In function `xrender_surf_blend':
main.c:153: `XFilters' undeclared (first use in this function)
main.c:153: `flt' undeclared (first use in this function)
main.c:154: `XTransform' undeclared (first use in this function)
main.c:154: parse error before `xf'
main.c:156: `xf' undeclared (first use in this function)
main.c: In function `main_loop':
main.c:439: `XFilters' undeclared (first use in this function)
main.c:439: `flt' undeclared (first use in this function)
make: *** [main.o] Error 1

It seems to do this at the same speed, whether or not I have render acceleration enabled.

Couldn't even get XRENDER to work. (Score:2)

by Simon Kongshoj ( 581494 ) writes:

Decided to give it a try for my Matrox G400, but unfortunately, as soon as I ran his program, it died with a memory fault. It was apparently checking which XRENDER filters were available, then promptly died. All output I got was:
Available XRENDER filters: Memory fault
This was on Debian sid. Anyone else get something similar?
Caching (Score:4, Insightful)

by BenjyD ( 316700 ) writes: on Saturday August 16, 2003 @10:01AM (#6712043)

Somebody mentioned below that imlib is probably caching the image, whereas Xrender is doing the transformation every time. So I thought I'd try the same caching approach with Xrender.

The first time the scale test is called, I render the image to an offscreen buffer with the correct transformations set. After that I just XRenderComposite to the screen from the offscreen buffer. The results (NVidia 4496, RenderAccel=true, GeForce2 MX, Athlon XP 1800+) for one test are:

*** ROUND 2 ***

Test: Test Xrender doing 1/2 scaled Over blends - caching implementation
Time: 0.126 sec.

Test: Test Xrender doing 1/2 scaled Over blends - original implementation
Time: 6.993 sec.

Test: Test Imlib2 doing 1/2 scaled Over blends
Time: 0.191 sec.

Which shows Xrender taking two-thirds the time of imlib.

My guess is that imlib is probably caching something. This is supported by the fact that Xrender is faster for the non-scaled composition in the original code.
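
The caching idea above can be sketched without any X calls. This is not the modified benchmark code: it uses a plain in-memory 8-bit greyscale image, and the `ScaledCache` struct, `scale_half` and `cache_get` are hypothetical names chosen for the example. The point is only that the expensive scale runs once and every later blend reads the cached copy.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cache holding a half-size copy of a source image. */
typedef struct {
    uint8_t *pixels; /* scaled copy, NULL until first use */
    int w, h;        /* size of the scaled copy */
    int builds;      /* how many times the expensive scale actually ran */
} ScaledCache;

/* Expensive step: 2x2 box-filter downscale of an 8-bit greyscale image. */
static uint8_t *scale_half(const uint8_t *src, int w, int h)
{
    assert(w % 2 == 0 && h % 2 == 0);
    int ow = w / 2, oh = h / 2;
    uint8_t *out = malloc((size_t)ow * (size_t)oh);
    if (!out)
        return NULL;
    for (int y = 0; y < oh; y++) {
        for (int x = 0; x < ow; x++) {
            const uint8_t *p = src + (2 * y) * w + 2 * x;
            unsigned sum = p[0] + p[1] + p[w] + p[w + 1];
            out[y * ow + x] = (uint8_t)((sum + 2) / 4); /* rounded average */
        }
    }
    return out;
}

/* Return the half-size image, scaling only on the first call. */
static const uint8_t *cache_get(ScaledCache *c, const uint8_t *src, int w, int h)
{
    if (!c->pixels) {
        c->pixels = scale_half(src, w, h);
        c->w = w / 2;
        c->h = h / 2;
        c->builds++;
    }
    return c->pixels;
}
```

With XRender the analogous step is compositing the transformed picture into an offscreen picture once and then using that as the source for each later XRenderComposite call, as described above. The sketch never invalidates the cache, so it assumes the source image does not change between calls.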

- Re:Caching (Score:2)
  
  by BenjyD ( 316700 ) writes:
  
  OK, replying to my own post, but it's seemingly not caching that speeds up the imlib test: calling imlib_set_cache_size(0) has no effect on the speed of the imlib test.
simple solution (Score:3, Funny)

by yarbo ( 626329 ) writes: on Saturday August 16, 2003 @04:28PM (#6713708)

rename the benchmark 3dmark2003.exe

- - Re:Some Suggestions for Rasterman (Score:2)
    
    by Rooktoven ( 263454 ) writes:
    
    I actually have a mod point and I'm not going to spend it here. I'd give an underrated if it wasn't so long. I know that's the joke and parts _are_ funny, but it's more shock value than witty commentary, IMO.
    
    Hey and at least this comment will make someone browse at -1.
- - Keith IS being paid. (Score:3, Insightful)
    
    by HanzoSan ( 251665 ) writes:
    
    Also, the only reason it's taking so long is that they won't fork. There are millions of developers whom Redhat, Suse, Lindows etc. would love to pay to develop Xrender. You think Keith Packard is the only developer in the world qualified to do this? No, he's not, and neither is Carl Worth, but until there is a fork, everything goes through this core group of developers who decide everything.
    
    It's a management issue more so than a lack of developers or a lack of money; believe me, if Transgaming can get money, Xfree cou
    - Re:Keith IS being paid. (Score:2)
      
      by KentoNET ( 465732 ) writes:
      
      XWin is forking Xlib (pretty much the heart of XFree86), through their own Xr and XCB (and a few other) projects. Check their site again; there are already CVS pservers up with code.
      - Re:Keith IS being paid. (Score:2, Interesting)
        
        by HanzoSan ( 251665 ) writes:
        
        Interesting, but how can we fund them? They don't accept donations, and they don't have a way for someone like me, who doesn't have the skills to develop Xrender, to pay people who do.
        
        Having only 2 people on Xrender is why it's taking so long.
      - Re:Keith IS being paid. (Score:3)
        
        by FooBarWidget ( 556006 ) writes:
        
        Those are X extensions, not forks.
- - Re:Putting the "wine" back in whining. (Score:3, Insightful)
    
    by MikeFM ( 12491 ) writes:
    
    I still believe in the answer that was always given for id games when asked when they'll release the product.. "When it's done." I think that applies to opensource even more.
    
    Besides we shouldn't be competing with MacOS or Windows. We don't need to clone those OS's or desktops. We need to create our own desktop that is unique. Make it work.. don't make it just to attract Windows and Mac users.
- Re:Yawn (Score:2, Interesting)
  
  by OrangeTide ( 124937 ) writes:
  
  client/server setup is a superior way of designing a windowing environment.
  
  X11 uses unix sockets (or optionally slower, less secure TCP) and shared memory.
  
  Win32 uses shared memory and messaging.
  
  MacOS X .. I don't know for certain, I hope it uses Mach kernel messages.
  
  QNX Photon uses qnx kernel messages and shared memory.
  
  The real difference is the layer at which the windowing system exists. In the case of X11, MacOS X and Photon, the windowing system is just another process.
  
  In Win32 it's a kernel thr
- Dude! tell us WHY!! (Score:2)
  
  by spineboy ( 22918 ) writes:
  
  If you're getting unusual AND nice results, we should know WHY! What kernel, drivers (Nvidia?), Linux version, graphics card, processor, etc.? Maybe we are all missing something obvious. Tell us, like in a scientific paper, the exact details, so we can VERIFY your results, and maybe YOU could help out the whole community.
- Re:It's wrong to use 3D functionality for 2D graph (Score:2, Informative)
  
  by Elm Tree ( 17570 ) writes:
  
  Unless I'm mistaken, XRender is utilizing the 2D acceleration features of a graphics card for scaling, alpha blending, anti-aliasing, etc. It's not trying to do 2D graphics over 3D. Although if you think Linux shouldn't be doing that, then you really should look at Microsoft; they're moving to an entirely 3D desktop for Longhorn.
- Re:It's RIGHT to use 3D functionality for 2D graph (Score:4, Informative)
  
  by rmlane ( 589573 ) writes: on Saturday August 16, 2003 @10:30AM (#6712144)
  
  On vaguely modern hardware the 3D path is so much faster than the 2D path that it ends up being significantly faster to use the 3D path to render your desktop if your desktop is at all complicated (not a dozen mono xterms).
  This ends up being even more true if you do any sort of complex compositing (e.g. alpha blending, hardware accelerated MPEG / video, OpenGL windows, etc.). Enlightenment uses alpha channels, so it would be faster to composite in hardware than in software. These sorts of operations are not accelerated at all on the 2D path, and have to be done in software.
  Go check out Quartz Extreme at http://www.apple.com/macosx/jaguar/quartzextreme.html
  Having used XFree86 and Quartz Extreme on the same graphics hardware, I can tell you there's no comparison. Quartz is much faster and much more capable.
  
