Last week NVIDIA released their first set of end-user OpenCL drivers. Previously OpenCL drivers had only been available for developers on the NVIDIA side of things, and this continues to be the case on the AMD side of things. With NVIDIA’s driver release, the launch of AMD’s 5800 series, and some recent developments with OpenCL, this is a good time to recap the current state of OpenCL, and what has changed since our OpenCL introductory article from last year.

A CPU & GPU Framework

Although we commonly talk about OpenCL alongside GPUs, it’s technically a hardware-agnostic parallel programming framework. Any device implementing OpenCL should be capable of running any OpenCL kernel, so long as the developers take into account querying the device ahead of time so as not to spawn too many threads at once. And while GPUs (being the parallel beasts that they are) are the primary focus, OpenCL is also intended for use on CPUs and more exotic processors such as the Cell BE and DSPs.
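To make the "query ahead of time" point concrete, here is a minimal sketch in plain Python rather than real OpenCL host code. It assumes the program has already asked the device for its maximum work-group size (in the OpenCL API that is what clGetDeviceInfo reports for CL_DEVICE_MAX_WORK_GROUP_SIZE); the function name choose_launch_size, the preferred size of 256, and the device limits used below are all made up for illustration.

```python
def choose_launch_size(n_items, max_work_group_size, preferred_local=256):
    """Pick (global_size, local_size) for a 1-D kernel launch.

    max_work_group_size is assumed to come from a device query;
    preferred_local is an arbitrary illustrative default.
    """
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    if max_work_group_size <= 0:
        raise ValueError("max_work_group_size must be positive")
    # Never ask for a larger work-group than the device says it can run.
    local_size = min(preferred_local, max_work_group_size)
    # Round the global size up to a whole number of work-groups.
    groups = (n_items + local_size - 1) // local_size
    return groups * local_size, local_size


# Hypothetical limits for two different devices running the same workload.
print(choose_launch_size(1000, max_work_group_size=512))
print(choose_launch_size(1000, max_work_group_size=64))
```

Because the global size is padded up to a multiple of the work-group size, a kernel launched this way would need to ignore work-items whose index is at or beyond the real item count.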

What this means is that when it comes to discussing the use of OpenCL on computers, we have two things to focus on: the use of OpenCL on the GPU, and the use of OpenCL on the CPU. If Khronos has their way, then OpenCL will be a commonly used framework for CPUs, both to take better advantage of multi-core CPUs (8-threaded i7, anyone?) and as a fallback mechanism for when OpenCL isn’t available on a GPU.
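The fallback idea can be sketched as a simple preference order. This is plain Python over a hypothetical list of device records, not real OpenCL host code; in the actual API the equivalent step would be asking clGetDeviceIDs for CL_DEVICE_TYPE_GPU first and CL_DEVICE_TYPE_CPU second. The function pick_device and the device names are assumptions for illustration only.

```python
def pick_device(devices, preference=("GPU", "CPU")):
    """Return the first device matching the preference order, or None.

    devices is a hypothetical list of dicts with "name" and "type" keys.
    """
    for wanted in preference:
        for dev in devices:
            if dev["type"] == wanted:
                return dev
    return None


# A machine with an OpenCL-capable GPU uses it; one without falls back to the CPU.
with_gpu = [{"name": "cpu0", "type": "CPU"}, {"name": "gpu0", "type": "GPU"}]
cpu_only = [{"name": "cpu0", "type": "CPU"}]

print(pick_device(with_gpu)["name"])
print(pick_device(cpu_only)["name"])
```

A program written this way only runs everywhere if a CPU driver is actually installed, which is the gap discussed below for NVIDIA users on Intel CPUs.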

This also makes things tricky when it comes to who is responsible for what. AMD for example, in making both GPUs and CPUs, is writing drivers for both. They are currently sampling their CPU driver as part of their latest Stream SDK (even if it is a GPU programming SDK), and their entire CPU+GPU driver set has been submitted to the Khronos group for certification.

NVIDIA on the other hand is not a CPU manufacturer (Tegra aside), so they are only responsible for having a GPU OpenCL driver, which is what they have been giving to developers for months. They have submitted it to Khronos and it has been certified, and as we mentioned they have released it to the public as of last week. NVIDIA is not responsible for a CPU driver, and as such they are reliant on AMD and Intel for OpenCL CPU drivers. AMD likes to pick at NVIDIA for this, but ultimately it’s not going to matter once everyone finally gets up to speed.

Intel thus far is the laggard; they do not have an OpenCL implementation in any kind of public testing, for either CPUs or GPUs. For AMD GPU users this won’t be an issue, since AMD’s CPU driver will work on Intel CPUs as well. NVIDIA GPU users with Intel CPUs will be waiting on Intel for a CPU driver. Do note however that a CPU driver isn't required to use OpenCL on a GPU, and indeed we expect the first significant OpenCL applications to be intended to run solely on GPUs anyhow. So it's not a bad situation for NVIDIA, it's just one that needs to be solved sooner rather than later.

OpenCL ICD: Coming Soon

Unfortunately matters are made particularly complex by the fact that on Windows and Linux, writing an OpenCL program right now requires linking against a vendor-specific OpenCL driver. The code itself is still cross-platform/cross-device, but when it comes to compiling and linking, OpenCL has not been fully abstracted. It’s not yet possible to write and run a single Windows/Linux program that will work with any OpenCL device. It would be the equivalent of requiring an OpenGL game (e.g. Quake) to have a different binary for each GPU vendor’s drivers.

The solution to this problem is that OpenCL needs an Installable Client Driver (ICD), just like OpenGL does. Developers can link against the ICD, and it will handle the duties of passing things off to vendor-specific drivers. However an ICD isn’t ready yet, and in fact we don’t know when it will be ready. NVIDIA - who chairs the OpenCL working group - tells us that the WG is “driving to get an ICD implementation released as quickly as possible”, but with no timetable attached to that. The effort right now appears to be on getting more OpenCL 1.0 implementations certified (NV is certified, AMD is in progress), with an ICD to follow.
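As a rough illustration of what "passing things off to vendor-specific drivers" means, here is a conceptual sketch in plain Python. It is not the working group's design, which had not been released at the time of writing; the classes VendorDriver and Loader, their methods, and the vendor and device names are all hypothetical.

```python
class VendorDriver:
    """Stand-in for one vendor's OpenCL driver (hypothetical interface)."""

    def __init__(self, vendor, device_names):
        self.vendor = vendor
        self.device_names = list(device_names)

    def run_kernel(self, device_name, kernel_name):
        return f"{self.vendor} driver ran {kernel_name} on {device_name}"


class Loader:
    """The one component an application would link against."""

    def __init__(self):
        self._drivers = []

    def register(self, driver):
        # Each installed vendor driver registers itself with the loader.
        self._drivers.append(driver)

    def list_devices(self):
        return [(d.vendor, name) for d in self._drivers for name in d.device_names]

    def run_kernel(self, device_name, kernel_name):
        # Forward the call to whichever vendor driver owns the device.
        for driver in self._drivers:
            if device_name in driver.device_names:
                return driver.run_kernel(device_name, kernel_name)
        raise LookupError(f"no registered driver owns {device_name!r}")


loader = Loader()
loader.register(VendorDriver("VendorA", ["gpu0"]))
loader.register(VendorDriver("VendorB", ["cpu0"]))
print(loader.list_devices())
print(loader.run_kernel("cpu0", "vector_add"))
```

The application code only ever talks to the loader, so the same binary could work regardless of which vendors' drivers happen to be installed.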

Meanwhile Apple, in the traditional Apple manner, has simply done a runaround on the whole issue. When it comes to drivers they shipped Snow Leopard with their own OpenCL CPU driver, and they have GPU drivers for both AMD and NVIDIA cards. Their OpenCL framework doesn’t have an ICD per se, but it has features that allow developers to query for devices and use any they like. It effectively accomplishes the same thing, but it’s only of use when writing programs against Apple’s framework. But to Apple’s credit, as of this moment they have the only complete OpenCL platform, offering CPU+GPU development and execution with a full degree of abstraction.

What GPUs Will Support OpenCL

One final matter is what GPUs will support OpenCL. While OpenCL is based around the hardware aspects of DirectX10-class hardware, being DX10 compliant isn’t enough. Even among NVIDIA and AMD, there will be some DX10 hardware that won’t support OpenCL.

NVIDIA: Anything that runs CUDA will run OpenCL. In practice, this means anything in the 8-series or later that has 256MB or more of VRAM. NVIDIA has a full list here.

AMD: AMD will only be supporting OpenCL on the 4000 series and later. Presumably there was some feature in the OpenCL 1.0 specification that AMD didn’t implement until the 4000 series, which NVIDIA had since the launch of the 8-series. Given that AMD is giving Brook+ the heave-ho in favor of OpenCL, this means that pre-4000 series cards will continue to have a limited selection of GPGPU applications that work on them, as compared to the 4000 series and later.

End-User Drivers

Finally to wrap this up, we have the catalyst of this story: drivers. As we previously mentioned, NVIDIA released their OpenCL-enabled 190.89 drivers to the public last week, which we’re happy to see even if the applications themselves aren’t quite ready. This driver release was a special release outside of NVIDIA’s mainline driver releases however, and as such they’re already out of date. NVIDIA released their 191.07 WHQL-certified driver set yesterday, and these drivers don’t include OpenCL support. So while NVIDIA is shipping an OpenCL driver for both developers and end-users, it’s going to be a bit longer until it shows up in a regular release.

AMD meanwhile is still in a developer-only beta, which makes sense given that they’re still waiting on certification. The estimate we’ve heard is that the process takes a month, so with AMD having submitted their drivers early last month, they should be certified soon if everything went well.

67 Comments

  • bobvodka - Thursday, October 8, 2009 - link

    I never said CS4.x was 'useless', I just said that due to the lack of Interlock functionality its usefulness was reduced.

    Will it have some uses? Of course, and I dare say it will allow some cool things to be done, however when compared to CS5.0 profiles with, in particular, the Interlock stuff then some things become harder to do or indeed impossible in a single pass. (see previous example).

    I agree that its unfortunate that the CS4.0 profile doesn't support Interlock, but I guess they had to draw the line somewhere.
  • Scali - Thursday, October 8, 2009 - link

    Yea well, I just think you have an odd perspective.
    Obviously CS5.0 is better, and obviously DICE wanted to promote the new DirectX 11 features.
    Bottom line is however that we've not had CS at ALL yet, in DirectX, even though there's a huge installed base of DX10 cards capable of CS4.0 or CS4.1. There isn't a lot of DX11 hardware out there yet.
    Therefore in the short term CS4.x is going to be the more interesting one, as it allows you to implement new functionality like realtime tessellation, physics etc, or to make more efficient implementations of existing technologies like post-processing/SSAO and all that. CS4.x is just a nice shot in the arm for all that DX10 hardware out there.

    On another note, you also have to put CS5.0 into perspective. Interlock makes it easy to code certain things a certain way, but it's no guarantee that it will also be efficient. That depends largely on how the hardware implements interlocking. Think back to the conditional branching that was introduced in PS3.0 for example. In many cases it was actually faster to just use a multipass algorithm using alphatest, which only required PS2.0, simply because the branching itself wasn't implemented in a very efficient way, unlike alphatesting.

    So while the DICE solution looks nice and efficient, it doesn't necessarily have to be all that much faster than a more bruteforce multipass algorithm. In fact, if I had to choose, I'd rather implement a CS4.0 algorithm that improves performance on all DX10 (and DX11) hardware, than to go for a CS5.0 algorithm that doesn't work on DX10 hardware, and may only be marginally faster than a CS4.0 algorithm on DX11 hardware (which is already the fastest hardware on the market anyway, so it's not the hardware that needs the performance increase most anyway).

    If you want a nice case, look at PhysX. It even runs on the G80 architecture, which doesn't support interlocking. So there's great things you can do with compute shaders without interlocking. It would be very nice if developers would use CS4.0 for such physics effects.
  • haplo602 - Tuesday, October 6, 2009 - link

    currently Nvidia is the only vendor with good linux support for opengl/opencl. AMD is behind in opengl and very much non-existent for opencl.

  • stmok - Tuesday, October 6, 2009 - link

    The entire Linux graphics stack is being overhauled.

    The current one has issues to address, and its the very reason why there's nothing like DXVA on Linux. Nvidia worked around this deficiency by coding their own approach...VDPAU. But this only works with Nvidia GF8 or newer cards and closed drivers.

    Closed drivers mean Nvidia must keep up with Kernel and Xorg versions. If they don't, you are at their whim and have to wait. If they choose to drop support for a specific era of hardware, you are SOL.

    Right now, Linux is gradually moving to the new stack. This is going to take time and cause pain to some users. (As infrastructure changes often do)...But the benefit is that things like HD playback accelerated and OpenCL will be supported in the long run. Features won't be hardware brand specific like VDPAU is...So far, there's very raw code for OpenCL support in the new stack. (Someone has started something, but its not really usable...More like an initial thing to see if its feasible.)

    As for AMD, the greatest thing they've done was release the documentation specs for open driver development. Right now; 2D and X-Video is done for all Radeons up to 4xxx series; with power saving (PowerPlay) and 3D features being worked on...They "kind of work", but are really buggy.

    Overall, if you need 3D acceleration now, you have no choice but to use Nvidia cards and closed drivers.

    In the long run (5 yrs+ away?), Radeon's may have the advantage of having open driver support. ie: Radeons will work out-of-the-box without too much fuss. One less step in setting up a Linux box.
  • jackylman - Friday, October 9, 2009 - link

    I'm using ATI's open-source 3D right now on a RadeonHD 4000 card. I don't find it buggy (runs compiz, googleearth, some games).

    Depending on your level of 3D need, closed-source drivers are no longer the only choice.
  • Deanjo - Wednesday, October 7, 2009 - link

    Wow, talk about an epic failure of understanding what vdpau is.

    vdpau is NOT vendor specific. It is a freely open standard that other graphics vendors are free to implement in their drivers as well, either through a native driver solution or through use of a wrapper. vdpau is an API. vdpau is not isolated to nvidia cards either. The S3 chrome 5 series for example has native vdpau support.

    http://drivers.s3graphics.com/en/download/drivers/...

    In fact it is now part of freedesktop.

    http://lists.freedesktop.org/archives/xorg-announc...

    Saying vdpau is "vendor specific" is nothing but pure BS. You might as well say openCL is "vendor specific" as well by your definition of "vendor specific" since they are the only ones right now with openCL support in their drivers.
  • stmok - Wednesday, October 7, 2009 - link

    Well, if I'm wrong I apologize. But there's no need to act like a dick about it.

    How hard is it to politely correct someone Deanjo? Apparently in your case, too hard.
