Vega: Performance Takes Time

AMD's Vega launched early in the year, and it proved to be a bit of a disappointment. Through no fault of their own, people believed it was capable of a slick 70 MH/s with little to no effort; unfortunately, people believe anything when it's written on the internet.

Now, three months after launch, we indeed see that the Vega is capable of these kinds of speeds – but not without a little work. It’s shaping up to be a beast of a card, but HBM2 is unstudied, untested and unstable.

Remember that when VBIOS modding became public thanks to cryptocurrency mining, it was backed by almost five years of optimization work built by gamers, overclocking enthusiasts and the like. Furthermore, the JEDEC standard allowed us to further understand GDDR5 and how it could be optimized for specific algorithms.

HBM2 hasn’t benefited from any of these things. It stands as a unique memory stack without much documentation, information, or experimentation behind it. It bothers me that the world at large is not happy with this card, because it -is- capable of so much more, but people – companies included – need to give it more time.

In the coming weeks – with permission from my overlords – I’ll be releasing an updated version of OhGodATool that supports Vega, along with a brief guide on the road to optimization so far.

One of the things I am quite happy about with Vega is the new ISA. There’s a whole host of little features that allow us to tune and optimize Ethereum and ZCash, and my favorite has been abusing v_add3_u32, which allows us to tune Decred, of all coins, to make solo mining profitable. I’m still working my way through the RRG, which you can find here, but it’s shaping up to be the card for rendering, mining and low-level neural networks.
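To make the v_add3_u32 point concrete, here is a rough C sketch (my illustration, not code from this post or from any particular miner) of a BLAKE-256-style G mixing step, BLAKE-256 being the hash that Decred's proof of work is built on. The "a = a + b + m" lines are the three-operand adds that Vega's ISA can issue as a single v_add3_u32, where earlier GCN parts needed two separate adds; the message and constant selection is simplified for illustration.

    /* A minimal sketch (assumed, not from the post): one BLAKE-256-style G
     * mixing step, the hash underlying Decred's proof of work. The
     * "a = a + b + m" lines are the three-operand adds that Vega can emit
     * as a single v_add3_u32; older GCN parts needed two v_add_u32 ops.
     * Message/constant selection is simplified for illustration. */
    #include <stdint.h>
    #include <stdio.h>

    static uint32_t rotr32(uint32_t x, unsigned n)
    {
        return (x >> n) | (x << (32u - n));
    }

    /* One G quarter-round over state words a, b, c, d with two message words. */
    static void g_mix(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d,
                      uint32_t m0, uint32_t m1)
    {
        *a = *a + *b + m0;            /* three-operand add: v_add3_u32 candidate */
        *d = rotr32(*d ^ *a, 16);
        *c = *c + *d;
        *b = rotr32(*b ^ *c, 12);
        *a = *a + *b + m1;            /* second three-operand add in the round */
        *d = rotr32(*d ^ *a, 8);
        *c = *c + *d;
        *b = rotr32(*b ^ *c, 7);
    }

    int main(void)
    {
        uint32_t v[4] = { 0x6a09e667u, 0xbb67ae85u, 0x3c6ef372u, 0xa54ff53au };
        g_mix(&v[0], &v[1], &v[2], &v[3], 0xdeadbeefu, 0x01234567u);
        printf("%08x %08x %08x %08x\n", v[0], v[1], v[2], v[3]);
        return 0;
    }

A BLAKE kernel's inner loop is dominated by exactly these add/rotate chains, so collapsing each a + b + m into one VALU instruction is presumably where the Decred gain comes from.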

What does need work, however, are the drivers. Currently Vega is crippled on Linux, and will continue to be crippled for multiple reasons: the forced usage of the ROCm stack; the poor compiler (goodbye Catalyst, hello LLVM, my old friend…); the power consumption during VGPR usage; and subpar security. This is to be expected at release, and I really hope the community can chip in and give the team behind ROCm a fresh set of eyes to help make this the end-all, be-all stack for future AMD GPUs.

The power throttling on the Vega is awful under Linux, and I’ve got little to say about it because I’ve been more focused on increasing the hash rate than on power tuning. That’ll come later, along with more of these update logs if they’re well received.

Until next time, folks – happy hashing.


4 thoughts on “Vega: Performance Takes Time”

    1. Definitely. One of the promises I have made to the overlords is to create monthly reports on ROCm and to try to improve it. I understand it has hardware limitations, but the more I work with it, the more I realize it’s the future for AMD. As such, it’s up to the programmers to put aside their egos and their personal preferences and look at how best to utilize the tools we are given.

      One of the biggest things I’d love to see is an official statement on OpenCL’s lifespan. If it is truly intended to become EoL one day, I think the community should be informed well in advance so they can start preparing their hardware for that change.

      On the community’s end – especially mine – we need to take the time to write a low-level, simple post explaining why there are hardware requirements, why they are necessary, and why they are actually beneficial. I’m a firm believer that education can fix most of our problems; people just need things explained in a way they understand.


  1. DKMS is coming to ROCm-enabled stacks. It has its strengths and weaknesses, but you get a standard kernel, so security patches for Linux will not be an issue. We are also holding to our promise and upstreaming all our changes, because this really is the best direction for the industry.

    The LC compiler that is part of ROCm is a new compiler, but we are already seeing areas where it has strong benefits over the historical compiler. In recent code we have been working with, it delivered a 2x reduction in register utilization for an RTM/FFT-class problem. For 17.40 they will be releasing the same compiler we have on the Windows stack for Vega10: our historical LLVM-to-HSAIL path feeding a second compiler called the Shader Compiler (SC), the same one we use for DX. What you lose is source code to the full compiler, native inline GCN ASM, and native assembler support. We have also done a number of optimizations we need for deep learning algorithms.

    We have a number of updates coming for LC based on input from a number of open-source projects that have contacted us. What we need is input on where the gaps are and which applications we should test and optimize.

    One big difference with ROCm over the Catalyst driver packaging is that you can update the compiler and language runtime independently of the driver releases. I still remember waiting for 15.302 to release just to get a compiler update out to an ISV; it was many months late, when the compiler fix itself only took a day. Catalyst also had its issues.

    What we need is help from everyone in the community: logging issues, helping with documentation, sharing optimization tricks, even contributing to the source code itself. We have given the community full access to the source code of our entire compute stack, and that is a big step. We are also addressing the long-standing issue around documentation: for the first time ever it is online, indexed and searchable, and it can also be turned into PDF and ePub. We are early on this, but we know it is critical; here is the link: http://rocm-documentation.readthedocs.io/en/latest/index.html. It is up on GitHub, written in reStructuredText (rst) and built with Sphinx.


  2. I am glad to see the ROCm team giving public notice of changes to come; it’s a welcome change from the normal “shut off from the world” development process of Catalyst in the past.

    The road ahead is long, though, and while a lot of great things can happen, one of my main concerns is the bloat that will come from needing the full toolchain on every machine, even though the OpenCL asm bins for a given architecture are the same.

