ROCm is a mistake. It's fundamentally broken by compiling to hardware-specific code instead of an intermediate representation like CUDA's PTX, so it will always be plagued with this issue of not supporting all cards, and even if a certain GPU is supported today they can stop supporting it next version. It has happened, and it will continue to happen.
It's also a strange value proposition. If I'm a programmer in some super computer facility and my boss has bought a new CDNA based computer, fine, I'll write AMD specific code for it. Otherwise why should I? If I want to write proprietary GPU code I'll probably use the de facto industry standard from the industry giant and pick CUDA.
AMD could be collaborating with Intel and a myriad of other companies and organizations and focus on a good open cross-platform GPU programming platform. I don't want to have to think about who makes my GPU! I recently switched from an Intel CPU to an AMD one, obviously with no problem. If I had to get new software written for AMD processors I would have just bought a new Intel, even though AMD are leading in performance at the moment. Even Windows on ARM seems to work ok, because most things aren't written in x86 assembly anymore.
Get behind SYCL, stop with the platform specific compilation nonsense, and start supporting consumer GPUs on Windows. If you provide a good base the rest of the software community will build on top. This should have been done ten years ago.
Agreed.
Honestly, the problem isn't just which devices, but even more so, this (from the page, not your comment):
> No guarantees of future support but we will try hard to add support.
During the Great GPU Shortage, I bought an AMD RX5xx card for ML work. It was explicitly advertised to work with ROCm. Within a couple of months, AMD dropped ROCm support. EOLing an actively sold product for an advertised purpose within the warranty period was, if I understand consumer protection laws in my state correctly, fraud. There was no support from the card vendor (MSI), no support from AMD, and no support from the reseller. Short of small claims, which was not worth it, there was no recourse.
This is on a long list of issues AMD needs to sort out to be a credible player in this space:
* Those are the kinds of experiences which cause people to drop a vendor and not look back. AMD needs to either support cards forever, or at the very least, have an advertised expiration date (like Chromebooks and Android phones).
* Broad support is helpful from a consumer perspective for the simple, pragmatic reason that only a tiny fraction of the population has the time to read online forums, footnotes, or fine print. People should be able to buy a card on Amazon, at Best Buy, or at Microcenter, and expect things to Just Work.
* Being able to plan is essential for enterprise use. I can't build a system around AMD if AMD might stop supporting their platform on zero days' notice, and the next day there might be a security exploit which requires a version bump.
I'm hoping Intel gets their act together here, since NVidia needs a credible competitor. I've given up on AMD.
PTX does provide a low-level machine abstraction. However, you still target some version of hardware ( https://arnon.dk/matching-sm-architectures-arch-and-gencode-... ). That said, a lot of software effort has gone into making it look and work seamlessly.
Though AMD doesn't have the same "virtual ISA" as PTX right now, there are increasing levels of such abstraction available in compiled flows with MLIR / Linalg etc. Those are higher level and can be compiled / JITed in real time to obviate the need for a low-level virtual ISA.
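As a rough sketch of what that higher-level JIT route looks like in practice (using PyTorch's torch.compile purely as an illustration; the function and shapes here are made up), the user writes framework code and the compiler stack decides how to lower it for whatever device is present, CUDA or ROCm:

    import torch

    def fused_op(x, y):
        # arbitrary elementwise math; the compiler decides how to fuse and lower it
        return torch.relu(x * y + 1.0)

    # torch.compile traces and JIT-compiles for the device the tensors live on,
    # so the user never picks an sm_* or gfx* target by hand.
    compiled = torch.compile(fused_op)

    device = "cuda" if torch.cuda.is_available() else "cpu"  # ROCm builds also show up as "cuda"
    x = torch.randn(1024, device=device)
    y = torch.randn(1024, device=device)
    print(compiled(x, y).shape)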
We already fought and lost this battle with 3D APIs for GPUs. What makes you think that winning strategy would play out any other way for tensor processing?
For context, the submitter of the issue is Anush Elangovan from AMD, who's recently been a lot more active on social media after the SemiAnalysis article, and who is taking the reins / responsibility for moving AMD's software efforts forward.
However you want to dissect this specific issue, I'd generally consider this a positive step and nice to see it hit the front page.
https://www.reddit.com/r/ROCm/comments/1i5aatx/rocm_feedback...
https://www.reddit.com/user/powderluv/
hey, that's me. Happy to help answer anything here and I look forward to your constructive feedback to make AMD software better. We've got work to do and look forward to it.
Ok, why does running koboldcpp with a "BLAS Batch Size" of 512 via Vulkan on an RX570 crash my entire computer? You know, to the point where I manually have to turn it on again.
I personally couldn't think of a better reason to never buy AMD GPUs ever again by the way.
I have experience running 130,000 RX470/570/480/580 cards... if you're doing heavy workloads, those things will crash the whole machine if you breathe on them wrong. That said, when they do run, they run extremely well.
There are 1000 reasons why your one GPU could have crashed. What does it say in the logs before the crash?
Also known as the AMD representative who recently argued with Hotz about supporting tinycorp.
Is that a bad thing? Good for him to stand up to extortion.
Hard to say from my perspective.
I think AMD's offer was fair (full remote access to several test machines); then again, just giving tinycorp the boxes on their terms with no strings attached, as a kind of research grant, would have earned them some goodwill with that corner of the community.
Either way both parties will continue making controversial decisions.
It isn't hard. We offered as well. Full BIOS access even.
Another neocloud, that is funded directly by AMD, also offered to buy him boxes. He refused. It had to come from AMD. That's absurd and extortionist.
Long thread here: https://x.com/HotAisle/status/1880467322848137295
To add, AMD only makes _parts_ of an MI300X server.
It's like asking a tire manufacturer to give you a car for free.
Great analogy!
Just uploaded some pictures of how complex these machines really are...
https://imgur.com/gallery/dell-xe9860-amd-mi300x-bGKyQKr
He explained the reasoning:
> Now, why don't they send me the two boxes? I understand when I was asking for firmware to be open sourced that that actually might be difficult for them, but the boxes are on eBay with a simple $$ cost. It was never about the boxes themselves, it was a test to see if software had any budget or power. And they failed super hard
I know this is someone else's reasoning, so you can't answer this question, but, doesn't this just test if they want to spend the budget on this specific thing?
If I ask a company for a $100,000 grant, and they're not willing, it doesn't seem like correct logic to assume that means they don't have the budget for it. Maybe they just don't want to spend $100,000 on me.
Why does this mean they don't have a budget or power?
He assumes the software department wants to do this, which - yes - seems to be flawed logic on his side.
Let's imagine he's indeed correct. He receives the hardware, gets hacking and solves all of AMD's problems, the stock surges and tinygrad becomes a major deep learning framework.
That would be a colossal embarrassment for AMD's software department.
They should be more interested in selling product than ego.
"and they failed" from his PoV... but not from us looking at things from the other side of the table.
Chip vendors regularly send out free hardware to software developers. In this case I don't think the cost is the issue; AMD simply doesn't want what Geohot is offering.
Considering that AMD is only really supporting their datacenter GPUs with ROCm, this is the worst possible response. It means compute on AMD GPUs is only meant for the elite of the elite and forever out of reach for the average consumer and that Nvidia is not only outcompeting AMD on quality but also on cost.
> He refused. It had to come from AMD. That's absurd and extortionist.
I'm on the wrong side of the Twitter wall to read the source, but that doesn't sound absurd. Extortionist, maybe. Hotz's major complaint (last time I checked, anyway) is pretty close to one I have: AMD appears to have little to no strategic interest in consumer-grade graphics cards having strong GPGPU support, which leads to random crashes from the kernel drivers and a certain attitude of "meh, whatever" from AMD corporate when dealing with that.
I doubt any specific boxes or testing regime are his complaint, he'd be much more worried about whether AMD management have any interest in companies like his succeeding. Third parties providing some support doesn't sound like it'd cut it. The process of being burned by AMD leaves one a little leery of any alleged support without some serious guarantees that more major changes are afoot in their management view.
> ...he'd be much more worried about whether AMD management have any interest in companies like his succeeding.
This reads as incredibly entitled. AMD owes him nothing, especially if he's opposed to the leadership's vision[1] and being belligerent about it.
There are maybe 1 or 2 companies with enough cachet to demand management changes at a supplier like AMD - and they have market caps in the trillions.
1. Lisa Su hasn't been shy about AMD being all about partnering with large players who can move volume. My interpretation of this is that AMD prefers dealing with Sony, Microsoft, hyperscalers, and HPC builders, then possibly tier II OEMs. Small startups are probably much further down the line, close to consumers at the tail end of AMD's attention queue. I don't like it as a consumer, but it seems like a sound strategy since the partners will shoulder most of the software effort, which is a weakness AMD has against Nvidia. They can focus on cranking out ok-to-great hardware at more-than-ok prices and build up a war chest for future investments, and who knows when this hype bubble will burst and take VC dollars with it, or when someone invents an architecture that's less demanding on compute (if you're more optimistic).
AMD owes us (its customers) a lot for all the empty and broken promises on this over the many many years and hardware generations.
Sure. But we hear a lot about Hotz because all the unentitled people rolled their eyes and went over to buy Nvidia cards. He's one of the major voices who are unreasonable enough to pipe up on Twitter and air dirty laundry.
I doubt AMD are going to listen to him. They're in a great spot and are probably going to tap into the market in a big way. But Hotz isn't crazy to test them in an odd way - although he'd probably be better off dropping AMD cards like most other people in his price range would.
> But Hotz isn't crazy to test them in an odd way..
He should have just read the Lisa Su interview from Q1 2024 where she laid out AMD's strategy without equivocating.
> ... although he'd probably be better off dropping AMD cards
I think this is what's best for everyone. Looking at his recent track record[1], he seems like a person who gets really excited by kicking things off and experiencing the exponential growth phase, and then when it flattens out into a sigmoid curve, he dusts off his hands, declares his work done, and moves on to the next thing.
1. Hired by Elon to "fix" Twitter, CommaAI, and soon, Tiny
> Looking at his recent track record[1]
One might argue he's had a pattern for even longer. While he did do some early hypervisor glitching, even his PS3 root key release was basically just applying fail0verflow's ECDSA exploit (fail0verflow didn't release the keys specifically because they didn't want to get sued ... so that was a pretty dick move [1]).
For his projects, I think it's important to look at what he's done that's cool (eg, reversing 7900XTX [2], creating a user-space driver that completely bypasses AMD drivers for compute [3]) and separating it from his (super cringe) social media postings/self-hype.
Still, at the end of the day, here's hoping that someone at AMD realizes that having terrible consumer and workstation support will basically continue to be a huge albatross/handicap - it cuts them off from basically all academic/research development (almost every single ML library and technique you can name/used in production is CUDA-first because of this) and from the non-hyperscaler enterprise market as well. Any dev can get a PO for a $500 Nvidia GPU (or has one on their workstation laptop already). What's the pathway for ROCm? (Honestly, if I were in charge, my #1 priority would be to make sure ROCm is installed and works w/ every single APU, even the 2CU ones.)
[1] https://en.wikipedia.org/wiki/Sony_Computer_Entertainment_Am...
[2] https://github.com/tinygrad/7900xtx
[3] https://github.com/tinygrad/tinygrad/blob/master/docs/develo...
Isn't he still actively leading and promoting Comma?
No. https://geohot.github.io//blog/jekyll/update/2022/10/29/the-...
That post is from 2022 saying he's "taking some time away" and it's been "some time" since then.
He was just at CES promoting Comma: https://youtu.be/GLGuA2qF3Kk
I don't really see why those companies would prefer AMD over Nvidia; they are not hurting for money and are therefore able to spend that money on Nvidia or build their own hardware, like Google did.
Meta and Microsoft are big enough they could just build their own TPUs with a stable software stack and cut off Nvidia and AMD at the same time.
From this perspective, AMD only ever makes sense as an "also ran company" for a few niche use cases.
> This reads as incredibly entitled. AMD owes him nothing, especially if he's opposed to the leadership's vision[1] and being belligerent about it.
A generation ago, everyone in sales and developer relations understood that "the customer is always right". Remember a sweaty dude on stage jumping about screaming "developers! developers! developers"? It was exhausting dealing with all the free software and hardware sent to developers, not to mention the endless free conferences for even the most backwater developer community. But that's an ethos for boomers, I guess.
On the one hand "incredibly entitled" and on the other you talk about AMD's leadership vision. Your long closing paragraph shows that entitlement of a developer has nothing to do with anything and isn't relevant in the conversation (I can show you guys at OEMs who are incredibly arrogant and entitled or outright a$$holes but so what?). It's just an opinion based on your personal bias.
In reality, AMD simply doesn't care about small AI startups or developers, as you've noted. They don't care about me wanting to run all my AI locally so that I can manage my dairy farm with a modest fleet of robots. If they cared and had sent him MI300s immediately (or sent them to the other 8 startups that asked for them), you wouldn't be chastising him about being "incredibly entitled".
> AMD appears to have little to no strategic interest in consumer-grade graphics cards having strong GPGPU support, which leads to random crashes from the kernel drivers and a certain attitude of "meh, whatever" from AMD corporate when dealing with that.
AMD has little interest in software support in general.
Their Adrenalin software is riddled with bugs that have been here for years.
Having watched some of his streams on the topic, I think you've captured it well. He's basically saying he's done wasting time on AMD unless/until they get serious. It's not so much that he wants free hardware from them, rather he wants to see them put some skin in the game as they basically blew him off the last time he tried to engage with them.
> He's basically saying he's done wasting time on AMD unless/until they get serious.
They are serious, they just don't respond to his demands.
Or anyone else for that matter, they simply do not care about software.
We do care about software and acknowledge the gaps, and we will work hard to make it better. Please let me know any specific issues that affect you and I'm happy to push to get them resolved, or come back with why not.
... they do now thanks to Anush taking the reins.
Maybe he needs the AMD brand for his fundraising.
AMD's offer was more than fair. Hotz was throwing a tantrum.
"I estimate having software on par with NVDA would raise their market cap by 100B. Then you estimate what the chance it that
@__tinygrad__
can close that gap, say it's 0.1%, probably a very low estimate when you see what we have done so far, but still...
That's worth 100M. And they won't even send us 2 ~100k boxes. In what world does that make sense, except in a world where decisions are made based on pride instead of ROI. Culture issue."
This is his opinion, nothing more, nothing less. He currently has a partially implemented piece of software that hasn't seen a release since November and isn't performant at all.
Take the free offer, prove everyone wrong and then start to tell us how great you are. https://x.com/HotAisle/status/1880507210217750550
To be fair, having seen his software evolve, and having seen ROCm evolve, I'm more optimistic for his software in a year than yours.
He picked his problem better. The whole reason that tinygrad is, well, tiny, is that it limits the amount of overhead to onboard people and perform maintenance and rewrites. My strong impression is that the ROCm codebase is simply much too large for AMD's dev resources. You're trying to race NVidia on their turf with less resources. It's brave, but foolish.
I can see how Tinygrad could succeed. The story makes sense. AMD's doesn't, neither logically nor empirically. NVidia would have to seriously fumble.
That said, I'm deeply worried about anyone who's based their company on AMD GPUs. The only reason they do well in HPC is because there's an army of dreadfully underpaid and overperforming grad students to pick up the slack from AMD. Trying to do that in a corporate environment is company suicide.
> That said, I'm deeply worried about anyone who's based their company on AMD GPUs
Sony Interactive and Microsoft XBox seem to be doing great without an army of underpaid students. AMD does great at the top and bottom: the corporates in the middle that are unwilling or unable to pay people to author/tweak their software for AMD GPUs will do better going with Nvidia, which has great OOTB software, and a premium to go with it.
I suppose if AMD had infinite resources, it'd fix this post-haste.
This is false. 3D VCache is enabled by TSMC's 3DFabric packaging. It also didn't really play a role in AMD passing Intel. Chiplets are also enabled by TSMC technology, CoWoS.
When AMD passed Intel, they hadn't even decided to use TSMC at all yet. Of course now Intel is behind in leveraging TSMC technology. They started late.
AMD is so behind NVidia that it's not even funny. If AMD's board had any sense, they'd be carpet-bombing every researcher, AI startup, and random Joe with the latest engineering samples of unreleased top-of-the-line products. And giving them a direct line to the engineering team.
This would end up costing maybe tens of millions at most, but the potential return is indeed measured in billions.
And yep, lots of people like geohot are (to put it mildly) eccentric. So deal with it. They are not merely your customers, they are your freaking sales people.
As it is, I work in a startup that does a bit of AI vision-related stuff. I'm not going to even touch AMD because I don't want to deal with divas on the AMD board in future. NVidia is more expensive right now, but they're far more predictable.
> AMD is so behind NVidia that it's not even funny.
Do you really want all AI hardware and software dominated by a monopoly? We're not looking to "beat" Nvidia, we are looking to offer a compelling alternative. MI300x is compelling. MI355x is even more compelling.
If there is another company out there making a compelling product, send them my way!
People keep forgetting CUDA is not only about AI; graphics matter as well, as does being a polyglot ecosystem, the IDE integration, the graphical debugging tools, the libraries, and having a memory model based on the C++ memory model. That last point is quite relevant, as NVidia employs a few key people from the C++ ecosystem who work on the ISO C++ standard (WG21).
Time will tell, no? Transmeta shipped a lot of Crusoes. It was run by brilliant people. It was a “compelling alternative.” Maybe Cerebras is the Transmeta of this race, I don’t know. But. It’s not about making an alternative. It most definitely is about “beating” NVIDIA. Otherwise, you are just shoveling dollars - shareholders’, undercompensated employees at AMD and TSMC, etc. - to Meta, like everyone else.
I'm willing to try AMD, and I even built an AMD-based machine to experiment with AI workflows. So far it has been failing miserably. I don't care that MI300X is compelling when I can't make samples work both on my desktop and on a cloud-based MI300X. I don't care about their academic collaborations, I'm not in the business of producing papers.
I'll just pay for H100 in the cloud to be sure that I will be able to run the resulting models on my 3090 locally and/or deploy to 4090 clusters.
If AMD shows some sense, commits to long-term support for their hardware with reasonable feature-parity across multiple generations, I'll reconsider them.
And AMD has a history of doing that! Their CPU division is _excellent_, they are renowned for having long-term support for motherboard socket types. I remember being able to buy a motherboard and then not worrying about upgrading the CPU for the next 3-4 years.
> I'm willing to try AMD, and I even built an AMD-based machine to experiment with AI workflows. So far it has been failing miserably. I don't care that MI300X is compelling when I can't make samples work both on my desktop and on a cloud-based MI300X.
Anush was actively looking for feedback on this on github today...
I have quad w7900s under my desk that work well for workloads on my desktop that translate well to MI300x. There are some perf gaps with FAv2, and FP8 but otherwise I get a seamless experience. lmk if you have a pointer to any github issues for me to track down to make your experience better.
You don't think AMD being competitive with Nvidia ($3.37 trillion market cap) would be "nearly 100B good"? Believe it or not, the only reason that's not the case is the lack of good, bug-free software. That's what tinygrad is doing.
AMD already has major ongoing projects with OpenXLA/IREE. Lots of established engineers/researchers, and it’s in collaboration with Google/AWS. Hotz is delusional if he thinks that he can do better by ripping off Karpathy’s toy autograd implementation.
> AMD already has major ongoing projects with OpenXLA/IREE.
And how's that been going? The AMD stock price compared to NVidia seems to speak volumes about the efficacy of these projects.
IREE has been around for 5 years, without producing anything overtly practical. They seem to be focused more on academic jobs and citations. It's also focused on the general case of a compiler for "all" AI-type tasks, supporting everything from WASM to CUDA.
OpenXLA seems to be a bit more practical, but I spent the last 2 hours trying to make it work on my AMD card (Radeon Pro W7900) and failing.
I personally don't like Tinygrad's approach of doing their own thing rather than integrating into PyTorch/JAX/..., but it at least is _practical_ with a reasonable end-goal. Is it going to be successful? Who knows. But it's more practical than anything AMD has done within the recent 5 years.
I am an ML scientist, my company and several others are using IREE to deploy our models to edge devices. It is the most promising technology in this area.
Those academic publications are a sign that the people involved actually know what they’re doing, and are making sure their work holds up to scrutiny.
Yeah, AMD is already pouring a lot of support into OpenXLA/IREE, which has a lot of well-respected compiler engineers and researchers working on it, and companies like AWS are also investing into it.
I don’t really think TinyCorp has anything to offer AMD.
Complex how? He requested payment in the form of MI300X servers, which is unconventional, sure, but the value of the payment is not out of line with the support he proposed to provide IMO.
Really telling that they have to ask us what cards we want, as opposed to supporting all cards by default from day 1 like Nvidia.
All because they went with a boneheaded decision to require per-device code compilation (gfx1030, gfx1031...) instead of compiling to an intermediate representation like CUDA's PTX. Doubly boneheaded considering the graphics API they developed, Vulkan, literally does that via SPIR-V!
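To make the contrast concrete, here is a small sketch (assuming a reasonably recent PyTorch; the gcnArchName property is, to my knowledge, only present on ROCm builds, so it is read defensively) of the kind of target each stack ends up compiling for:

    import torch

    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        if torch.version.hip is not None:
            # ROCm build: kernels are compiled ahead of time per gfx architecture (gfx1030, gfx1100, ...)
            arch = getattr(props, "gcnArchName", "unknown gfx target")
            print("ROCm device:", props.name, arch)
        else:
            # CUDA build: PTX targets a virtual compute capability and the driver
            # JIT-compiles it for whatever card is actually installed.
            print("CUDA device:", props.name, "compute capability", torch.cuda.get_device_capability(0))
    else:
        print("no GPU visible to this build of PyTorch")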
The author of the issue comments that they'll eventually support all cards. What he's really asking is which cards people want them to prioritize, not just support.
I read it fully. Whole point of my post is that, based on their track record so far plus the technical limitations, it is impossible for AMD to provide the same day 1 drop in compatibility that the CUDA ecosystem offers.
Edit:
> No guarantees of future support but we will try hard to add support.
yes. We are behind on software support for all consumer cards and would love to support all of them. But we are looking for guidance / feedback so we can prioritize.
> No guarantees of future support but we will try hard to add support.
AMD reps told me exactly the same thing years ago about how they'd love to support all cards, when RDNA2 had just launched. Fast forward, and only the W6800 is properly supported from that gen. The last time I tried, it had tons of kernel bugs that caused hard freezes outside the most basic cases.
You need to come out and say that you will support all cards, no ifs or buts, by a hard deadline.
I can understand wanting to prioritize support for the cards people want to use most, but they should still plan to write software support for all the cards that have hardware support.
I've always described Nvidia as an accelerated compute company that happens to sell hardware.
AMD are smart, and they solve big problems in ways that are baffling to many. They're very sensitive to moats and position themselves with products or frameworks to drain them.
I consider their primary product "engineering competence as a service", but when no one external picks up the reins, they don't try very hard to play market maker. I remember when Intel's R&D budget was more than AMD's market cap - they're effective both at, and when, running lean.
The reality here is that people don't have grievances with CUDA and Nvidia aren't doing anything egregious with it. But whether that's due to ROCm's existence... we can only speculate.
> The reality here is that people don't have grievances with CUDA and Nvidia aren't doing anything egregious with it.
Correct. Lots of people also developed specifically for Internet Explorer too.
They are a monopoly and if that is important to you, then you'll want alternative solutions to avoid putting all your eggs in one basket.
People have short-term memory loss and forget that just a few months ago, H100s were impossible to get and the price skyrocketed. Given the "insane demand" for Nvidia compute (and compute in general), these sorts of supply/demand issues will be ongoing indefinitely. How many times will people need to get burned until they start to seek alternatives? Hard to say...
Hardware first, but then their hardware isn't any better than NVidia's, so I don't see how that's a valid excuse here.
(Okay, maybe their super high end unobtanium-level GPUs are better hardware-wise. Don't know, don't care about enterprise-only hardware that is unbuyable by mere mortals.)
It's just not. People like to try and defend AMD out of hatred for Nvidia, but the thousands of fumbles over the past 15 years that have led AMD to their current position and Nvidia to their current dominance are not deserving of coddling and excuses.
The fact is, support still isn't there. They've had 2 years since Stable Diffusion to get a serious team up and shipping, and they still don't have enough resources pointed at this to avoid having to ask what should be prioritized.
The only way to fix their culture/priorities is to stop buying their cards.
People successfully set up Stable Diffusion with automatic1111 and rocm on all kinds of weird setups. What AMD needs to do is basically just provide a better out-of-the-box experience, as even following other people's instructions has been flaky at best. For example, for my 6600 XT, I have tried setting up SD twice. I succeeded on Manjaro in the past (like, a year ago), but didn't succeed now, and I succeeded on Debian now, but it uses the CPU for some reason. The hardware setup was the same; the only thing that changed is that I have updated my Linuxes in the meantime.
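For anyone hitting the silent CPU fallback described above, a quick sanity check along these lines (a minimal sketch, assuming a ROCm build of PyTorch) usually shows whether the GPU is visible at all before blaming the web UI:

    import torch

    print("torch:", torch.__version__)
    print("hip runtime:", torch.version.hip)           # None means this is a CPU-only or CUDA build
    print("gpu visible:", torch.cuda.is_available())   # ROCm devices are exposed through the cuda API

    if torch.cuda.is_available():
        print("device:", torch.cuda.get_device_name(0))
        # tiny matmul on the GPU; if this hangs or crashes, the problem is below PyTorch
        x = torch.randn(1024, 1024, device="cuda")
        print((x @ x).sum().item())

If torch.version.hip comes back as None, the CPU-only wheel got installed, which would explain SD quietly running on the CPU.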
rocm is kind of a joke. Recently I wanted to write some golang code which talks to rocm devices using amd smi. You have to build and install the go amd smi bindings from source; the go amd smi repo has dead links and there is basically no documentation anywhere on how to get this working.
Compare this to nvidia where I just imported the go nvml library and it built the cgo code and automatically links to nvidia-ml.so at runtime.
Second: these dependencies should all be packaged into deb/rpm
Third: there should be a goamdsmi package which has a proper dependency tree. I should be able to do ‘apt-get install goamdsmi’ and it should install everything I need. This is how it works with go-nvml.
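For comparison, the Python side of the NVIDIA path shows the same convenience the go-nvml route has: install the nvidia-ml-py bindings and they find nvidia-ml.so at runtime. A rough sketch (not the Go code discussed above, just the analogous query pattern):

    import pynvml  # from the nvidia-ml-py package; loads the NVML shared library at runtime

    pynvml.nvmlInit()
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        print(f"{name}: {mem.used / 2**20:.0f}/{mem.total / 2**20:.0f} MiB, {util.gpu}% busy")
    pynvml.nvmlShutdown()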
Imagine Nvidia supported only the 4090, 4080 and 4070 for CUDA at the consumer level, with the 3090 not being supported since the 40xx series came out. This is what AMD is defending here.
Super annoying. I have an RX 6600 XT and can't get ROCm to work on Linux.
Vulkan ML however worked perfectly out of the box, so at least I got something.
The caveat being that PyTorch has a lot of dependencies and a couple of them are not yet available in Debian Unstable. For folks wanting to use StableDiffusion, that's a problem. However, the available packages are more than sufficient for llama-cpp as you point out.
I honestly can't figure out which Radeon GPUs are supposed to be supported.
The GitHub discussion page in the title lists RX 6800 (and a bunch of RX 7xxx GPUs) as supported, and some lower-end RX 6xxx ones as supported for runtime. The same comment also links to a page on the AMD website for a "compatibility matrix" [1].
That page only shows RX 7900 variants as supported on the consumer Radeon tab. On the workstation side, Radeon Pro W6800 and some W7xxx cards are listed as supported. It also suggests to see the "Use ROCm on Radeon GPU documentation" page [2] if using ROCm on Radeon or Radeon Pro cards.
That link leads to a page for "compatibility matrices" -- again. If you click the link for Linux compatibility, you get a page on "Linux support matrices by ROCm version" [3].
That "by ROCm version" page literally only has a subsection for ROCm 6.2.3. It only lists RX 7900 and Pro W7xxx cards as supported. No mention of W6800.
(The page does have an unintuitively placed "Version List" link through which you can find docs for ROCm 5.7 [4]. Those older docs are no more useful than the 6.2.3 ones.)
Is RX 6800 supported? Or W6800? Even the amd.com pages seem to contradict each other on the latter.
Maybe the pages on the AMD site only list official production support or something. In any case it's confusing as hell.
Nothing against the GitHub page author, who at least seems to try to be clear, but the official documentation leaves a lot to be desired.
I will provide this feedback to the docs team to clean up. I found it hard when I was making that poll :D but I looked harder instead of trying to fix the docs. So thank you for the feedback.
> I honestly can't figure out which Radeon GPUs are supposed to be supported.
Exactly.
I have a 6700 XT with 12 gig ram and a 5700 with 8 gig ram.
If I ctrl+f for either of those numbers on the GH issue, I get one hit. For the 6700, it's a single row that has a green check for "runtime" and a red x for "HIP SDK". For the 5700 card, it's somebody in the peanut gallery saying "don't forget about us!".
HIP is the c++ "flavor" that can compile down to work on amd _and_ nvidia gpus. If the 6700 has support for the "runtime" but not HIP ... what does that even mean for me?
And as you pointed out, the 6800 series card has green checks for both so that means it's fully supported? But ... it's not listed on AMD's site?!
Bad docs are how you cement a reputation of "just buy nvidia and install their latest drivers and it'll be fine".
Removing support for Radeon VII is a bonehead move that smacks of stupidity or greed. The cards were targeted for enthusiast gamers but have enterprise level hardware, like HBM2 memory and 1 TB/s bandwidth.
It's not nice to assume that people don't read then proceed to comment.
I read the link and I upvoted the "just support all GPUs you recently produced" comment.
I don't think the solution to bad software support is the prioritization. The prioritization is causing even more discrimination among different GPUs and different customers.
You can say whatever you want, and downvote whatever you want. However, that doesn't solve the real problem.
A lot of people think rocm is basically a big pile of crap.
What are the chances of AMD considering alternatives:
- adopt oneAPI and try to fight Nvidia together with Intel
- Vulkan, and implement a PyTorch backend
- SYCL
I figure that list is only what’s officially supported, meaning things not on that list may or may not work. For example, my 6800 XT runs stable diffusion just fine on Linux with PyTorch ROCm.
I cannot compare the performance with other cards, but it takes a few seconds for SDXL images (e.g. 1024x512) as long as it doesn’t run OOM.
I use a fork of the stable diffusion webui [0] which, for me, handled memory better. Setup was relatively easy: install the pytorch packages from the ROCm repo and it worked.
I’m constantly baffled and amused on why AMD keeps majorly failing at this.
Either the management at AMD is not smart enough to understand that without the computing software side they will always be a distant number 2 to NVIDIA, or the management at AMD considers it hopeless to ever be able to create something as good as CUDA because they don’t have and can’t hire smart enough people to write the software.
Really, it’s just baffling why they continue on this path to irrelevance. Give it a few years and even Intel will get ahead of them on the GPU side.
If I were Jensen, I would snap up all the GPU software experts I possibly could, and put them to work improving the CUDA ecosystem. I'd also spin up a big research group to further fuel the CUDA pipeline for hardware, software, and application areas.
Which is exactly what NVIDIA seems to be doing.
AMD's ROCm software group seems far behind, is probably understaffed, and probably is paid a fraction of what NVIDIA pays its CUDA software groups.
AMD also has to catch up with NVlink and Spectrum-X (and/or InfiniBand.)
AMD's main leverage point is its CPUs, and its raw GPU hardware isn't bad, but there is a long way to go in terms of GPU software ecosystem and interconnect.
I've never understood why they have such a fractured approach to software:hardware support. I remember reading and writing comments about this on hn nearly a decade ago now. It's a long time to keep making the same mistake.
They had the exact same kind of support issues back in the OpenCL days, where they didn't manage to provide cross platform, cross card support for same versions of the platform.
I have never been able to reconcile it with their turnaround and newfound competence on the CPU side.
> I’m constantly baffled and amused on why AMD keeps majorly failing at this.
i wonder if you've considered the possibility that there's some component/dimension of this that you're simply unaware of? that it's not as straightforward as whatever reductive mental model you have? is that even like within the universe of possibilities?
My wishlist for ROCm support is actually supporting the cards they already released. But that's not going to happen.
By the time a (consumer) AMD device is supported by ROCm, it'll only have a few years of ROCm support left before support is removed. The lifespan of ROCm support for AMD cards is very short. You end up having to use Vulkan, which is not optimized, of course, and a bit slower. I once bought an AMD GPU 2 years after release and 1 year after I bought it ROCm support was dropped.
FWIW, every ROCm library currently in the Debian 13 'main' and Ubuntu 24.04 'universe' repository has been built for and tested on every discrete consumer GPU architecture since Vega. Not every package is available that way, but the ones that are have been tested on and work on Vega 10, Vega 20, RDNA 1, 2 and 3.
Note that these are not the packages distributed by AMD. They are the packages in the OS repositories. Not all the ROCm packages are there, but most of them are. The biggest downside is that some of them are a little old and don't have all the latest performance optimizations for RDNA 3.
Those operating systems will be around for the next decade, so that should at least provide one option for users of older hardware.
Packages existing and the software actually working are very different things. You can run rocm on unsupported GPUs like a 780m, but as soon as you hit an issue you are out of luck. And you’ll hit an issue.
For example, my 780m gets 1-2 inferences from llama.cpp before dropping off the bus due to a segfault in the driver. It's a bad enough lockup that Linux can't cleanly shut down and will hang until hard rebooted.
The 780m is an integrated GPU. I specified discrete GPUs because that's what I have tested and can confirm will work.
I have dozens of different AMD GPUs and I personally host most of the Debian ROCm Team's continuous integration servers. Over the past year, I have worked together with other members of the Debian project to ensure that every potentially affected ROCm library is tested on every discrete consumer AMD GPU architecture since Vega whenever a new version of a package is uploaded to Debian.
FWIW, Framework Computers donated a few laptops to Debian last year, which I plan to use to enable the 780m too. I just haven't had the time yet. Fedora has some patches that add support for that architecture.
As the underdog AMD can't afford to have their efforts perceived as half-assed or a hobby or whatever. They should be moving heaven and earth to maximize their value proposition, promising and delivering on longer support horizons to demonstrate the long term value of their ecosystem.
Honestly at this point half-assed support would be a significant step up from their historical position. The one thing they have pioneered is new tiers of fractional assedness asymptotically approaching zero.
I mean at this point my next card is going to be an nvidia. It has been a total waste of time trying to use rocm for anything machine-learning based. No one uses it. No one can use it. The card I have is somehow always not quite supported.
I have an MI50 with 16GB of HBM that's collecting dust (it's Vega-based, so it can play games, I guess) because I don't want to bother setting up a system with Ubuntu 20.04, the last version of Ubuntu that the last version of ROCm supporting the MI50 works on.
With situations like this, it's not hard to see why Nvidia totally dominates the compute/AI market.
The MI50 may be considered deprecated in newer releases, but it seems to work fine in my experience. I have a Radeon VII in my workstation (which shares the same architecture) and I host the MI60 test machine for Debian AI Team. I haven't had any trouble with them.
I wrote that patch. It's not actually used for MI50/MI60 in any of the Debian system packages, since Debian builds for gfx906 rather than using the gfx900 fallback path that patch provides. Debian is not relying on any special patches to enhance gfx906 support. That architecture is the same as upstream.
Now, for some other GPU architectures, you're absolutely right. There are indeed important patches in Debian that enable its extra-wide hardware compatibility.
I don’t think the MI60 has reached deprecated status yet (the last time I looked at prices for the MI50 and MI60, the MI60 was something like 3x as expensive, and I think that’s because it’s still officially supported), but I’ll check this all out. Thanks.
The MI60 is basically just a faster MI50 with more memory. They were deprecated together. It's plausible there could be small firmware or driver differences that cause issues in one but not the other, but I think that's unlikely.
AMD did over $5 billion in GPU compute (Instinct line) last year. Not nVidia numbers but also not bad. Customers love that they can actually get Instinct systems rather than trying to compete with the hyperscalers for limited supplies of nVidia systems. Meta and Microsoft are the two biggest buyers of AMD Instincts, though...
AMD Instinct is also more power efficient and has comparable (if not better) performance for the same (or less) price.
You can use ROCm on consumer Radeon as long as you pay more than 400 dollars for one of their GPUs. Meanwhile, you can run stable diffusion with the -lowvram flag on a 3050 6GB that goes for 180 dollars.
As someone from the rendering side of GPU stuff, what exactly is the point of ROCm/CUDA? We already have Vulkan and SPIR-V with vendor extensions as a mostly-portable GPU API, what do these APIs do differently?
Furthermore, don't people use PyTorch (and other libraries? I'm not really clear on what ML tooling is like, it feels like there's hundreds of frameworks and I haven't seen any simplified list explaining the differences. I would love a TLDR for this) and not ROCm/CUDA directly anyways? So the main draw can't be ergonomics, at least.
Vulkan doesn't do C++ as a shading language, for example. There are some backend attempts to target SPIR-V, but it is still early days and nowhere close to having the IDE integration, graphical debugging tools and rendering libraries that CUDA enjoys.
users mainly use PyTorch and Jax and these days rarely write CUDA code.
however separately, installing drivers and the correct CUDA/CuDNN libraries is the responsibility of the user. this is sometimes slightly finicky.
with ROCm, the problem is that 1) PyTorch/Jax don't support it very well, for whatever reason which may be partly to do with the quality of ROCm frustrating PyTorch/Jax devs, 2) installing drivers and libraries is a nightmare. it's all poorly documented and constantly broken. 3) hardware support is very spotty and confusing.
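to illustrate why most users never touch CUDA (or HIP) directly: the everyday PyTorch pattern is device-agnostic, and on a ROCm build the very same code path is supposed to work because HIP is exposed through the torch.cuda API. a minimal sketch, with a placeholder model:

    import torch
    import torch.nn as nn

    device = "cuda" if torch.cuda.is_available() else "cpu"  # "cuda" also covers ROCm builds

    model = nn.Linear(128, 10).to(device)   # user code never mentions CUDA, HIP or gfx targets
    x = torch.randn(32, 128, device=device)
    loss = model(x).sum()
    loss.backward()                          # cuBLAS or rocBLAS is picked underneath, not by the user
    print("ran on", device)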
it's an interesting question. the unhelpful answer is Vulkan didn't exist when Tensorflow, PyTorch (and Torch, its Lua-based predecessor) were taking off and building GPU support. Apparently PyTorch did at one point prototype a Vulkan backend but abandoned it.
My own experience is that half-assed knowledge of C/C++, and a basic idea of how GPUs are architected, is enough to write a decent custom CUDA kernel. It's not that hard to do. No idea how I would get started with Vulkan, but I assume it would require a lot more ceremony, and that writing compute shaders is less intuitive.
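As a rough illustration of how little ceremony that takes (sketched here with Numba's CUDA JIT in Python rather than raw CUDA C, purely as an assumed stand-in; the real thing is a .cu file compiled with nvcc, but the shape of the kernel is the same):

    import numpy as np
    from numba import cuda

    @cuda.jit
    def saxpy(a, x, y, out):
        i = cuda.grid(1)          # global thread index, same mental model as CUDA C
        if i < x.size:            # guard against the extra threads in the last block
            out[i] = a * x[i] + y[i]

    n = 1 << 20
    x = np.random.rand(n).astype(np.float32)
    y = np.random.rand(n).astype(np.float32)
    out = np.zeros_like(x)

    threads = 256
    blocks = (n + threads - 1) // threads
    saxpy[blocks, threads](np.float32(2.0), x, y, out)  # arrays are copied to/from the GPU implicitly

    assert np.allclose(out, 2.0 * x + y)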
there is also definitely a "worse is better" effect in this area. there are some big projects that tried to be super general and cover all use cases and hardware. but a time-crunched PhD student or IC just needs something they can use now. (even Tensorflow, which was relatively popular compared to some other projects, fell victim to this.)
George Hotz seems like a weird guy in some respects, but he's 100% right that in ML it is hard enough to get anything working at all under perfect conditions; you don't need to be fighting with libraries and build tools on top of that, or the mental overhead of learning how to use this beautiful general API that supports 47 platforms you don't care about.
except also "worse is better is better" -- e.g. because they were willing to make breaking changes and sacrifice some generality, Jax was able to build something really cool and innovative.
Cuda the language is an antique dialect of C++ with a vectorisation hack. It's essentially what you get if you take an auto-vectoriser and turn off the correctness precondition, defining the correct semantics to be that which you get if you ignore dataflow. This was considered easier to program with than vector types and intrinsics.
Cuda the ecosystem is a massive pile of libraries for lots of different domains written to make it easier to use GPUs to do useful work. This is perhaps something of a judgement on how easy it is to write efficient programs using cuda.
ROCm contains a language called HIP which behaves pretty similarly to Cuda. OpenCL is the same sort of thing as well. It also contains a lot of library code, in this case because people using Cuda use those libraries and don't want to reimplement them. That's a bit of a challenge because nvidia spent 20 years writing these libraries and is still writing more, yet amd is expected to produce the same set in an order of magnitude less time.
If you want to use a GPU to do maths, you don't actually need any of this stuff. You need the GPU, something to feed it data (e.g. a linux host) and some assembly. Or LLVM IR / freestanding c++ if you prefer. This whole cuda / rocm thing really is intended to make them easier to program.
ROCm is a mistake. It's fundamentally broken by compiling to hardware specific code instead of CUDA's RTX, so it will always be plagued with this issue of not supporting all cards, and even if a certain GPU is supported today they can stop supporting it next version. It has happened, it will continue happen.
It's also a strange value proposition. If I'm a programmer in some super computer facility and my boss has bought a new CDNA based computer, fine, I'll write AMD specific code for it. Otherwise why should I? If I want to write proprietary GPU code I'll probably use the de facto industry standard from the industry giant and pick CUDA.
AMD could be collaborating with Intel and a myriad of other companies and organizations and focus on a good open cross platform GPU programming platform. I don't want to have to think about who makes my GPU! I recently switched from an Intel CPU to an AMD, obviously to problem. If I had to get new software written for AMD processors I would have just bought a new Intel, even though AMD are leading in performance at the moment. Even Windows on ARM seems to work ok, because most things aren't written in x86 assembly anymore.
Get behind SYCL, stop with the platform specific compilation nonsense, and start supporting consumer GPUs on Windows. If you provide a good base the rest of the software community will build on top. This should have been done ten years ago.
Agreed.
Honestly, the problem isn't just which devices, but even more so, this (from the page, not your comment):
> No guarantees of future support but we will try hard to add support.
During the Great GPU Shortage, I bought an AMD RX5xx card for ML work. It was explicitly advertised to work with ROCm. Within a couple of months, AMD dropped ROCm support. EOLing an actively-sold product from being used for an advertised purpose within the warranty period was, if I understand consumer protection laws in my state correctly, fraud. There was no support from either the card vendor (MSI). No support from AMD. No support from the reseller. Short of small claims, which was not worth it, there was no recourse.
This is on a long list of issues AMD needs to sort out to be a credible player in this space:
* Those are the kinds of experiences which cause people to drop a vendor and not look back. AMD needs to either support cards forever, or at the very least, have an advertised expiration date (like Chromebooks and Android phones).
* Broad support is helpful from a consumer perspective from the simply pragmatic point of view that only a tiny fraction of the population has the time to read online forums, footnotes, or fine print. People should be able to buy a card on Amazon, at Best Buy, and Microcenter, and expect things to Just Work.
* Being able to plan is essential for enterprise use. I can't build a system around AMD if AMD might stop supporting their platform on 0 days notice, and the next day, there might be a security exploit which requires a version bump.
I'm hoping Intel gets their act together here, since NVidia needs a credible competitor. I've given up on AMD.
PTX does provide a low level machine abstraction. However you still target some version of hardware ( https://arnon.dk/matching-sm-architectures-arch-and-gencode-... ). However a lot of software effort has gone into it to make it look and work seamlessly.
Though AMD doesn't have the same "virtual ISA" as PTX right now there are increasing levels of such abstraction available in compiled flows with MLIR / Linalg etc. Those are higher level and can be compiled / jitted in realtime to obviate the need for a low level virtual ISA.
We already fought and lost this battle with 3D APIs for GPUs. What makes you think that winning strategy would play out any other way for tensor processing?
For context, the submitter of the issue is Anush Elangovan from AMD who's recently been a lot more active on social after the SemiAnalysis article, and taking the reigns / responsibility of moving AMD's software efforts forward.
However you want to dissect this specific issue, I'd generally consider this a positive step and nice to see it hit the front page.
https://www.reddit.com/r/ROCm/comments/1i5aatx/rocm_feedback...
https://www.reddit.com/user/powderluv/
hey thats me. Happy to help answer anything here and look forward to your constructive feedback to make AMD software better. We got work to do and look forward to it.
Ok, why does running koboldcpp with a "BLAS Batch Size" of 512 via Vulkan on an RX570 crash my entire computer? You know, to the point where I manually have to turn it on again.
I personally couldn't think of a better reason to never buy AMD GPUs ever again by the way.
I have experience running 130,000 RX470/570/480/580... if you're doing heavy workloads, those things full machine crash if you breathe on them wrong. That said, when they do run, they run extremely well.
There is 1000 reasons why your one GPU could have crashed, what does it say in the logs before it crashed?
Also know as the AMD representative who recently argued with Hotz about supporting tinycorp.
Is that a bad thing? Good for him to stand up to extortion.
Hard to say from my perspective.
I think AMDs offer was fair (full remote access to several test machines), then again just giving tinycorp the boxes on their terms with no strings attached as a kind of research grant would have earned them some goodwill with that corner of the community.
Either way both parties will continue making controversial decisions.
It isn't hard. We offered as well. Full BIOS access even.
Another neocloud, that is funded directly by AMD, also offered to buy him boxes. He refused. It had to come from AMD. That's absurd and extortionist.
Long thread here: https://x.com/HotAisle/status/1880467322848137295
To add, AMD only makes _parts_ of an MI300X server.
It's like asking a tire manufacturer to give you a car for free.
Great analogy!
Just uploaded some pictures of how complex these machines really are...
https://imgur.com/gallery/dell-xe9860-amd-mi300x-bGKyQKr
He explained the reasoning:
> Now, why don't they send me the two boxes? I understand when I was asking for firmware to be open sourced that that actually might be difficult for them, but the boxes are on eBay with a simple $$ cost. It was never about the boxes themselves, it was a test to see if software had any budget or power. And they failed super hard
I know this is someone else's reasoning, so you can't answer this question, but, doesn't this just test if they want to spend the budget on this specific thing?
If I ask a company for a $100,000 grant, and they're not willing, it doesn't seem like correct logic to assume that means they don't have the budget for it. Maybe they just don't want to spend $100,000 on me.
Why does this mean they don't have a budget or power?
He assumes the software department wants to do this, which - yes - seems to be flawed logic on his side.
Let's imagine he's indeed correct. He receives the hardware, get's hacking and solves all of AMDs problem, the stock surges and tinygrad becomes a major deep learning framework.
That would be a collosal embarrassment for AMDs software department.
They should be more interested in selling product than ego
"and they failed" from his PoV... but not from us looking at things from the other side of the table.
Chip vendors regularly send out free hardware to software developers. In this case I don't think the cost is the issue; AMD simply doesn't want what Geohot is offering.
Considering that AMD is only really supporting their datacenter GPUs with ROCm, this is the worst possible response. It means compute on AMD GPUs is only meant for the elite of the elite and forever out of reach for the average consumer and that Nvidia is not only outcompeting AMD on quality but also on cost.
> He refused. It had to come from AMD. That's absurd and extortionist.
I'm on the wrong side of the Twitter wall to read the source, but that doesn't sound absurd. Extortionist, maybe. Hotz's major complaint (last time I checked, anyway) is pretty close to one I have - AMD appears to have between little and no strategic interest in consumer grade graphics cards having strong GPGPU support leading to random crashes from the kernel drivers and a certain attitude of "meh, whatever" from AMD corporate when dealing with that.
I doubt any specific boxes or testing regime are his complaint, he'd be much more worried about whether AMD management have any interest in companies like his succeeding. Third parties providing some support doesn't sound like it'd cut it. The process of being burned by AMD leaves one a little leery of any alleged support without some serious guarantees that more major changes are afoot in their management view.
> ...he'd be much more worried about whether AMD management have any interest in companies like his succeeding.
This reads as incredibly entitled. AMD owes him nothing, especially if he's opposed to the leadership's vision[1] and being belligerent about it.
There is maybe 1 or 2 companies with enough cachet to demand management changes at a supplier like AMD - and they have market caps in the trillions.
1. Lisa Su hasn't been shy about AMD being all about partnering with large partners who can move volume. My interpretation of this is AMD prefers dealing with Sony, Microsoft, hyperscalers, and HPC builders, then possibly tier II OEMs. Small startups are probably much further down the line, close to consumers at the tail end of AMD's attention queue. I don't like it as a consumer, but it seems like a sound strategy since the partners will shoulder most of the software effort, which is a weakness AMD has against Nvidia. They can focus on cranking out ok-to-great hardware at more-than-ok prices and build up a warchest for future investments, and who knows when this hype bubble will burst and take VC dollars with it, or someone invents an architecture that's less demanding on compute (if you're more optimistic)
AMD owes us (its customers) a lot for all the empty and broken promises on this over the many many years and hardware generations.
Sure. But we hear a lot about Hotz because all the unentitled people rolled their eyes and went over to buy Nvidia cards. He's one of the major voices who are unreasonable enough to pipe up on Twitter and air dirty laundry.
I doubt AMD are going to listen to him. They're in a great spot and are probably going to tap into the market in a big way. But Hotz isn't crazy to test them in an odd way - although he'd probably be better off dropping AMD cards like most other people in his price range would.
> But Hotz isn't crazy to test them in an odd way..
He should have just read the Lisa Su interview from Q1 2024 where ahe laid out AMDs strategy without equivocating
> ... although he'd probably be better off dropping AMD cards
I think this is what's best for everyone. Looking at his recent track record[1], he seems like a person who's gets really excited by kicking things off and experiencing the exponentially growth phase, and then when it flattens out into a sigmoid curve, he dusts his hands and declares his work done, and moves to the next thing.
. 1. Hired by Elon to "fix" Twitter, CommaAI, and soon, Tiny
> Looking at his recent track record[1]
One might argue he's had a pattern for even longer. While he did do some early hypervisor glitching, even his PS3 root key release was basically just applying fail0verflow's ECDSA exploit (fail0verflow didn't release the keys specifically because they didn't want to get sued ... so that was a pretty dick move [1]).
For his projects, I think it's important to look at what he's done that's cool (eg, reversing 7900XTX [2], creating a user-space driver that completely bypasses AMD drivers for compute [3]) and separating it from his (super cringe) social media postings/self-hype.
Still, at the end of the day, here's hoping that someone at AMD realizes that having terrible consumer and workstation support will basically continue to be a huge albatross/handicap - it cuts them off basically all academic/research development (almost every single ML library and technique you can name/used in production is CUDA first because of this) and the non-hyperscaler enterprise market as well. Any dev can get a PO for a $500 Nvidia GPU (or has one on their workstation laptop already). What's the pathway for ROCm? (honestly, if I were in charge, my #1 priority would be to make sure ROCm is installed and works w/ every single APU installed, even the 2CU ones).
[1] https://en.wikipedia.org/wiki/Sony_Computer_Entertainment_Am...
[2] https://github.com/tinygrad/7900xtx
[3] https://github.com/tinygrad/tinygrad/blob/master/docs/develo...
Isn't he still actively leading and promoting Comma?
No. https://geohot.github.io//blog/jekyll/update/2022/10/29/the-...
That post is from 2022 saying he's "taking some time away" and it's been "some time" since then.
He was just at CES promoting Comma: https://youtu.be/GLGuA2qF3Kk
I don't really see why those companies would prefer AMD over Nvidia, they are not hurting for money and therefore able to spend that money on Nvidia or build their own hardware, like Google did.
Meta and Microsoft are big enough they could just build their own TPUs with a stable software stack and cut off Nvidia and AMD at the same time.
From this perspective, AMD only ever makes sense as an "also ran company" for a few niche use cases.
> This reads as incredibly entitled. AMD owes him nothing, especially if he's opposed to the leadership's vision[1] and being belligerent about it.
A generation ago, everyone in sales and developer relations understood that "the customer is always right". Remember a sweaty dude on stage jumping about screaming "developers! developers! developers"? It was exhausting dealing with all the free software and hardware sent to developers, not to mention the endless free conferences for even the most backwater developer community. But that's an ethos for boomers, I guess.
On the one hand you call him "incredibly entitled", and on the other you talk about AMD's leadership vision. Your long closing paragraph shows that a developer's supposed entitlement has nothing to do with anything and isn't relevant to the conversation (I can point you to people at OEMs who are incredibly arrogant and entitled, or outright a$$holes, but so what?). It's just an opinion based on your personal bias.
In reality, AMD simply doesn't care about small AI startups or developers, as you've noted. They don't care about me wanting to run all my AI locally so that I can manage my dairy farm with a modest fleet of robots. If they cared and had sent him MI300s immediately (or sent them to the other 8 startups that asked for them), you wouldn't be chastising him for being "incredibly entitled".
> AMD appears to have between little and no strategic interest in consumer grade graphics cards having strong GPGPU support leading to random crashes from the kernel drivers and a certain attitude of "meh, whatever" from AMD corporate when dealing with that.
AMD has little interest in software support in general.
Their Adrenalin software is riddled with bugs that have been there for years.
Having watched some of his streams on the topic, I think you've captured it well. He's basically saying he's done wasting time on AMD unless/until they get serious. It's not so much that he wants free hardware from them, rather he wants to see them put some skin in the game as they basically blew him off the last time he tried to engage with them.
> He's basically saying he's done wasting time on AMD unless/until they get serious.
They are serious, they just don't respond to his demands.
Or anyone else for that matter, they simply do not care about software.
We do care about software and acknowledge the gaps and will work hard to make it better. Please let me know any specific issues that are an issue for you and Im happy to push for it to get resolved or come back with why it isn't.
... they do now thanks to Anush taking the reins.
Maybe he needs the AMD brand for his fundraising.
AMD's offer was more than fair. Hotz was throwing a tantrum.
"I estimate having software on par with NVDA would raise their market cap by 100B. Then you estimate what the chance it that @__tinygrad__ can close that gap, say it's 0.1%, probably a very low estimate when you see what we have done so far, but still...
That's worth 100M. And they won't even send us 2 ~100k boxes. In what world does that make sense, except in a world where decisions are made based on pride instead of ROI. Culture issue."
https://x.com/__tinygrad__/status/1879620242315317304
This is his opinion, nothing more, nothing less. He currently has a partially implemented piece of software that hasn't seen a release since November and isn't performant at all.
Take the free offer, prove everyone wrong and then start to tell us how great you are. https://x.com/HotAisle/status/1880507210217750550
To be fair, having seen his software evolve, and having seen ROCm evolve, I'm more optimistic for his software in a year than yours.
He picked his problem better. The whole reason that tinygrad is, well, tiny, is that it limits the overhead of onboarding people and performing maintenance and rewrites. My strong impression is that the ROCm codebase is simply much too large for AMD's dev resources. You're trying to race NVidia on their turf with fewer resources. It's brave, but foolish.
I can see how Tinygrad could succeed. The story makes sense. AMD's doesn't, neither logically nor empirically. NVidia would have to seriously fumble.
>NVidia would have to seriously fumble.
Worked for AMD in the CPU market.
That said, I'm deeply worried about anyone who's based their company on AMD GPUs. The only reason they do well in HPC is that there's an army of dreadfully underpaid and over-performing grad students to pick up the slack from AMD. Trying to do that in a corporate environment is company suicide.
> That said, I'm deeply worried about anyone who's based their company on AMD GPUs
Sony Interactive and Microsoft XBox seem to be doing great without an army of underpaid students. AMD does great at the top and bottom: the corporates in the middle that are unwilling or unable to pay people to author/tweak their software for AMD GPUs will do better going with Nvidia, which has great OOTB software, and a premium to go with it.
I suppose if AMD had infinite resources, it'd fix this post-haste.
That's for gaming though, which AMD/ATi has decades of experience in.
TSMC is more responsible for AMD's success vs. Intel than AMD is. Unfortunately for AMD, Nvidia uses TSMC too.
3D-Cache blows Intel out of the water and has absolutely nothing to do with TSMC. Same goes for the clever chiplet design.
This is false. 3D VCache is enabled by TSMC's 3DFabric packaging. It also didn't really play a role in AMD passing Intel. Chiplets are also enabled by TSMC technology, CoWoS.
> 3D VCache is enabled by TSMC's 3DFabric packaging
> Chiplets are also enabled by TSMC technology, CoWoS.
Interesting, my mistake. Thank you for pointing that out!
But AMD decided to use those technologies and Intel decided not to. AMD on TSMC N4 is beating Intel on TSMC N3 because AMD has better designs.
When AMD passed Intel, they hadn't even decided to use TSMC at all yet. Of course now Intel is behind in leveraging TSMC technology. They started late.
AMD is so behind NVidia that it's not even funny. If AMD's board had any sense, they'd be carpet-bombing every researcher, AI startup, and random Joe with the latest engineering samples of unreleased top-of-the-line products. And giving them a direct line to the engineering team.
This would end up costing maybe tens of millions at most, but the potential return is indeed measured in billions.
And yep, lots of people like geohot are (to put it mildly) eccentric. So deal with it. They are not merely your customers, they are your freaking sales people.
As it is, I work in a startup that does a bit of AI vision-related stuff. I'm not going to even touch AMD because I don't want to deal with divas on the AMD board in future. NVidia is more expensive right now, but they're far more predictable.
> carpet-bombing every researcher, AI startup, and random Joe with the latest engineering samples of unreleased top-of-the-line products
That doesn't help if the drivers are buggy. AMD needs to send hardware to their own driver developers.
> AMD is so behind NVidia that it's not even funny.
Do you really want all AI hardware and software dominated by a monopoly? We're not looking to "beat" Nvidia, we are looking to offer a compelling alternative. MI300x is compelling. MI355x is even more compelling.
If there is another company out there making a compelling product, send them my way!
People keep forgetting CUDA is not only about AI: graphics matter as well, as does being a polyglot ecosystem, the IDE integration, the graphical debugging tools, the libraries, and having a memory model based on the C++ memory model. That last point is quite relevant, as NVidia employs a few key people from the C++ ecosystem who work on the ISO C++ standard (WG21).
Time will tell, no? Transmeta shipped a lot of Crusoes. It was run by brilliant people. It was a “compelling alternative.” Maybe Cerebras is the Transmeta of this race, I don’t know. But. It’s not about making an alternative. It most definitely is about “beating” NVIDIA. Otherwise, you are just shoveling dollars - shareholders’, undercompensated employees at AMD and TSMC, etc. - to Meta, like everyone else.
The current ASICs all fail in the memory game; they are not compelling. Cerebras is even more unavailable than AMD!
> It most definitely is about “beating” NVIDIA.
Hard disagree, but we are just going to have to agree to disagree on that.
It's not my job to reformat the entire AI market.
I'm willing to try AMD, and I even built an AMD-based machine to experiment with AI workflows. So far it has been failing miserably. I don't care that MI300X is compelling when I can't make samples work both on my desktop and on a cloud-based MI300X. I don't care about their academic collaborations, I'm not in the business of producing papers.
I'll just pay for H100 in the cloud to be sure that I will be able to run the resulting models on my 3090 locally and/or deploy to 4090 clusters.
If AMD shows some sense, commits to long-term support for their hardware with reasonable feature-parity across multiple generations, I'll reconsider them.
And AMD has a history of doing that! Their CPU division is _excellent_, they are renowned for having long-term support for motherboard socket types. I remember being able to buy a motherboard and then not worrying about upgrading the CPU for the next 3-4 years.
> I'm willing to try AMD, and I even built an AMD-based machine to experiment with AI workflows. So far it has been failing miserably. I don't care that MI300X is compelling when I can't make samples work both on my desktop and on a cloud-based MI300X.
Anush was actively looking for feedback on this on github today...
https://www.reddit.com/r/ROCm/comments/1i5aatx/rocm_feedback...
https://github.com/ROCm/ROCm/discussions/4276
I have quad w7900s under my desk that work well for workloads on my desktop that translate well to MI300x. There are some perf gaps with FAv2, and FP8 but otherwise I get a seamless experience. lmk if you have a pointer to any github issues for me to track down to make your experience better.
I would really like to see a concrete, legit way to materialize a "100M raise in market cap" into actual ROI ...
When the market cap rises, price of shares goes up? Do you know what a market cap is?
Yes, but the company doesn't get more money from that. The only way to get money out of it is by selling shares at the new price.
However it would also raise future revenue, which should be what's reflected by the market.
So it would still be something that's good for the company, but not nearly 100B good.
You don't think AMD being competitive with Nvidia (3.37 trillion USD market cap) would be "nearly 100B good"? Believe it or not, the only thing standing in the way is good, bug-free software. That's what tinygrad is working on.
AMD already has major ongoing projects with OpenXLA/IREE. Lots of established engineers/researchers, and it’s in collaboration with Google/AWS. Hotz is delusional if he thinks that he can do better by ripping off Karpathy’s toy autograd implementation.
> AMD already has major ongoing projects with OpenXLA/IREE.
And how's that been going? The AMD stock price compared to NVidia seems to speak volumes about the efficacy of these projects.
IREE has been around for 5 years, without producing anything overtly practical. They seem to be focused more on academic jobs and citations. It's also focused on the general case of a compiler for "all" AI-type tasks, supporting everything from WASM to CUDA.
OpenXLA seems to be a bit more practical, but I spent the last 2 hours trying to make it work on my AMD card (Radeon Pro W7900) and failing.
I personally don't like Tinygrad's approach of doing their own thing rather than integrating into PyTorch/JAX/..., but it at least is _practical_ with a reasonable end-goal. Is it going to be successful? Who knows. But it's more practical than anything AMD has done within the recent 5 years.
I am an ML scientist, my company and several others are using IREE to deploy our models to edge devices. It is the most promising technology in this area.
Those academic publications are a sign that the people involved actually know what they’re doing, and are making sure their work holds up to scrutiny.
Yeah, AMD is already pouring a lot of support into OpenXLA/IREE, which has a lot of well-respected compiler engineers and researchers working on it, and companies like AWS are also investing into it.
I don’t really think TinyCorp has anything to offer AMD.
Offering software support in exchange for payment is extortion?
It is far more complex than that.
Complex how? He requested payment in the form of MI300X servers, which is unconventional, sure, but the value of the payment is not out of line with the support he proposed to provide IMO.
[dead]
Which SemiAnalysis article?
https://semianalysis.com/2024/12/22/mi300x-vs-h100-vs-h200-b...
Really telling that they have to ask us which cards we want supported, as opposed to supporting all cards by default from day 1 like Nvidia.
All because they went with a boneheaded decision to require per-device code compilation (gfx1030, gfx1031...) instead of compiling to an intermediate representation like CUDA's PTX. Doubly boneheaded considering the graphics API they developed, Vulkan, literally does that via SPIR-V!
Really telling who comments before reading :)
The author of the issue comments that they'll eventually support all cards. What he's really asking is which cards people want them to prioritize, not just support.
I read it fully. Whole point of my post is that, based on their track record so far plus the technical limitations, it is impossible for AMD to provide the same day 1 drop in compatibility that the CUDA ecosystem offers.
Edit:
> No guarantees of future support but we will try hard to add support.
Yes. We are behind on software support for consumer cards and would love to support all of them, but we are looking for guidance / feedback so we can prioritize.
This line inspires no confidence:
> No guarantees of future support but we will try hard to add support.
AMD reps told me exactly the same thing years ago about how they'd love to support all cards, when RDNA2 had just launched. Fast forward, and only the W6800 is properly supported from that gen. The last time I tried, it had tons of kernel bugs that caused hard freezes outside the most basic cases.
You need to come out and say that you will support all cards, no ifs or buts, by a hard deadline.
I can understand wanting to prioritize support for the cards people want to use most, but they should still plan to write software support for all the cards that have hardware support.
I've long since given up on my 5700xt getting supported. AMD is just not a good pick if you care about non graphics compute.
If you use the Debian libraries then it will work, e.g.:
https://github.com/superjamie/rocswap
I ran this on a 5600 XT, but just recently switched to nVidia.
Imagine Nvidia not supporting CUDA on any of their cards. Unthinkable.
Nvidia takes a software first approach and AMD takes a hardware first approach.
It is clear that AMD's approach isn't working and they need to change their balance.
I've always described Nvidia as an accelerated compute company that happens to sell hardware.
AMD are smart, and they solve big problems in ways that are baffling to many. They're very sensitive to moats and position themselves with products or frameworks to drain them.
I consider their primary product "engineering competence as a service", but when no one external picks up the reins, they don't try very hard to play market maker. I remember when Intel's R&D budget was more than AMD's market cap - they're effective even when running lean.
The reality here is that people don't have grievances with CUDA and Nvidia aren't doing anything egregious with it. But whether that's due to ROCm's existence... we can only speculate.
> The reality here is that people don't have grievances with CUDA and Nvidia aren't doing anything egregious with it.
Correct. Lots of people also developed specifically for Internet Explorer too.
They are a monopoly and if that is important to you, then you'll want alternative solutions to avoid putting all your eggs in one basket.
People have short-term memory loss and forget that just a few months ago, H100s were impossible to get and the price skyrocketed. Given the "insane demand" for Nvidia compute (and compute in general), these sorts of supply/demand issues will be indefinitely ongoing. How many times will people need to get burned until they start to seek alternatives? Hard to say...
Hardware first, but then their hardware isn't any better than NVidia's, so I don't see how that's a valid excuse here.
(Okay, maybe their super high end unobtanium-level GPUs are better hardware-wise. Don't know, don't care about enterprise-only hardware that is unbuyable by mere mortals.)
It's just not. People like to try and defend AMD out of hatred for Nvidia, but the thousands of fumbles over the past 15 years that have led AMD to their current position and Nvidia to their current dominance do not deserve coddling and excuses.
The fact is, support still isn't there. They've had 2 years since Stable Diffusion to get a serious team up and shipping, and they still don't have enough resources pointed at this to avoid having to ask what should be prioritized.
The only way to fix their culture/priorities is to stop buying their cards.
Some of it isn't unbuyable... it is just expensive. https://www.ebay.com/itm/305850340813
But that's why my business exists... https://news.ycombinator.com/item?id=42759191
This is a posteriori reasoning... we have no idea how hard it is to implement support for older GPUs.
People set up Stable Diffusion with automatic1111 and ROCm successfully on all kinds of weird setups. What AMD needs to do is basically just provide a better out-of-the-box experience, as even following other people's instructions has been flaky at best. For example, for my 6600 XT, I have tried setting up SD twice. I succeeded on Manjaro in the past (like, a year ago) but didn't succeed now, and I succeeded on Debian now, but it uses the CPU for some reason. The hardware setup was the same; the only thing that changed is that I updated my Linuxes in the meantime.
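For what it's worth, a quick way to tell whether the "uses the CPU for some reason" case is PyTorch not seeing the GPU at all (rather than an SD frontend problem) is the minimal sketch below. It assumes only a ROCm build of PyTorch, which reuses the torch.cuda namespace for AMD devices; nothing here is specific to any particular SD frontend.

```python
# Minimal sanity check: does this PyTorch build actually see a ROCm GPU?
import torch

print("torch version:", torch.__version__)
print("HIP/ROCm version:", torch.version.hip)   # None on CUDA-only or CPU-only builds
print("GPU available:", torch.cuda.is_available())

if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
    x = torch.randn(1024, 1024, device="cuda")
    print("matmul ok:", (x @ x).shape)
else:
    print("No GPU visible to PyTorch; Stable Diffusion will quietly fall back to CPU.")
```

If this reports no HIP version or no available GPU, the problem is in the torch install rather than in the web UI.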
ROCm is kind of a joke. Recently I wanted to write some Go code which talks to ROCm devices using AMD SMI. You have to build and install the Go AMD SMI bindings from source, the repo has dead links, and there is basically no documentation anywhere on how to get this working.
Compare this to nvidia where I just imported the go nvml library and it built the cgo code and automatically links to nvidia-ml.so at runtime.
Is this the repo you are referring to https://github.com/amd/go_amd_smi ? Would having a prebuilt version there help you ?
> NOTE: The GO SMI binding depends on the following libraries:
> - E-SMI inband library (https://github.com/amd/esmi_ib_library)
> - ROCm SMI library (https://github.com/ROCm/rocm_smi_lib)
> - AMDSMI library (https://github.com/ROCm/amdsmi)
> - goamdsmi_shim library (https://github.com/amd/goamdsmi/goamdsmi_shim)
First of all this link is dead: https://github.com/amd/goamdsmi/goamdsmi_shim
Second: these dependencies should all be packaged into deb/rpm
Third: there should be a goamdsmi package which has a proper dependency tree. I should be able to do ‘apt-get install goamdsmi’ and it should install everything I need. This is how it works with go-nvml.
AMD supports only a single Radeon GPU in Linux (RX 7900 in three variants)?
Windows support is also bad, but supports significantly more than one GPU.
Imagine if Nvidia supported only the 4090, 4080, and 4070 for CUDA at the consumer level, with the 3090 unsupported once the 40xx series came out. That is what AMD is defending here.
Super annoying. I have an RX 6600 XT and can't get ROCm to work on Linux. Vulkan ML however worked perfectly out of the box, so at least I got something.
Just weird the official thing doesn't work.
Use the Debian libraries, it works:
https://github.com/superjamie/rocswap
The caveat being that PyTorch has a lot of dependencies and a couple of them are not yet available in Debian Unstable. For folks wanting to use StableDiffusion, that's a problem. However, the available packages are more than sufficient for llama-cpp as you point out.
I honestly can't figure out which Radeon GPUs are supposed to be supported.
The GitHub discussion page in the title lists RX 6800 (and a bunch of RX 7xxx GPUs) as supported, and some lower-end RX 6xxx ones as supported for runtime. The same comment also links to a page on the AMD website for a "compatibility matrix" [1].
That page only shows RX 7900 variants as supported on the consumer Radeon tab. On the workstation side, Radeon Pro W6800 and some W7xxx cards are listed as supported. It also suggests to see the "Use ROCm on Radeon GPU documentation" page [2] if using ROCm on Radeon or Radeon Pro cards.
That link leads to a page for "compatibility matrices" -- again. If you click the link for Linux compatibility, you get a page on "Linux support matrices by ROCm version" [3].
That "by ROCm version" page literally only has a subsection for ROCm 6.2.3. It only lists RX 7900 and Pro W7xxx cards as supported. No mention of W6800.
(The page does have an unintuitively placed "Version List" link through which you can find docs for ROCm 5.7 [4]. Those older docs are no more useful than the 6.2.3 ones.)
Is RX 6800 supported? Or W6800? Even the amd.com pages seem to contradict each other on the latter.
Maybe the pages on the AMD site only list official production support or something. In any case it's confusing as hell.
Nothing against the GitHub page author who at least seems to try and be clear but the official documentation leaves a lot to be desired.
[1] https://rocm.docs.amd.com/projects/install-on-linux/en/lates...
[2] https://rocm.docs.amd.com/projects/radeon/en/latest/docs/com...
[3] https://rocm.docs.amd.com/projects/radeon/en/latest/docs/com...
[4] https://rocm.docs.amd.com/projects/radeon/en/docs-5.7.0/docs...
I will provide this feedback to the docs team to clean up. I found it hard when I was making that poll :D but I looked harder instead of trying to fix the docs. So thank you for the feedback.
> I honestly can't figure out which Radeon GPUs are supposed to be supported.
Exactly.
I have a 6700 XT with 12 gig ram and a 5700 with 8 gig ram.
If I ctrl+F for either of those numbers on the GH issue, I get one hit. For the 6700, it's a single row that has a green check for "runtime" and a red x for "HIP SDK". For the 5700 card, it's somebody in the peanut gallery saying "don't forget about us!".
HIP is the c++ "flavor" that can compile down to work on amd _and_ nvidia gpus. If the 6700 has support for the "runtime" but not HIP ... what does that even mean for me?
And as you pointed out, the 6800 series card has green checks for both so that means it's fully supported? But ... it's not listed on AMD's site?!
Bad docs are how you cement a reputation of "just buy nvidia and install their latest drivers and it'll be fine".
I think the matrix shown in the github issue is for Windows support, which is much better: https://rocm.docs.amd.com/projects/install-on-windows/en/lat...
Having said that, on the weekend I set up ROCm on Linux on my 6800XT and it seems to work just fine.
Removing support for Radeon VII is a bonehead move that smacks of stupidity or greed. The cards were targeted for enthusiast gamers but have enterprise level hardware, like HBM2 memory and 1 TB/s bandwidth.
I found that striking as well. Does AMD expect everyone wanting to try out PyTorch or LLMs on Linux to splurge on Instinct servers?
ROCm on Radeon should work too, and the poll above was to seek feedback on what cards to support next.
Add support for every APU. They can have much more RAM than discrete graphics.
Why are people in AMD assuming other people don't want more software support for their GPUs by default? This is not nice.
Because they don't have infinite resources like nVidia, they're asking what people want most so they can prioritise it.
Please read the link before commenting in future; we do that here. This info is from an early comment by an AMD employee.
It's not nice to assume that people don't read then proceed to comment.
I read the link and I upvoted the "just support all GPUs you recently produced" comment.
I don't think the solution to bad software support is the prioritization. The prioritization is causing even more discrimination among different GPUs and different customers.
You can say whatever you want, and downvote whatever you want. However, that doesn't solve the real problem.
A lot of people think rocm is basically a big pile of crap.
What are the chances for AMD to consider alternatives:
- adopt oneAPI and try to fight Nvidia together with Intel
- Vulkan, and implement a PyTorch backend on it
- SYCL
I figure that list is only what's officially supported, meaning things not on that list may or may not work? For example, my 6800 XT runs Stable Diffusion just fine on Linux with PyTorch ROCm.
What’s the performance like? Was it easy to set up?
I cannot compare the performance with other cards, but it takes a few seconds for SDXL images (e.g. 1024x512) as long as it doesn’t run OOM.
I use a fork of the stable diffusion webui [0] which, for me, handled memory better. Setup was relatively easy: install the pytorch packages from the ROCm repo and it worked.
[0]: https://github.com/lllyasviel/stable-diffusion-webui-forge
They should just support all cards. Just like Nvidia does.
And they drop support too quickly too. The Radeon Pro VII is already out of support. It's barely 5 years since release.
This way it will never be a counterpart to CUDA.
I'm constantly baffled and amused as to why AMD keeps majorly failing at this.
Either the management at AMD is not smart enough to understand that without the computing software side they will always be a distant number 2 to NVIDIA, or the management at AMD considers it hopeless to ever be able to create something as good as CUDA because they don’t have and can’t hire smart enough people to write the software.
Really, it’s just baffling why they continue on this path to irrelevance. Give it a few years and even Intel will get ahead of them on the GPU side.
If I were Jensen, I would snap up all the GPU software experts I possibly could, and put them to work improving the CUDA ecosystem. I'd also spin up a big research group to further fuel the CUDA pipeline for hardware, software, and application areas.
Which is exactly what NVIDIA seems to be doing.
AMD's ROCm software group seems far behind, is probably understaffed, and probably is paid a fraction of what NVIDIA pays its CUDA software groups.
AMD also has to catch up with NVlink and Spectrum-X (and/or InfiniBand.)
AMD's main leverage point is its CPUs, and its raw GPU hardware isn't bad, but there is a long way to go in terms of GPU software ecosystem and interconnect.
I've never understood why they have such a fractured approach to software:hardware support. I remember reading and writing comments about this on hn nearly a decade ago now. It's a long time to keep making the same mistake.
They had the exact same kind of support issues back in the OpenCL days, where they didn't manage to provide cross-platform, cross-card support for the same versions of the platform.
I have never been able to reconcile it with their turnaround and newfound competence on the CPU side.
> I’m constantly baffled and amused on why AMD keeps majorly failing at this.
I wonder if you've considered the possibility that there's some component/dimension of this that you're simply unaware of? That it's not as straightforward as whatever reductive mental model you have? Is that even, like, within the universe of possibilities?
I mean, they did say they were baffled. I'd say that probably includes "I don't know"
My wishlist for ROCm support is actually supporting the cards they already released. But that's not going to happen.
By the time a (consumer) AMD device is supported by ROCm, it'll only have a few years of ROCm support left before support is removed. The lifespan of ROCm support for AMD cards is very short. You end up having to use Vulkan, which is not optimized, of course, and a bit slower. I once bought an AMD GPU 2 years after release, and 1 year after I bought it, ROCm support was dropped.
FWIW, every ROCm library currently in the Debian 13 'main' and Ubuntu 24.04 'universe' repository has been built for and tested on every discrete consumer GPU architecture since Vega. Not every package is available that way, but the ones that are have been tested on and work on Vega 10, Vega 20, RDNA 1, 2 and 3.
Note that these are not the packages distributed by AMD. They are the packages in the OS repositories. Not all the ROCm packages are there, but most of them are. The biggest downside is that some of them are a little old and don't have all the latest performance optimizations for RDNA 3.
Those operating systems will be around for the next decade, so that should at least provide one option for users of older hardware.
Packages existing and the software actually working are very different things. You can run ROCm on unsupported GPUs like a 780M, but as soon as you hit an issue you are out of luck. And you'll hit an issue.
For example, my 780M gets 1-2 inferences from llama.cpp before dropping off the bus due to a segfault in the driver. It's a bad enough lockup that Linux can't cleanly shut down and hangs until hard rebooted.
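For what it's worth, the usual community workaround for getting ROCm to even initialize on an officially unsupported RDNA3 iGPU like the 780M is the HSA_OVERRIDE_GFX_VERSION environment variable, which tells the runtime to treat the iGPU's ISA as a supported one. This is unofficial, the exact override value is an assumption that varies by chip, and it does nothing about the driver crashes described above; a hedged sketch:

```python
# Unofficial workaround sketch (not AMD-supported): ask the ROCm runtime to
# treat an unsupported RDNA3 iGPU (e.g. the 780M's gfx1103) as a gfx1100 part.
# Must be set before the HIP runtime loads, i.e. before importing torch.
import os
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "11.0.0")  # assumed mapping for RDNA3 iGPUs

import torch
print("GPU visible:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```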
The 780m is an integrated GPU. I specified discrete GPUs because that's what I have tested and can confirm will work.
I have dozens of different AMD GPUs and I personally host most of the Debian ROCm Team's continuous integration servers. Over the past year, I have worked together with other members of the Debian project to ensure that every potentially affected ROCm library is tested on every discrete consumer AMD GPU architecture since Vega whenever a new version of a package is uploaded to Debian.
FWIW, Framework Computers donated a few laptops to Debian last year, which I plan to use to enable the 780m too. I just haven't had the time yet. Fedora has some patches that add support for that architecture.
I can confirm this, Debian's ROCm distribution worked great for me on some "unsupported" cards.
As the underdog AMD can't afford to have their efforts perceived as half-assed or a hobby or whatever. They should be moving heaven and earth to maximize their value proposition, promising and delivering on longer support horizons to demonstrate the long term value of their ecosystem.
Honestly at this point half-assed support would be a significant step up from their historical position. The one thing they have pioneered is new tiers of fractional assedness asymptotically approaching zero.
I mean at this point my next card is going to be an nvidia. It has been a total waste of time trying to use rocm for anything machine-learning based. No one uses it. No one can use it. The card I have is somehow always not quite supported.
We go from:
Support is coming in three months!
To
This card is ancient and will no longer be developed for. Buy our brand new card released in three months!
Every damned time.
Seeing Radeon VII on the deprecation list is a little saddening, unless they start putting out more 16gb+ GPUs that aren't overly expensive...
They should have, at a minimum, a 5-year support cycle.
It kinda seems like they do - 5 years would only include the RX 6xxx and 7xxx.
5 years is not very long tbh.
RX 7800 XT was supported for 15 months before being dropped. Significantly less than 5 years.
True, but business hardware (and home, for that matter) often goes on 3-5 year cycles. At 5 years it's kinda expected hardware will get replaced.
It doesn’t work for the first three years, so it’s two years in practice.
I have an MI50 with 16GB of HBM that's collecting dust (it's Vega-based, so it can play games, I guess) because I don't want to bother setting up a system with Ubuntu 20.04, the last version of Ubuntu that the last ROCm release supporting the MI50 works on.
With situations like this, it's not hard to see why Nvidia totally dominates the compute/AI market.
The MI50 may be considered deprecated in newer releases, but it seems to work fine in my experience. I have a Radeon VII in my workstation (which shares the same architecture) and I host the MI60 test machine for Debian AI Team. I haven't had any trouble with them.
I had the impression Debian applied patches that widen arch support from what upstream officially supports, including for the MI50/MI60.
https://salsa.debian.org/rocm-team/rocm-hipamd/-/raw/d6d2014... (one patch of many)
I wrote that patch. It's not actually used for MI50/MI60 in any of the Debian system packages, since Debian builds for gfx906 rather than using the gfx900 fallback path that patch provides. Debian is not relying on any special patches to enhance gfx906 support. That architecture is the same as upstream.
Now, for some other GPU architectures, you're absolutely right. There are indeed important patches in Debian that enable its extra-wide hardware compatibility.
Thanks for all your work on this.
I don't think the MI60 has reached deprecated status yet (the last time I looked at prices for the MI50 and MI60, the MI60 was something like 3x as expensive, and I think that's because it's still officially supported), but I'll check this all out. Thanks.
The MI60 is basically just a faster MI50 with more memory. They were deprecated together. It's plausible there could be small firmware or driver differences that cause issues in one but not the other, but I think that's unlikely.
AMD did over $5 billion in GPU compute (Instinct line) last year. Not nVidia numbers, but also not bad. Customers love that they can actually get Instinct systems rather than trying to compete with the hyperscalers for limited supplies of nVidia systems. Meta and Microsoft are the two biggest buyers of AMD Instincts, though...
AMD Instinct is also more power efficient and has comparable (if not better) performance for the same (or less) price.
Meta and Microsoft buy hundreds of thousands of Nvidia accelerators a year, and are a big reason why everyone else has to compete for Nvidia units.
AMD has separate architectures for GPU compute (Instinct https://www.amd.com/en/products/accelerators/instinct/mi300....) and consumer video (Radeon).
AMD are merging the architectures (UDNA) like nVidia but it's not going to be before 2026. (https://wccftech.com/amd-ryzen-zen-6-cpus-radeon-udna-gpus-u...)
You can use ROCm on consumer Radeon as long as you pay more than 400 dollars for one of their GPUs. Meanwhile, you can run Stable Diffusion with the -lowvram flag on a 3050 6GB that goes for 180 dollars.
Really hoping for support for an AMD Radeon Pro W5700 I have kicking around.
As someone from the rendering side of GPU stuff, what exactly is the point of ROCm/CUDA? We already have Vulkan and SPIR-V with vendor extensions as a mostly-portable GPU API, what do these APIs do differently?
Furthermore, don't people use PyTorch (and other libraries? I'm not really clear on what ML tooling is like, it feels like there's hundreds of frameworks and I haven't seen any simplified list explaining the differences. I would love a TLDR for this) and not ROCm/CUDA directly anyways? So the main draw can't be ergonomics, at least.
Vulkan doesn't do C++ as a shading language, for example. There are some backend attempts to target SPIR-V, but it is still early days and nowhere close to having the IDE integration, graphical debugging tools, and rendering libraries that CUDA enjoys.
Examples of rendering solutions using CUDA:
https://www.nvidia.com/en-us/design-visualization/solutions/...
https://home.otoy.com/render/octane-render/
It is definitely ergonomics and tooling.
Users mainly use PyTorch and Jax, and these days rarely write CUDA code.
However, separately, installing drivers and the correct CUDA/cuDNN libraries is the responsibility of the user; this is sometimes slightly finicky.
With ROCm, the problem is that 1) PyTorch/Jax don't support it very well, which may be partly because the quality of ROCm frustrates PyTorch/Jax devs; 2) installing drivers and libraries is a nightmare: it's all poorly documented and constantly broken; 3) hardware support is very spotty and confusing.
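To make point 1 concrete, here is a tiny sketch of what "users rarely write CUDA code" means in practice. The same PyTorch code runs unchanged on an Nvidia or an AMD box, because ROCm builds of PyTorch expose AMD devices through the torch.cuda API; the model and shapes below are purely illustrative.

```python
# Device-agnostic PyTorch: one code path for CUDA and ROCm/HIP builds alike.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"  # "cuda" also means a ROCm GPU on ROCm builds

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

x = torch.randn(64, 512, device=device)          # fake batch
y = torch.randint(0, 10, (64,), device=device)   # fake labels

loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()
print(f"one step on {device}, loss={loss.item():.3f}")
```

The vendor difference only shows up in how torch was installed and whether the kernels underneath are stable, which is exactly where points 2 and 3 bite.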
PyTorch and Jax, good to know.
Why do they have ROCm/CUDA backends in the first place though? Why not just Vulkan?
It's an interesting question. The unhelpful answer is that Vulkan didn't exist when TensorFlow and PyTorch (and Torch, its Lua-based predecessor) were taking off and building GPU support. Apparently PyTorch did at one point prototype a Vulkan backend but abandoned it.
My own experience is that half-assed knowledge of C/C++, and a basic idea of how GPUs are architected, is enough to write a decent custom CUDA kernel. It's not that hard to do. No idea how I would get started with Vulkan, but I assume it would require a lot more ceremony, and that writing compute shaders is less intuitive.
There is also definitely a "worse is better" effect in this area. There are some big projects that tried to be super general and cover all use cases and hardware, but a time-crunched PhD student or IC just needs something they can use now. (Even TensorFlow, which was relatively popular compared to some other projects, fell victim to this.)
George Hotz seems like a weird guy in some respects, but he's 100% right that in ML it is hard enough to get anything working at all under perfect conditions, you don't need fighting with libraries and build tools on top of that, or the mental overhead of learning how to use this beautiful general API that supports 47 platforms you don't care about.
except also "worse is better is better" -- e.g. because they were willing to make breaking changes and sacrifice some generality, Jax was able to build something really cool and innovative.
CUDA has first mover advantage, and provides a simpler higher level compute API for library maintainers compared to Vulkan.
Vulkan doesn't do C++, rather GLSL and HLSL, nor does it have good tooling for the few prototypes that target SPIR-V.
Cuda the language is an antique dialect of C++ with a vectorisation hack. It's essentially what you get if you take an auto-vectoriser and turn off the correctness precondition, defining the correct semantics to be that which you get if you ignore dataflow. This was considered easier to program with than vector types and intrinsics.
Cuda the ecosystem is a massive pile of libraries for lots of different domains written to make it easier to use GPUs to do useful work. This is perhaps something of a judgement on how easy it is to write efficient programs using cuda.
ROCm contains a language called HIP which behaves pretty similarly to Cuda. OpenCL is the same sort of thing as well. It also contains a lot of library code, in this case because people using Cuda use those libraries and don't want to reimplement them. That's a bit of a challenge because Nvidia spent 20 years writing these libraries and is still writing more, yet AMD is expected to produce the same set in an order of magnitude less time.
If you want to use a GPU to do maths, you don't actually need any of this stuff. You need the GPU, something to feed it data (e.g. a linux host) and some assembly. Or LLVM IR / freestanding c++ if you prefer. This whole cuda / rocm thing really is intended to make them easier to program.
I really need AMD to make an APU with eight channels and DDR5.
The latest AMD mini-PC APUs have DDR5 memory (up to 96GB). Don't know about the channels.
But this APU hack might work:
https://blog.machinezoo.com/Running_Ollama_on_AMD_iGPU https://github.com/ollama/ollama/pull/6282
Linux since 6.1 made some changes to allow allocating memory to the GPU from userspace, but it seems the GTT method has degraded performance even more.