
**Troels**

Preprint of the longest paper I ever contributed to: https://arxiv.org/abs/2505.08906 - it is a qualitative and quantitative comparison of various #functional #array languages, with a significant #gpgpu element.

**रञ्जित (Ranjit Mathew)**

"Understanding PTX, The Assembly Language Of CUDA GPU Computing", Nvidia (https://developer.nvidia.com/blog/understanding-ptx-the-assembly-language-of-cuda-gpu-computing/).

#Nvidia #GPU #CUDA #PTX #AssemblyLanguage #IntermediateLanguage #IR #HPC #GPGPU

**ashwinvis**

It is kind of wild to learn that #FluidMechanics played an integral role in the creation of #CUDA and, in turn, in ushering in an era of #GPGPU and #AI https://www.youtube.com/watch?v=K9anz4aB0S0

@fluidmechanics

**Dr. Moritz Lehmann**

Hot Aisle's 8x AMD #MI300X server is the fastest computer I've ever tested in #FluidX3D #CFD, achieving a peak #LBM performance of 205 GLUPs/s, and a combined VRAM bandwidth of 23 TB/s. 🖖🤯 The #RTX 5090 looks like a toy in comparison.

MI300X beats even Nvidia's GH200 94GB. This marks a very fascinating inflection point in #GPGPU: #CUDA is not the performance leader anymore. 🖖😛 You need a cross-vendor language like #OpenCL to leverage its power.

FluidX3D on #GitHub: https://github.com/ProjectPhysX/FluidX3D

**Giuseppe Bilotta**

First day of the #GPGPU course at #UniCT. The class is small, but the students seem curious, which gave me the opportunity to discuss in more detail some things that usually go unmentioned. Hopefully it'll hold.

Only downside: I had to take a longer route home because the park between my house and the university was closed 8-(

**Giuseppe Bilotta**

I'm getting the material ready for my upcoming #GPGPU course that starts in March. Even though I most probably won't get to it, I also checked my trivial #SYCL programs. Apparently the 2025.0 version of the #Intel #OneAPI #DPCPP runtime doesn't like any #OpenCL platform except Intel's own (I have two other platforms that support #SPIRV, so why aren't they showing up? From the documentation I can find online this should be sufficient, but apparently it's not …)

**🅴🆁🆄🅰 🇷🇺**

Just how much hassle (https://www.bilibili.com/opus/1012175160065654784) does it take on NVidia and Windows to play Go against a neural network (KataGo, an AlphaGo analogue)? Given that, systems on ATI/AMD and Linux look like the sensible choice. If you want a computer for playing Go, you pick the graphics card and the OS that involve the least fuss with using the GPU, or more precisely #GPGPU. Here is what has accumulated under my tags (https://hub.hubzilla.de/channel/erua?f=&tag=igo) about playing Go against a computer: nothing fundamentally difficult in the setup. Some may find it funny, but I remember the times when post-Soviet people bought chess computers for their homes. To play chess themselves and to get their children into it, households had something like the «Электроника ИМ-01» (Elektronika IM-01). It could not move the pieces and only showed the coordinates of its move on a display. Such a purchase was no simple matter in those times and under those conditions. The present day, by contrast, abounds in a variety of intellectual entertainment that is not even in much demand. Still, there will be people who look at a desktop or laptop precisely as a means of playing offline, not only online. Or of analyzing their own and other people's games, again through neural networks, to learn or to practice their playing skills.

#AMD #ATI #Nvidia #KataGo #games #gaming #го #igo #baduk #бадук #weiqi #вэйци #lang_ru @Russia

**gigapixel**

So I found https://github.com/tracel-ai/cubecl which allows #gpgpu in #rust. Already using it to calculate some determinants for triangulations. Wondering if it can be leveraged to build a numeric #pde solver

**Anthony Cowley**

This look at how many HLSL instructions different variations of an endian swap compile to with AMD tooling is... well, frankly, it's upsetting. All variations are similar, but compile down to anywhere from 1 to 13 IR ops. Contorting code to trigger desired compilation paths is familiar to many GHC Haskellers, but it's an incredible deterrent to prioritizing performance. https://martinfullerblog.wordpress.com/2025/01/13/massaging-the-shader-compiler-to-emit-optimum-instructions/

#gpgpu
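To make the point about "similar variations" concrete, here is a small Python stand-in (not the HLSL from the linked article, and not a claim about which forms that article tests). It shows three ways of writing a 32-bit endian swap that all return the same value, which is the kind of source-level difference a compiler may or may not recognize as one operation. The function names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # keep values within 32 bits, as a GPU register would


def bswap32_shifts(x: int) -> int:
    """Move each byte to its mirrored position with four mask-and-shift terms."""
    x &= MASK32
    return (((x & 0x000000FF) << 24) |
            ((x & 0x0000FF00) << 8) |
            ((x & 0x00FF0000) >> 8) |
            ((x & 0xFF000000) >> 24))


def bswap32_halves(x: int) -> int:
    """Swap the bytes inside each 16-bit half, then swap the two halves."""
    x &= MASK32
    x = ((x & 0x00FF00FF) << 8) | ((x & 0xFF00FF00) >> 8)
    return ((x << 16) | (x >> 16)) & MASK32


def bswap32_bytes(x: int) -> int:
    """Serialize as little-endian bytes and read them back as big-endian."""
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")


if __name__ == "__main__":
    value = 0x11223344
    for fn in (bswap32_shifts, bswap32_halves, bswap32_bytes):
        print(fn.__name__, hex(fn(value)))  # each prints 0x44332211
```

All three are equivalent on every 32-bit input; how many instructions each becomes depends entirely on the compiler's pattern matching.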

Replied in thread

**Dr. Moritz Lehmann** @ProjectPhysX@mast.hpc.social · Jan 15

@BenjaminHCCarr another article on #GPU code portability where people put their heads in the sand and pretend very hard that #OpenCL doesn't exist...
OpenCL already solved #GPGPU cross-compatibility 16 years ago, and today it is in better shape than ever.

Replied in thread

**Kevin Karhan** @kkarhan@infosec.space · Jan 12

@enigmatico @lispi314 @kimapr @bunnybeam case in point:

#Bloatedness was the original post topic and yes, due to the #TechBros "#BuildFastBreakThings" mentality, #Bloatware is increasing, given that a shitty bloated 50+MB "#WebApp" with something like nw.js is easier to slap together (and yes, I did so myself!) than to put in way more thought and effort (as you can see from the slow progression of OS/1337...)
Yes, #Accessibility is something that needs to be taken more seriously, and it's good to see that there are at least some attempts at making #accessibility mandatory (at least in #Germany, where I know from an insider that a big telco is investing a lot in that!) for a growing number of industries and websites...
And whilst one can slap an #RTX5090 on any laptop that has a fully-functional #ExpressCard slot (with #PCIe interface, using some janky adaptors!), that'll certainly not make sense beyond some #CUDA or other #GPGPU-style workloads, as it's bottlenecked to a single PCIe lane at 2.0 (500MB/s) or just 1.0a (250MB/s) speeds.

Needless to say there is a need to THIN DOWN things cuz the current speed of #Enshittifcation and bloatedness, combined with #AntiRepairDesign and overpriced yet worse #tech in general, makes it unsustainable for an ever increasing population!

Not everyone wants to (or even can!) go into debt just to have a phone or laptop!

Should we aim for more "#FrugslComputing"?

Absolutely!

Is it realistic to expect things to be in a perfectly accessible TUI that every screenreader can handle?

That being said the apathy of consumers is real, and very frustrating:

People get nudged into accepting all the bs and it really pisses me off because they want me to look like an outsider / asshole for not submitting to #consumerism and #unsustainable shite...

ぷにすきー · ENIGMATICO :flag_bisexual: :flag_nonbinary: (@enigmatico): I get this is a joke, but here is the thing (aside from the joke). People don't use crappy laptops anymore. People move on to phones/tablets, or if they want something more serious, something like a gamer PC. Most people will buy a console if they want to play games though. In that context, nobody cares anymore about bloat. If you are a developer, it's easier for you to use some bloaty framework that gets the job done in a couple of days, because at the end of the day, if you're going to be exploited and crunched to death, you might as well make it as short as possible. And as a consumer, nobody really cares. You buy whatever allows you to do what you want and that's it. Or whatever your pocket allows you. And to be completely honest with you all, this has always been like this. You have to make do with what you have. Could the world be better if everyone used pure C and assembly? Maybe... if companies had the intention to spend years developing their products and fixing critical bugs before launch. By the time of the launch they would be obsolete. Kinda what happened to Duke Nukem Forever. RN: (📎1)

**Mimetik** @Mimetik@genart.social · Dec 7, 2024

Still work in progress: debugging a reaction-diffusion compute shader for a GPU generated mesh.

#openrndr #cellforms #reactiondiffusion

Continued thread

**Oblomov** @oblomov@sociale.network · Nov 29, 2024

Even better, in the afternoon I managed to find a workaround for my #GPGPU software building but hanging when trying to run it, which seems to be related to an issue with some versions of the #AMD software stack and many integrated GPUs, not just the #SteamDeck specifically. So exporting the HSA_ENABLE_SDMA=0 environment variable was sufficient to get my software running again. I'm dropping the information here in case others find it useful.

#ROCm #GPU #APU #HIP

2/2
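For anyone who launches their software from a script rather than an interactive shell, the workaround above can be sketched in Python as follows. Only the variable `HSA_ENABLE_SDMA=0` comes from the post; the helper names and the stand-in child command are hypothetical, and whether the setting helps on a given ROCm version or GPU is not something this sketch can tell you.

```python
import os
import subprocess
import sys


def env_with_sdma_disabled(base_env=None):
    """Return a copy of the environment with HSA_ENABLE_SDMA set to "0"."""
    env = dict(os.environ if base_env is None else base_env)
    env["HSA_ENABLE_SDMA"] = "0"
    return env


def run_with_sdma_disabled(command):
    """Run `command` (a list of strings) with the workaround applied."""
    return subprocess.run(command, env=env_with_sdma_disabled(),
                          capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Stand-in child process that only echoes the variable back;
    # replace it with the real GPGPU binary (hypothetical here).
    child = [sys.executable, "-c",
             "import os; print(os.environ['HSA_ENABLE_SDMA'])"]
    print(run_with_sdma_disabled(child).stdout.strip())
```

Passing a modified copy through `env=` keeps the change local to the child process instead of altering the parent's environment.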

Continued thread

**Giuseppe Bilotta** @giuseppebilotta@fediscience.org · Oct 29, 2024 *

It's out, if anyone is curious

https://doi.org/10.1002/cpe.8313

This is a “how to” guide. #GPUSPH, as the name suggests, was designed from the ground up to run on #GPU (w/ #CUDA, for historical reasons). We wrote a CPU version a long time ago for a publication that required a comparison, but it was never maintained. In 2021, I finally took the plunge, and taking inspiration from #SYCL, adapted the device code in functor form, so that it could be “trivially” compiled for CPU as well.

#HPC #GPGPU
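As a rough illustration of what "functor form" buys you, here is a Python sketch of the general idea only. It is not GPUSPH code (which is C++/CUDA), and every name in it (`Saxpy`, `launch_serial`, `launch_threaded`) is invented for the example. The kernel body is an object called once per work-item index, and the launcher that decides how indices are iterated is kept separate, so the same body can be handed to different backends.

```python
from concurrent.futures import ThreadPoolExecutor


class Saxpy:
    """Kernel body in functor form: holds its arguments, is called per index."""

    def __init__(self, a, x, y, out):
        self.a, self.x, self.y, self.out = a, x, y, out

    def __call__(self, i):
        # The work of a single "thread": one element of out = a*x + y.
        self.out[i] = self.a * self.x[i] + self.y[i]


def launch_serial(kernel, n):
    """Simplest CPU backend: a plain loop over the index space."""
    for i in range(n):
        kernel(i)


def launch_threaded(kernel, n, workers=4):
    """Alternative backend: the same kernel body, indices spread over threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(kernel, range(n)))  # list() waits for completion


if __name__ == "__main__":
    x, y = [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]
    out = [0.0] * len(x)
    launch_serial(Saxpy(2.0, x, y, out), len(x))
    print(out)  # [12.0, 24.0, 36.0]
```

The design point is that nothing inside `Saxpy` knows how it is launched; swapping the launcher is the only change needed to target another backend.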

**Giuseppe Bilotta** @giuseppebilotta@fediscience.org · Oct 23, 2024

I love the smell of burning plastic when running a heavy-duty #CFD simulation on my laptop's #GPU

#humor #humour #HPC

**Anthony Cowley** @acowley@mastodon.social · Aug 19, 2024

Here’s hoping that the transition of #Rust #GPU to community ownership goes well! The intention to focus on #GPGPU is more than welcome, as I feel the development of some GPU programming ecosystems has been held back by a too-narrow focus on traditional GPU graphics techniques. The dream is to be able to write CUDA-like Rust that can target hardware from multiple vendors!

https://rust-gpu.github.io/blog/transition-announcement/

rust-gpu.github.io: Rust GPU Transitions to Community Ownership

**Pyrzout** @jos1264@social.skynetcloud.site · Jul 16, 2024

CUDA, But Make It AMD https://hackaday.com/2024/07/16/cuda-but-make-it-amd/ #generalpurposegpu #MachineLearning #MiscHacks #radeon #gpgpu #CUDA #amd #ATI

Hackaday · Jul 16, 2024: CUDA, But Make It AMD. Compute Unified Device Architecture, or CUDA, is a software platform for doing big parallel calculation tasks on NVIDIA GPUs. It’s been a big part of the push to use GPUs for general purpose …

**GeekProjects News** @news@geekprojects.com · Jul 16, 2024

CUDA, But Make It AMD https://hackaday.com/2024/07/16/cuda-but-make-it-amd/ #generalpurposegpu #MachineLearning #MiscHacks #radeon #gpgpu #CUDA #amd #ATI

**IT News** @itnewsbot@schleuss.online · Jul 16, 2024

CUDA, But Make It AMD - Compute Unified Device Architecture, or CUDA, is a software platform for doing big... - https://hackaday.com/2024/07/16/cuda-but-make-it-amd/ #generalpurposegpu #machinelearning #mischacks #radeon #gpgpu #cuda #amd #ati

**Karsten Schmidt** @toxi@mastodon.thi.ng · Jul 11, 2024

Uploaded a new demo/example showing how to perform GPU-side data reductions using a https://thi.ng/shader-ast & https://thi.ng/webgl multi-pass pipeline. Arbitrary reduction functions supported. If there's interest, this could be expanded & packaged up as a library... 90% of this example is boilerplate, 9.9% benchmarking & debug outputs...

Demo:
https://demo.thi.ng/umbrella/gpgpu-reduce/

Source code:
https://github.com/thi-ng/umbrella/blob/develop/examples/gpgpu-reduce/src/index.ts

Readme w/ benchmark results:
https://github.com/thi-ng/umbrella/tree/develop/examples/gpgpu-reduce

Related discussion:
https://github.com/thi-ng/umbrella/issues/478

#ThingUmbrella #WebGL #ShaderAST
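For readers unfamiliar with multi-pass reductions, the general idea can be simulated on the CPU in a few lines of Python. This is a sketch of the technique only and does not use or mirror the thi.ng API; the function names and the choice of halving the data per pass are assumptions made for illustration. Each pass combines neighbouring elements with a user-supplied function, and passes repeat until one value is left.

```python
from typing import Callable, List, Sequence, Tuple


def reduce_pass(data: Sequence[float],
                op: Callable[[float, float], float]) -> List[float]:
    """One pass: combine neighbouring pairs, roughly halving the data.

    On a GPU, each output element would be produced by its own
    thread or fragment, all in parallel.
    """
    out = [op(data[i], data[i + 1]) for i in range(0, len(data) - 1, 2)]
    if len(data) % 2:  # odd length: carry the last element over unchanged
        out.append(data[-1])
    return out


def multipass_reduce(data: Sequence[float],
                     op: Callable[[float, float], float]) -> Tuple[float, int]:
    """Repeat passes until one value remains; returns (result, pass count).

    `op` should be associative (e.g. +, min, max) for the result to be
    independent of how elements are paired up.
    """
    if not data:
        raise ValueError("cannot reduce an empty sequence")
    current, passes = list(data), 0
    while len(current) > 1:
        current = reduce_pass(current, op)
        passes += 1
    return current[0], passes


if __name__ == "__main__":
    values = [1, 2, 3, 4, 5, 6, 7, 8]
    print(multipass_reduce(values, lambda a, b: a + b))  # (36, 3)
    print(multipass_reduce(values, max))                 # (8, 3)
```

The number of passes grows with the logarithm of the input size, which is why this pattern suits render-to-texture pipelines where each pass is one draw call.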

#gpgpu