#foundation

Oh joy, yet another paper with a title that sounds like a new #crypto #fad 🤦‍♂️. Thank goodness for the #Simons Foundation's donations, because without them, we'd clearly be lost in a sea of #academic gobbledygook 🤷‍♀️. Can't wait to "multi-token" my way to #enlightenment - or was it ennui? 📚🔍
arxiv.org/abs/2504.00927 #Foundation #research #multi-token #HackerNews #ngated

arXiv.org: Multi-Token Attention
Soft attention is a critical mechanism powering LLMs to locate relevant parts within a given context. However, individual attention weights are determined by the similarity of only a single query and key token vector. This "single token attention" bottlenecks the amount of information used in distinguishing a relevant part from the rest of the context. To address this issue, we propose a new attention method, Multi-Token Attention (MTA), which allows LLMs to condition their attention weights on multiple query and key vectors simultaneously. This is achieved by applying convolution operations over queries, keys and heads, allowing nearby queries and keys to affect each other's attention weights for more precise attention. As a result, our method can locate relevant context using richer, more nuanced information that can exceed a single vector's capacity. Through extensive evaluations, we demonstrate that MTA achieves enhanced performance on a range of popular benchmarks. Notably, it outperforms Transformer baseline models on standard language modeling tasks, and on tasks that require searching for information within long contexts, where our method's ability to leverage richer information proves particularly beneficial.

DST change reminded me of that time QA discovered that, if you build a date from NSDateComponents, any missing items will not be set to zero, but will be set to the corresponding value from the current date and time.

Which meant that my March 30th, 12:00 would pick up stray seconds and drift by up to a minute, like 12:00:59, because the user entered the date right before a minute change.

Makes sense for the date, but was a bit bewildering for the time, especially seconds.
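
One way to sidestep the question of how missing fields get filled in is to set every field explicitly, seconds included. The following is a minimal sketch in Swift, assuming Foundation's `DateComponents` and `Calendar`; the helper name `makeExactDate`, the year 2025, the Gregorian calendar, and the UTC time zone are illustrative assumptions, not details from the post.

```swift
import Foundation

// Assumption: a fixed Gregorian calendar in UTC, so the result does not
// depend on the device's current calendar or time zone.
var utcCalendar = Calendar(identifier: .gregorian)
utcCalendar.timeZone = TimeZone(secondsFromGMT: 0)!

// Hypothetical helper: builds a date with every field down to the
// nanosecond set explicitly, leaving nothing to be filled in implicitly.
func makeExactDate(year: Int, month: Int, day: Int, hour: Int, minute: Int) -> Date? {
    let components = DateComponents(
        year: year,
        month: month,
        day: day,
        hour: hour,
        minute: minute,
        second: 0,       // explicit zero instead of relying on a default
        nanosecond: 0
    )
    return utcCalendar.date(from: components)
}

// Usage: March 30th, 12:00 (the year is an assumption for illustration).
if let noon = makeExactDate(year: 2025, month: 3, day: 30, hour: 12, minute: 0) {
    let check = utcCalendar.dateComponents([.hour, .minute, .second], from: noon)
    print(check.hour ?? -1, check.minute ?? -1, check.second ?? -1)
}
```

Reading the components back, as the usage lines do, is a cheap check that no stray seconds slipped in.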

Your IT donations and @Labdoo_D are having an impact and making the world a little better every day - these pictures show it very clearly 😀
Every weekend, children in the Arusha region of Tanzania receive a free computer course there from the Dunia Salama Foundation.
platform.labdoo.org/de/edoovil
#labdoo #salama #foundation #arusha #tansania #tanzania #computer #course #children #kinder #teilhabe #reise #travel #bildung #education