
🧠 A major AI shift may have arrived. Last year, Epoch AI warned of a training-data shortage for LLMs by 2026–32 and suggested that models would need to learn from real-world interactions instead.

Now, we see that future:

✨🔀 Experience Streams 🔀✨

Introducing The Era of Experience by Silver & Sutton
👉 tinyurl.com/wn2938xt

No more internet scraping or hand-fed reward models.

Agents now act, observe, and adapt in the real world using reinforcement learning with real rewards, learning from consequences rather than static datasets.
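That act-observe-adapt loop is classic reinforcement learning. Here is a minimal sketch of it: tabular Q-learning on a toy corridor environment I made up for illustration (the environment, rewards, and hyperparameters are my own, not from the Silver & Sutton paper):

```python
import random

# Toy environment: a 5-cell corridor with a reward only at the far end.
N_STATES = 5
ACTIONS = (-1, +1)  # step left or step right

def step(state, action):
    """Return the consequence of an action: (next_state, reward)."""
    nxt = max(0, min(N_STATES - 1, state + action))
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

# Tabular Q-learning with optimistic initialization, so the agent
# explores by acting and observing rather than consuming a dataset.
Q = {(s, a): 1.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1
random.seed(0)

for _ in range(300):
    s = 0
    for _ in range(20):
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        nxt, r = step(s, a)
        # Update toward the observed reward plus discounted future value.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(nxt, b)] for b in ACTIONS) - Q[(s, a)])
        s = nxt
        if r > 0:
            break

# Greedy policy after learning: the agent heads right toward the reward.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
```

The point of the toy: every Q-value comes from experienced consequences, never from a labeled corpus.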

Back in 2022, after ChatGPT launched, I wondered:
"Do we really need libraries just to manage prompts & wrap an LLM API? 🤔"

Now, with AI-agent hype, I’m asking:
"Do we really need libraries to orchestrate LLMs talking to each other? 🤷"

Déjà vu, or just me? 🤨🚤

What do WebArena, AssistantBench, and GAIA all agree on? Agentic AI has a long way to go.

Across these benchmarks, even the best-performing models barely crack 60%. Hard tasks lay bare just how far we are from true autonomy.

Yet marketing departments spin tales of “breakthroughs,” and startups rake in jaw-dropping valuations on that hype.

Maybe it’s time we let the leaderboards do the talking. 🧐

[1] gaia-benchmark-leaderboard.hf.
[2] huggingface.co/spaces/Assistan
[3] shorturl.at/8lLMJ


What do WebArena, AssistantBench, and GAIA have in common? AI agents are still far from true autonomy.

Even the best models barely manage more than 60%, and hard tasks show how quickly they hit their limits.

What do you think? Are we on the right track with AI agents, or is the hype outrunning reality? Let me know in the comments!

Agentic AI: Revolutionary or Risky?

HBR says it offers:
💼 Specialization
🎨 Innovation
🤨 Trustworthiness

My take:

1️⃣ Specialization => Fragility (Think Boeing MCAS).
2️⃣ Innovation => Homogeneity (Same ideas, no breakthroughs).
3️⃣ Trustworthiness => Bias (Still mirrors human flaws).
4️⃣ Productivity => Job losses (Big unemployment shifts ahead).

Agentic AI isn’t all bad, but stay critical. Build smart, ask tough questions, and don’t buy the hype.

HMU if you want to chat!

Continued thread

2025 = Year of the KG 🧠🔥

Agentic AI isn’t just about agents—it’s about how knowledge is stored and retrieved.

Will the KG hype train charge ahead 🚄 or get derailed by a wild curveball?

Let’s see how this plays out. 😏


Continued thread

Q4: Incumbents Play Catch-Up 🐢➡️🚀
Big DB vendors bolt on KG features (like they did w/ vector search in 2023).

They rebrand: “AI KG-native databases for the Agentic Era” or whatever LinkedIn buzzword’s trending.

KG storage = table stakes by year-end.

Continued thread

Q3: RAG + Knowledge Graphs = 🦄
Summer brings a breakthrough: LLMs + Knowledge Graphs (KGs) = game changer.

Startups grinding on KG solutions suddenly become the hottest ticket in town. 💼🔥

VCs’ FOMO sends valuations 🚀. A new generation of unicorns is born.

Continued thread

Q2: "Wait, Our Knowledge Base Sucks?" 🤯
By mid-year, companies realize: AI agents are only as good as their knowledge base.

Dense vector retrieval + chunked docs? 🛑 Not enough.

Teams scramble for smarter retrieval: hybrid search, multi-modal, you name it.
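“Hybrid search” at its simplest means fusing a lexical ranking with a dense-vector ranking. A toy sketch using reciprocal rank fusion, with a word-overlap score standing in for BM25 and a character-bigram cosine standing in for a real embedding model (corpus and scoring are invented for illustration):

```python
from collections import Counter
import math

docs = {
    "d1": "knowledge graph for agent memory",
    "d2": "vector search with dense embeddings",
    "d3": "hybrid search combines keyword and vector retrieval",
}

def keyword_score(query, text):
    """Lexical overlap: count shared terms (stand-in for BM25)."""
    q, t = Counter(query.split()), Counter(text.split())
    return sum((q & t).values())

def dense_score(query, text):
    """Toy 'embedding' similarity: character-bigram cosine (stand-in for an encoder)."""
    def vec(s):
        return Counter(s[i:i + 2] for i in range(len(s) - 1))
    a, b = vec(query), vec(text)
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def hybrid_rank(query, k=60):
    """Reciprocal rank fusion: combine the two rankings without tuning score scales."""
    kw = sorted(docs, key=lambda d: -keyword_score(query, docs[d]))
    dn = sorted(docs, key=lambda d: -dense_score(query, docs[d]))
    fused = {d: 1 / (k + kw.index(d) + 1) + 1 / (k + dn.index(d) + 1) for d in docs}
    return sorted(docs, key=lambda d: -fused[d])

ranking = hybrid_rank("hybrid keyword vector search")
```

Rank fusion is one of the simpler “smarter retrieval” upgrades because it sidesteps the problem of lexical and vector scores living on incompatible scales.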

Continued thread

Q1: POCs Gone Wild 🤡
Every enterprise is throwing massive budgets at Agentic AI POCs.

Teams are hyped. CEOs are hyped. VCs? Oh, they’re really hyped.

It’s all flashy demos and big promises… until reality kicks in.

🚨 Everyone’s hyped about Agentic AI being the next big thing in 2025.

But will it deliver? Here's my play-by-play prediction for how the year will unfold—quarter by quarter. 🧵👇

Ever wonder what it takes to scale a startup without losing your sanity? 🤪🚀 HubSpot’s co-founder Brian Halligan built a $25B company & shared no-BS lessons. My top takeaways:

* EV > TV > MeV: enterprise value before team value before “me” value (maximize impact for the company first)
* Mercenaries vs. Missionaries
* Culture breaks at 150 (Dunbar’s number)
* Hire a few years ahead

Check out the full article—it’s loaded with insights on leadership & crisis management!

sequoiacap.com/article/a-start

Sequoia Capital · A Startup Founder to Scaleup CEO’s Journey from $0 to $25 Billion (Halliganisms), by Brian Halligan

After every RAG demo, someone asks: “What’s the best way to chunk a text document?” 📄 With multimodal models rising, the next question is: “How do you chunk a video?” 🎥

Why not scan every few seconds with CLIP? 🤔 Queries like 'Billboards of X in the background?' matter for product placements but fail in busy scenes 🎬.

Object detection and captioning extend indexing times ⏳. So, what's the optimal video chunking method for RAG? Or, are there compelling RAG video applications that justify the effort? 🧐
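One baseline worth naming: sample a frame every few seconds, embed each frame (e.g. with CLIP), and cut a new chunk wherever consecutive embeddings diverge. A sketch of that logic with the embedding model stubbed out (the stride, threshold, and similarity cut are my own assumptions, not an established recipe):

```python
import math

def sample_timestamps(duration_s, stride_s=2.0):
    """Fixed-stride sampling: one candidate frame every `stride_s` seconds."""
    t, out = 0.0, []
    while t < duration_s:
        out.append(round(t, 3))
        t += stride_s
    return out

def chunk_by_similarity(embeddings, threshold=0.9):
    """Greedy shot-style chunking: start a new chunk when consecutive frame
    embeddings (e.g. from CLIP) drop below a cosine-similarity threshold."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    chunks, start = [], 0
    for i in range(1, len(embeddings)):
        if cos(embeddings[i - 1], embeddings[i]) < threshold:
            chunks.append((start, i))
            start = i
    chunks.append((start, len(embeddings)))
    return chunks
```

The threshold cut is exactly where the billboard query breaks down: a busy scene changes embedding-to-embedding even when the object of interest never leaves the frame.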

Always doubted the efficacy of GPUs for vector databases due to memory constraints 😅... then I found BANG. This method smartly uses the CPU for index storage (DiskANN style) and unleashes GPU for queries, with optimizations that cleverly manage CPU-GPU bandwidth. 🛠️

🚀 Results: BANG achieves 40x to 200x higher throughput on billion-scale datasets, surpassing GGNN and FAISS.

Could this be a game-changer for vector databases on GPUs, especially at large scales? 💰💰

arxiv.org/abs/2401.11324

arXiv.org · BANG: Billion-Scale Approximate Nearest Neighbor Search using a Single GPU
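The core trick, computing distances against compressed (product-quantized) codes so billions of vectors fit alongside the GPU, can be sketched in a few lines of numpy. Toy sizes and random “codebooks” here are my own illustration; BANG trains real k-means codebooks and runs the lookup stage on the GPU:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, ks, n = 8, 4, 16, 1000   # dim, subspaces, centroids per subspace, db size
sub = d // m

# Codebooks: one set of centroids per subspace (random here; k-means in practice).
codebooks = rng.normal(size=(m, ks, sub))

# Encode database vectors: nearest centroid id per subspace -> m bytes per vector.
db = rng.normal(size=(n, d))
codes = np.empty((n, m), dtype=np.uint8)
for j in range(m):
    chunk = db[:, j * sub:(j + 1) * sub]
    dists = ((chunk[:, None, :] - codebooks[j][None]) ** 2).sum(-1)
    codes[:, j] = dists.argmin(1)

def adc(query):
    """Asymmetric distance computation: precompute a query-to-centroid table,
    then score every compressed vector with table lookups only."""
    table = np.stack([
        ((query[j * sub:(j + 1) * sub][None] - codebooks[j]) ** 2).sum(-1)
        for j in range(m)
    ])                                                   # shape (m, ks)
    return table[np.arange(m)[None, :], codes].sum(1)    # shape (n,)

q = rng.normal(size=d)
approx = adc(q)
```

The payoff: the full-precision vectors never need to leave the CPU; only the small codebooks and the byte codes live in GPU memory.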

Impressed by the VIVID system detailed in recent research 📚, which transforms lectures into dynamic dialogues using AI 🤖. This tool enhances educational quality by allowing educators to focus more on teaching dynamics rather than content generation ✍️.

The VIVID system's principles could also lead to personalized interactive tutors for students 🎓. Here's to AI unlocking new educational opportunities globally 🌍!

arxiv.org/abs/2403.09168

arXiv.org · VIVID: Human-AI Collaborative Authoring of Vicarious Dialogues from Lecture Videos

Building a RAG system? It’s tricky! Here's what can go wrong:

1. Missing Info: Tries to answer without enough data.

2. Overlooked Docs: Important docs don't rank high enough.

3. Context Issues: Correct docs aren't used right.

4. Missing Answers: Right answers overlooked.

5. Formatting Fails: Answers in the wrong format.

6. Detail Mismatch: Answers too vague or detailed.

7. Incomplete Answers: Answers miss available details.

source: arxiv.org/pdf/2401.05856.pdf
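Failure point 1 is the cheapest to guard against: refuse to answer when retrieval confidence is low. A minimal sketch with stubbed-in retriever and LLM (the threshold, helper names, and scoring are illustrative, not from the paper):

```python
def answer_with_guard(query, retrieve, generate, min_score=0.35):
    """Mitigate failure point 1: don't let the LLM answer from thin air.
    `retrieve` returns (doc, score) pairs; `generate` is the LLM call."""
    hits = retrieve(query)
    good = [(doc, s) for doc, s in hits if s >= min_score]
    if not good:
        return "I don't have enough information to answer that."
    context = "\n\n".join(doc for doc, _ in good)
    return generate(f"Answer using only this context:\n{context}\n\nQ: {query}")

# Toy stand-ins for a retriever and an LLM:
def fake_retrieve(q):
    if "RAG" in q:
        return [("RAG stands for retrieval-augmented generation.", 0.8)]
    return [("irrelevant", 0.1)]

def fake_generate(prompt):
    return "RAG = retrieval-augmented generation."

ans1 = answer_with_guard("What is RAG?", fake_retrieve, fake_generate)
ans2 = answer_with_guard("Capital of Mars?", fake_retrieve, fake_generate)
```

An explicit refusal path also makes failure points 2 and 4 visible in logs instead of hiding them behind a confident-sounding hallucination.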

If you're considering creating your own on-prem RAG solution, this paper is a must-read!

It provides insight into developing a RAG system for handling complex inquiries within a non-profit organization.

Key topics include data collection process, model evaluation, and implementation challenges.

Find the full paper here:

arxiv.org/abs/2402.07483

arXiv.org · T-RAG: Lessons from the LLM Trenches
#rag #vectordb #genai