techhub.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
A hub primarily for passionate technologists, but everyone is welcome.

Server stats: 5.2K active users

#scraping

6 posts, 2 participants, 1 post today
Continued thread

1. Agreeing on closed AI systems: here, your own data stays in a "silo".
2. Anonymizing your own data (#Anonymisierung): laborious and, in the end, usually not effective enough (the residual information that remains can be de-anonymized).
3. Protecting your own data from AI #Scraping with watermarks (#Wasserzeichen) and opt-out notices: unfortunately only of very limited effectiveness.
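Point 2's claim that leftover information can be de-anonymized can be illustrated with a classic linkage attack. The sketch below is a toy example with made-up records (every name, ZIP code, and diagnosis is hypothetical): names are stripped from a released dataset, but the quasi-identifiers ZIP code, birth date, and sex remain, and joining on them against a second, named dataset re-identifies anyone whose combination is unique.

```python
# Toy linkage attack with hypothetical data: "anonymized" records that keep
# quasi-identifiers can be re-identified by joining with a named dataset.

# Released dataset: names removed, sensitive attribute kept.
anonymized = [
    {"zip": "10115", "birth": "1984-03-02", "sex": "F", "diagnosis": "asthma"},
    {"zip": "80331", "birth": "1990-11-17", "sex": "M", "diagnosis": "diabetes"},
    {"zip": "10115", "birth": "1975-06-30", "sex": "M", "diagnosis": "migraine"},
]

# Public dataset (e.g. a directory or social profiles) with names attached.
public = [
    {"name": "Alice", "zip": "10115", "birth": "1984-03-02", "sex": "F"},
    {"name": "Bob", "zip": "80331", "birth": "1990-11-17", "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth", "sex")


def key(record):
    """Combine the quasi-identifier fields into a join key."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymized, public):
    """Return (name, diagnosis) pairs for records whose key matches exactly one named person."""
    by_key = {}
    for person in public:
        by_key.setdefault(key(person), []).append(person["name"])
    matches = []
    for record in anonymized:
        names = by_key.get(key(record), [])
        if len(names) == 1:  # a unique match means the record is re-identified
            matches.append((names[0], record["diagnosis"]))
    return matches


if __name__ == "__main__":
    for name, diagnosis in reidentify(anonymized, public):
        print(f"{name}: {diagnosis}")
```

Two of the three "anonymous" records are linked back to a name even though no name was ever released; defenses such as k-anonymity work by making sure no quasi-identifier combination is unique.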

The question every organization should ask itself: How do I protect my IP (intellectual property) from "data-hungry" AI providers?

AI Act: prohibited practices and the GDPR

The European Commission is driving the implementation of the AI Act (KI-Verordnung). One focus is on the practices that are prohibited under the AI Act. In some cases, the GDPR (DSGVO) already offers protection against such practices. The article examines this in more detail using an example (...)
dr-datenschutz.de/ki-verordnun

Dr. Datenschutz

Managing AI Bots+ w/ Apache MPM, FPM, & Fail2Ban: tech.haacksnetworking.org/2025
There's been a lot of continued discussion on this topic, so I decided to investigate some of the common reports, compare them to my own hardware and its theoretical ceilings and caps, and then adjust my LAMP stack and fail2ban configuration as described in this blog entry. Let me know what y'all think, or if you find any errors or questionable claims. -oemb1905 #ai #scraping #apache #opensource #freesoftware #floss #ddos #php
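The blog entry's actual Apache and fail2ban settings are not reproduced here. As a rough illustration of the idea behind a fail2ban jail, this Python sketch scans Apache access-log lines in the "combined" format and flags any IP that makes more than `maxretry` requests within a `findtime` window. The parameter names borrow fail2ban's option names, but the defaults (20 requests per 60 seconds) are hypothetical, this is a standalone script rather than fail2ban's implementation, and it assumes log lines arrive in chronological order.

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Apache "combined" format: IP ident user [timestamp] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # e.g. 06/Apr/2025:10:00:00 +0000


def find_offenders(lines, maxretry=20, findtime=timedelta(seconds=60)):
    """Return IPs that made more than `maxretry` requests within any `findtime` window."""
    windows = defaultdict(deque)  # per-IP timestamps inside the current window
    offenders = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in combined format
        ip = match["ip"]
        ts = datetime.strptime(match["ts"], TIME_FORMAT)
        window = windows[ip]
        window.append(ts)
        # Drop timestamps that fall outside findtime relative to the newest one.
        while ts - window[0] > findtime:
            window.popleft()
        if len(window) > maxretry:
            offenders.add(ip)
    return offenders
```

Passing it the lines of an access log (any iterable of strings, such as an open file) returns the set of addresses that would be candidates for a ban under these hypothetical thresholds.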

The Markup: A Guide on How to Legally Web Scrape EU Data. “At The Markup, some of our data journalists recently had questions about the legal risks involved in scraping websites hosted in the European Union. We conducted our own research to answer this question, and offer a summary of what we learned below. Our goal is to help other journalists, researchers, and advocates come up with a […]

https://rbfirehose.com/2025/04/06/the-markup-a-guide-on-how-to-legally-web-scrape-eu-data/

**#LLM #Scraping explained:**
* take a company that lives off of answering people’s questions, e.g. WikiHow
* take all of WikiHow’s guides and turn them into an answerbot
* make money by providing the answers to WikiHow users with your chatbot
* claim that you are not a competitor to WikiHow, and your use of their entire content library is #FairUse
* repeat with the entire internet

I've set up my new #inkscape website AI bot trap. It works by giving everyone a chance to not fall into it.

An anchor link that says "I am a bot" links to /P3W-451/{datetime}/. It has a fixed position at top -100px, so it should never be seen by a human visitor.
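A minimal Python sketch of how such a trap link could be generated. The post does not specify the `{datetime}` format, so the compact UTC timestamp used here is an assumption, as is the helper name `trap_link`; the inline style reproduces the fixed position at top -100px described above.

```python
from datetime import datetime, timezone
from html import escape

TRAP_PREFIX = "/P3W-451/"


def trap_link(now=None):
    """Build the hidden "I am a bot" anchor; the datetime segment makes each URL unique."""
    if now is None:
        now = datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d%H%M%S")  # hypothetical {datetime} format
    href = f"{TRAP_PREFIX}{stamp}/"
    # position: fixed with top: -100px keeps the link above the visible viewport
    return (
        f'<a href="{escape(href)}" style="position: fixed; top: -100px;">'
        "I am a bot</a>"
    )


print(trap_link(datetime(2025, 4, 6, 12, 0, 0, tzinfo=timezone.utc)))
```

Because the link is still present in the HTML, a crawler that parses the markup and follows every `href` finds it, while a sighted visitor never sees it on screen.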

The robots.txt says "Disallow: /P3W-451/", so if you were reading robots.txt, you'd know to stay out.
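A crawler that honors robots.txt can check the rule before fetching anything. This sketch uses Python's standard `urllib.robotparser`; the `User-agent: *` line, the `example.org` host, and the `ExampleBot` agent name are assumptions, since the post only quotes the `Disallow` line.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: the post quotes only the Disallow line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /P3W-451/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="ExampleBot"):
    """True if robots.txt permits `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(allowed("https://example.org/P3W-451/20250406120000/"))  # False
    print(allowed("https://example.org/about/"))  # True
```

A well-behaved crawler skips every URL under `/P3W-451/`, so only clients that ignore robots.txt ever reach the trap.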

Then #nginx logs each of these requests (the client's IP address and browser string) to a dedicated log and sends the client a 301 redirect to google.com.
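The post does this in nginx, and its configuration is not shown, so the following is not that config. It is a stand-in sketch using Python's `http.server` that applies the same logic: requests under `/P3W-451/` are logged with the client's IP address and User-Agent string to a separate `bot_trap` logger (a hypothetical name) and answered with a 301 redirect to google.com, while every other path gets a normal page.

```python
import logging
from http.server import BaseHTTPRequestHandler, HTTPServer

TRAP_PREFIX = "/P3W-451/"

# Dedicated logger for trapped clients (hypothetical name).
# StreamHandler writes to stderr; a logging.FileHandler would give a separate file.
trap_log = logging.getLogger("bot_trap")
trap_log.setLevel(logging.INFO)
trap_log.addHandler(logging.StreamHandler())


class TrapHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith(TRAP_PREFIX):
            ip = self.client_address[0]
            agent = self.headers.get("User-Agent", "-")
            trap_log.info('%s "%s" %s', ip, agent, self.path)
            # 301 Moved Permanently, pointing the bot at google.com
            self.send_response(301)
            self.send_header("Location", "https://google.com/")
            self.end_headers()
            return
        # Any other path: serve a normal page.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(b"normal page\n")
```

To try it locally, run `HTTPServer(("127.0.0.1", 8000), TrapHandler).serve_forever()` and request any path under `/P3W-451/`; the client receives the redirect and its address and User-Agent appear in the trap log.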

#ai #Scraping

1/2