#webscraping


New Open-Source Tool Spotlight 🚨🚨🚨

Scrapling is redefining Python web scraping. Adaptive, stealthy, and fast, it can bypass anti-bot measures while auto-tracking changes in website structure. A standout: 4.5x faster than AutoScraper for text-based extractions. #Python #WebScraping

🔗 Project link on #GitHub 👉 github.com/D4Vinci/Scrapling

#Infosec #Cybersecurity #Software #Technology #News #CTF #Cybersecuritycareer #hacking #redteam #blueteam #purpleteam #tips #opensource #cloudsecurity

✨
🔐 P.S. Found this helpful? Tap Follow for more cybersecurity tips and insights! I share weekly content for professionals and people who want to get into cyber. Happy hacking 💻🏴‍☠️

Q: Based on his ideas, would Adolf Hitler be for or against GDPR and right to erasure nowadays if he still lived?

A: It's reasonable to infer that Hitler would not support a regulation like #GDPR which emphasizes individual rights such as #privacy protection, data accessibility or erasure; and instead might favor more centralized control over information dissemination for propaganda purposes.

"The report, titled “Are AI Bots Knocking Cultural Heritage Offline?” was written by Weinberg of the GLAM-E Lab, a joint initiative between the Centre for Science, Culture and the Law at the University of Exeter and the Engelberg Center on Innovation Law & Policy at NYU Law, which works with smaller cultural institutions and community organizations to build open access capacity and expertise. GLAM is an acronym for galleries, libraries, archives, and museums. The report is based on a survey of 43 institutions with open online resources and collections in Europe, North America, and Oceania. Respondents also shared data and analytics, and some followed up with individual interviews. The data is anonymized so institutions could share information more freely, and to prevent AI bot operators from undermining their countermeasures.

Of the 43 respondents, 39 said they had experienced a recent increase in traffic. Twenty-seven of those 39 attributed the increase in traffic to AI training data bots, with an additional seven saying the AI bots could be contributing to the increase.

“Multiple respondents compared the behavior of the swarming bots to more traditional online behavior such as Distributed Denial of Service (DDoS) attacks designed to maliciously drive unsustainable levels of traffic to a server, effectively taking it offline,” the report said. “Like a DDoS incident, the swarms quickly overwhelm the collections, knocking servers offline and forcing administrators to scramble to implement countermeasures. As one respondent noted, ‘If they wanted us dead, we’d be dead.’”"

404media.co/ai-scraping-bots-a

404 Media · AI Scraping Bots Are Breaking Open Libraries, Archives, and Museums
“This is a moment where that community feels collectively under threat and isn't sure what the process is for solving the problem.”
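
The report quoted above describes bot swarms that overwhelm collection servers. For contrast, here is a minimal Python sketch of what a well-behaved scraper might do instead: check robots.txt and space out its requests. It is an illustration only, not something taken from the report or the article. The helper names (`load_rules`, `plan_fetch`, `Throttle`), the `example-bot` user agent, the example.org URLs, and the 5-second fallback delay are all assumptions made for this example.

```python
import time
from urllib.robotparser import RobotFileParser


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def plan_fetch(parser, user_agent, url, default_delay=5.0):
    """Return the delay (seconds) to use before fetching url, or None if disallowed.

    default_delay is an assumed fallback for sites that set no Crawl-delay.
    """
    if not parser.can_fetch(user_agent, url):
        return None
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default_delay


class Throttle:
    """Enforce a minimum gap between consecutive requests to one host."""

    def __init__(self, min_delay, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self.clock = clock
        self.sleep = sleep
        self._last = None

    def wait(self):
        now = self.clock()
        if self._last is not None:
            remaining = self.min_delay - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
        self._last = self.clock()


# Hypothetical robots.txt content for illustration.
rules = load_rules("User-agent: *\nCrawl-delay: 10\nDisallow: /private/\n")
delay = plan_fetch(rules, "example-bot", "https://example.org/collections/1")
throttle = Throttle(delay)  # call throttle.wait() before each request
```

Calling `throttle.wait()` before every request keeps a single crawler from hitting a host more often than the site asks. It does nothing about many crawlers arriving at once, which is the situation the respondents describe.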

Scraping isn’t just about data collection.

It’s about precision:
✔️ Accurate values
✔️ Consistent formats
✔️ Real-time reliability

General-purpose AI often falls short.

That’s why more teams trust PromptCloud for scalable, structured web data.

📖 Read the full breakdown: shorturl.at/1oTaR

PromptCloud · Why Perplexity AI Falls Short for Web Scraping Tasks
Perplexity AI may be smart, but it’s not built for web scraping. Learn why businesses need more control, accuracy, and compliance.

Are AI bots overwhelming digital collections?
A new GLAM-E Lab report shows how scrapers for AI training datasets are putting real strain on the infrastructures of galleries, libraries, archives, and museums. Technical bottlenecks, ethical dilemmas, and escalating costs—open culture is under pressure.
Read the full analysis:
glamelab.org/products/are-ai-b
#DigitalHeritage #GLAM #WebScraping #OpenAccess #CulturalData #MuseTech #DigitalHumanities #GLAMlab

GLAM-E Lab · Are AI Bots Knocking Cultural Heritage Offline?