Want to analyze text from EU public consultations? Through these consultations, the EU invites the broader public to comment on upcoming legislation.
I just published a first version of a Python package {eu-consultations} to scrape and extract text from the EU website:
https://github.com/marioangst/eu_consultations
- download consultation data as displayed on the EU's frontend into a validated form
- download associated files (this is the hard part about analyzing this data - lots of feedback is in .docx and .pdf files)
- extract text from the files using docling and attach it to the feedback
All data comes in validated form and can be stored in (huge, sorry for that) JSON files ;).
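To give a rough idea of the last two steps, here is a minimal sketch of attaching extracted text to feedback records and serializing them to JSON. This is not the package's actual API: the `Feedback` and `Attachment` classes, their fields, and the helpers `attach_text` and `fake_extract` are all hypothetical names made up for illustration. The `docling_extract` helper assumes docling is installed and is not called in the example, which uses a stand-in extractor so it runs on its own.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Callable, Optional


@dataclass
class Attachment:
    # Hypothetical fields for illustration, not the package's schema.
    file_name: str
    extracted_text: Optional[str] = None


@dataclass
class Feedback:
    # Hypothetical fields for illustration, not the package's schema.
    feedback_id: int
    text: str
    attachments: list[Attachment] = field(default_factory=list)

    def __post_init__(self):
        # A very light form of validation: reject records with a bad id type.
        if not isinstance(self.feedback_id, int):
            raise TypeError("feedback_id must be an int")


def docling_extract(path: str) -> str:
    """Extract text from a .pdf or .docx file with docling.

    Imported lazily so the rest of the sketch runs without docling installed.
    """
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(path)
    return result.document.export_to_markdown()


def attach_text(feedback: Feedback, extract: Callable[[str], str]) -> Feedback:
    """Run the extractor on every attachment and store the text on it."""
    for attachment in feedback.attachments:
        attachment.extracted_text = extract(attachment.file_name)
    return feedback


def fake_extract(path: str) -> str:
    # Stand-in extractor so the example needs no real files.
    return f"text extracted from {path}"


fb = Feedback(1, "See attached position paper.", [Attachment("position.pdf")])
attach_text(fb, fake_extract)

# Nested dataclasses are converted to plain dicts before dumping to JSON.
serialized = json.dumps(asdict(fb), indent=2)
print(serialized)
```

Swapping `fake_extract` for `docling_extract` would run the same flow on real downloaded files, assuming the file names are valid local paths.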
This package is part of an analysis project on the feedback the EU has received on digital policy through its public consultation process. We plan to present the project later this year, but I thought, let's open-source some of the tools we use well before then.