web-scraping
Projects with this topic
Bug fixes for the abandoned Python Wikipedia project to warn the user when the Wikipedia suggestion engine corrupts the titles of valid Wikipedia articles. Required for the examples in Natural Language Processing in Action, 2nd Edition by Maria Dyshel and Hobson Lane (and a community of more than 30 contributing authors and editors).
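The warning described above could be sketched as a simple comparison between the requested and returned titles. This is an assumption about the project's approach, not its actual code: it treats the case, spacing, and underscore variations that Wikipedia itself canonicalises as harmless, and flags anything else. Both function names are hypothetical.

```python
# Hypothetical check for whether a suggestion engine has silently
# rewritten a requested article title. Case, spacing, and underscore
# differences are normal Wikipedia title canonicalisation; any other
# change is treated as a possible corruption worth warning about.

def normalize_title(title: str) -> str:
    """Collapse the variations Wikipedia considers equivalent."""
    return " ".join(title.replace("_", " ").split()).casefold()

def is_suspect_suggestion(requested: str, returned: str) -> bool:
    """True when the returned title differs beyond canonicalisation."""
    return normalize_title(requested) != normalize_title(returned)

print(is_suspect_suggestion("Python_(programming language)",
                            "Python (Programming Language)"))  # False
print(is_suspect_suggestion("Dodge Tomahawk", "Dodge Mahawk"))  # True
```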
Updated -
This repo is a mix of several data science tools: web-scraped data that is cleaned and then used to analyse the property market in Malta via prediction models, visualisations, and statistical analysis.
There are also visualisations of chess data from the 1980s to 2021, as well as Twitter data stored in the Neo4j NoSQL DBMS.
No data is included in the repo, only the results. The code with the data can be found at: https://drive.google.com/file/d/15EQnRtsngDsFDD_A7g4N1fwCuXI0f_Xi/view?usp=sharing
Updated -
This project obtains the price of the dollar in Venezuela by scraping the website of the Banco Central de Venezuela: http://www.bcv.org.ve/
Updated -
Some simple examples of web scraping with Python that print the desired data to the console and save it to CSV files.
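The console-plus-CSV pattern those examples describe can be sketched with the standard library alone. The rows here are hard-coded stand-ins for a real scrape, and the field names are illustrative assumptions:

```python
# Print scraped rows to the console, then persist them to a CSV file.
# The rows are hard-coded placeholders for real scraped data.
import csv

rows = [
    {"title": "Example article", "url": "https://example.com/a"},
    {"title": "Another article", "url": "https://example.com/b"},
]

for row in rows:
    print(row["title"], row["url"])

with open("scraped.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "url"])
    writer.writeheader()
    writer.writerows(rows)
```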
Updated -
Three parts for advanced scraping. Part 1: reaching search pages when only some results are returned at a time, so you need to cycle through pages to collect all of them. Part 2: using Django to handle database reads and saves via APIs. Part 3: scraping pages with a lot of data, validating with Pydantic, and saving via the Django API.
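The pagination loop in Part 1 might look like the sketch below. The `fetch_page` stub, the page size, and the empty-page stopping condition are all assumptions standing in for the project's real HTTP requests:

```python
# Sketch of cycling through paginated search results. fetch_page is a
# stub in place of a real HTTP call; an empty page marks the end.
from typing import List

DATASET = [f"result-{i}" for i in range(23)]  # stand-in for the site's data
PAGE_SIZE = 10

def fetch_page(page: int) -> List[str]:
    """Stub for an HTTP request that returns one page of results."""
    start = page * PAGE_SIZE
    return DATASET[start:start + PAGE_SIZE]

def fetch_all() -> List[str]:
    """Cycle through pages until an empty page signals the end."""
    results, page = [], 0
    while True:
        batch = fetch_page(page)
        if not batch:
            break
        results.extend(batch)
        page += 1
    return results

print(len(fetch_all()))  # 23
```

Real sites often signal the last page differently (a "next" link, a total count, or an HTTP error), so the stopping condition is the part most likely to need adapting.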
Updated -
NFinance is a web scraper for finance news from Yahoo Finance and WSJ, useful for any trader or investor.
Updated -
Extracts Slow Food Switzerland locations from the official website, and places them into KML format.
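Producing KML from scraped locations, as this project and the NARM scraper below do, can be sketched with the standard library. The coordinates here are placeholders, not real Slow Food locations, and the minimal Placemark structure is an assumption about what the scrapers emit:

```python
# Turn (name, longitude, latitude) rows into a minimal KML document
# using only the standard library. Sample data is a placeholder.
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def to_kml(places):
    """places: iterable of (name, longitude, latitude) tuples."""
    kml = ET.Element("{%s}kml" % KML_NS)
    doc = ET.SubElement(kml, "{%s}Document" % KML_NS)
    for name, lon, lat in places:
        pm = ET.SubElement(doc, "{%s}Placemark" % KML_NS)
        ET.SubElement(pm, "{%s}name" % KML_NS).text = name
        point = ET.SubElement(pm, "{%s}Point" % KML_NS)
        # KML coordinate order is longitude,latitude
        ET.SubElement(point, "{%s}coordinates" % KML_NS).text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")

print(to_kml([("Sample location", 8.54, 47.37)]))
```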
Updated -
Image scraper for the Norwegian Institute for Nature Research (NINA)'s wildlife camera database at https://viltkamera.nina.no/
Their image license prohibits distribution of the images to third parties, as well as commercial use and publishing. I believe downloading them yourself for research use should be okay, but I'm not a lawyer.
Updated -
This tool extracts member institution locations from the official North American Reciprocal Association (NARM) website and places them into KML format. A few Google Maps versions of this data created by individuals exist and are probably mostly accurate, but the reciprocal agreements evidently update at least annually, so those maps grow less accurate each year they are not refreshed. That dynamism in the NARM locations is what makes this script useful: in theory, until NARM makes a major change to their website source code, the scraper will always produce a KML file with the most up-to-date map at any time of any year.
Updated -
Scrape Instagram followers for a given account and find out who has followed or unfollowed ;)
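The follow/unfollow detection reduces to a set difference between two follower snapshots taken at different times. The snapshots below are hard-coded stand-ins for scraped data, and the function name is an assumption:

```python
# Compare two follower snapshots (sets of usernames) to find who is
# new and who left. Snapshots are placeholders for scraped data.
def diff_followers(before: set, after: set):
    """Return (new_followers, unfollowers) between two snapshots."""
    return after - before, before - after

yesterday = {"alice", "bob", "carol"}
today = {"alice", "carol", "dave"}

new, gone = diff_followers(yesterday, today)
print(sorted(new))   # ['dave']
print(sorted(gone))  # ['bob']
```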
Updated -
Web scraping of real estate listings with Scrapy, Pydantic, FastAPI, MongoDB, and MinIO
Updated -
Web application and cloud pipeline to analyze the status of the Data Science job market
Updated -
Python wrapper library for Prompt API's Scraper API
Updated