The Toughdata Blog

Technical and practical perspectives on web scraping and AI.

How to Type Hint SQLAlchemy Join Queries in Python

Type hinting SQLAlchemy join queries properly is poorly documented and rarely discussed. This article aims to clear up some of the confusion around the topic.


Feb 21, 2024

Fine-tuning Flan-T5 for Question Answering using Quora data

Flan-T5 is a powerful text-to-text model that excels at summarization, translation, and question answering. Today we will be fine-tuning it on over 56,000 question-answer pairs scraped from Quora.


Aug 24, 2023

Scrape Instagram data at scale with Lamadava and Datalama

When it comes to scraping Instagram... How fast is fast? How cheap is cheap? How much data do you need? Lamadava and Datalama may have the answers you need.


Jul 19, 2023

How to train RWKV on a new dataset using HuggingFace Transformers

RWKV is a recent advancement in language modeling: a recurrent architecture designed to be fast and efficient. This post shows that it can also be simple to train on a new corpus using the HuggingFace Transformers library.


Jul 17, 2023

Scraping TikTok and Instagram with Solid Datacenter Proxies

If you are a small business that needs to scrape TikTok, Instagram, or other social media, you have probably heard the mantra that you need residential proxies. If you are unable to pay the premium for residential proxies, I have good news for you. Depending on your use case, a solid datacenter proxy can get the job done. Do your research and figure out what works best before shelling out your cash.


Jul 11, 2023

Generate leads on TikTok by scraping an influencer's follower data

Lead generation is a classic use case for web scraping. In this post I will describe how to scrape a TikTok influencer's follower list to find like-minded individuals to reach out to.


Jul 02, 2023

How to Scrape TikTok Profile Data with Asynchronous Python

Social media profile data is immensely valuable in online marketing and lead generation. Web scraping is the best way to get this data, but some methods are faster than others. This post shows you how to scrape TikTok profiles with aiohttp, an asynchronous Python HTTP library.


Jul 01, 2023

Scraping 10,000 records at once with aiohttp

Web scraping with Python's requests module can get pretty slow when you need to collect a huge amount of data or when you run into latency issues. This post shows how to use aiohttp and asyncio to make thousands of concurrent requests, so you can scrape faster and scale your data collection.


Jun 15, 2023