Web scraping multiple pages allows us to collect large amounts of data spread across paginated web content. In Python, this is done by sending repeated requests, handling page links, and extracting the required information in a structured way. In this article, we'll take the GeeksforGeeks website as an example and write a Python script to extract the titles of all articles available on its homepage.

Scraping Multiple Pages of a Website Using Python
When we need to collect data from several pages of the same website or from different URLs, writing separate code for each page is slow and repetitive. To make this process easier, we'll learn two simple techniques to scrape data from multiple webpages:
- From multiple pages of the same website
- From different website URLs
Approach:
- Import the necessary libraries.
- Set up the base URL and connect using the requests library.
- Parse the webpage data using BeautifulSoup.
- Locate and extract the HTML tags or classes containing the required information.
- Test it on one page, then use a loop to scrape multiple pages automatically.
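The parse-and-extract steps above can be tried without any network access. The following is a minimal sketch, assuming the titles live in `div` elements with the class `head` (the selector used later in this article). `extract_titles` and `SAMPLE_HTML` are hypothetical names introduced only for illustration.

```python
from bs4 import BeautifulSoup

def extract_titles(html, tag="div", css_class="head"):
    """Return the text of every <tag class="css_class"> element in html."""
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.find_all(tag, class_=css_class)]

# Hypothetical stand-in for a downloaded page
SAMPLE_HTML = """
<div class="head">First article</div>
<div class="body">Not a title</div>
<div class="head"> Second article </div>
"""

print(extract_titles(SAMPLE_HTML))
```

Once the extraction works on one page, the same function can be called on the HTML of each page inside a loop.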
Example 1: Looping through the page numbers

Most websites organize their content across multiple pages labeled from 1 to N, making it easy to loop through them since their structure usually remains the same.

For example, if the URL of a page ends with something like page/4/, we can change that number dynamically in our code. By using a simple for loop and replacing the page number (i) in the URL, we can automatically visit each page and extract the required data without manually editing the URL each time.
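As a rough illustration of this idea, the page URLs can be generated up front and inspected before any request is sent. This sketch assumes the `/page/N/` pattern described above; `page_urls` is a hypothetical helper, not part of any library.

```python
def page_urls(base, first, last):
    """Return the URLs for pages first..last (inclusive)."""
    # rstrip avoids a double slash when base already ends with "/"
    return [f"{base.rstrip('/')}/page/{n}/" for n in range(first, last + 1)]

for url in page_urls("https://www.geeksforgeeks.org", 1, 3):
    print(url)
```

Printing the URLs first is a cheap way to catch mistakes such as a doubled slash or an off-by-one page range.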
The following example demonstrates how to scrape data from multiple pages using a for loop in Python.
import requests
from bs4 import BeautifulSoup as bs

# Fetch the first page and parse its HTML
URL = 'https://www.geeksforgeeks.org/page/1/'
req = requests.get(URL)
soup = bs(req.text, 'html.parser')

# attrs must be a dict mapping the attribute name to its value
titles = soup.find_all('div', attrs={'class': 'head'})
print(titles[4].text)
Output:
Now, using the above code, we can get the titles of all the articles by wrapping those lines in a loop.
import requests
from bs4 import BeautifulSoup as bs

# Base URL; the page number is appended inside the loop
URL = 'https://www.geeksforgeeks.org/page/'

for page in range(1, 11):  # pages 1 to 10
    req = requests.get(URL + str(page) + '/')
    soup = bs(req.text, 'html.parser')
    titles = soup.find_all('div', attrs={'class': 'head'})
    # Indices 4 to 18 are treated as the 15 article titles on each page
    for i in range(4, 19):
        # Number the titles continuously across pages: 1-15, 16-30, ...
        print(f"{(i - 3) + (page - 1) * 15}. {titles[i].text}")
Output

Note: The above code will fetch the first 10 pages from the website and scrape all 150 article titles that fall under those pages.
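The numbering arithmetic in the loop above only works if every page holds exactly 15 titles. A running counter avoids that assumption. The sketch below operates on plain lists of strings so it can be run offline; `number_titles` is a hypothetical helper and the sample data is made up.

```python
from itertools import chain

def number_titles(pages):
    """pages: iterable of lists of title strings, one list per scraped page."""
    all_titles = chain.from_iterable(pages)
    # enumerate keeps one continuous count, whatever the page sizes are
    return [f"{n}. {title}" for n, title in enumerate(all_titles, start=1)]

# Made-up titles standing in for three scraped pages of unequal length
pages = [["A", "B"], ["C"], ["D", "E"]]
for line in number_titles(pages):
    print(line)
```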
Example 2: Looping through a list of different URLs.
The previous method worked well when pages followed a numbered pattern. But sometimes, we may want to scrape data from pages that don't have page numbers or follow different URL structures.
In such cases, instead of writing separate code for each page, we can simply store all the URLs in a list and loop through them to extract data easily. Here's an example:
import requests
from bs4 import BeautifulSoup as bs

# Any list of URLs works here; they need not follow a numbered pattern
URLS = ['https://www.geeksforgeeks.org/', 'https://www.geeksforgeeks.org/page/10/']

for index, url in enumerate(URLS):
    req = requests.get(url)
    soup = bs(req.text, 'html.parser')
    titles = soup.find_all('div', attrs={'class': 'head'})
    for i in range(4, 19):
        # Number the titles continuously across URLs: 1-15, 16-30
        print(f"{(i - 3) + index * 15}. {titles[i].text}")
Output

How to avoid getting our IP address banned
When scraping multiple pages, sending too many requests in a short time can overload the website's server. This might lead to our IP address getting blocked or blacklisted.
To prevent this, it's important to control our crawl rate, that is, how frequently our program sends requests. A simple way to do this is to add short, random pauses between requests, making our script behave more like a human browsing naturally.
We can achieve this using two Python functions:
- randint() from the random module: generates a random integer between two limits (both limits included).
- sleep() from the time module: pauses the program for the given number of seconds.
Example:
from time import sleep
from random import randint

for i in range(0, 3):
    # selects a random integer in the given range (2 and 5 included)
    x = randint(2, 5)
    print(x)
    sleep(x)
    print(f'I waited {x} seconds')
Output
5
I waited 5 seconds
4
I waited 4 seconds
5
I waited 5 seconds
Now, let's apply this logic to our web scraping loop:
import requests
from bs4 import BeautifulSoup as bs
from random import randint
from time import sleep

# Base URL; the page number is appended inside the loop
URL = 'https://www.geeksforgeeks.org/page/'

# The site has more than 5000 pages; this example only takes the first 10
for page in range(1, 11):
    req = requests.get(URL + str(page) + '/')
    soup = bs(req.text, 'html.parser')
    titles = soup.find_all('div', attrs={'class': 'head'})
    for i in range(4, 19):
        # Number the titles continuously across pages: 1-15, 16-30, ...
        print(f"{(i - 3) + (page - 1) * 15}. {titles[i].text}")
    # Wait a random 2 to 10 seconds before requesting the next page
    sleep(randint(2, 10))
Output
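One possible refinement: when the pause sits at the end of the loop body, the script also waits after the final page, which is wasted time. The sketch below pauses only between requests. It is an illustration under stated assumptions, not a definitive implementation: `crawl_politely` is a hypothetical helper, and `fetch` and `sleep` are passed in as parameters so the demo runs offline with stand-ins.

```python
import random
import time

def crawl_politely(urls, fetch, min_delay=2, max_delay=10, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing a random time between calls."""
    results = []
    for position, url in enumerate(urls):
        if position > 0:
            # Pause before every request except the first,
            # so there is no wait after the last one
            sleep(random.randint(min_delay, max_delay))
        results.append(fetch(url))
    return results

# Offline demo: str.upper stands in for a real download,
# and waits.append records the delays instead of sleeping
waits = []
pages = crawl_politely(["u1", "u2", "u3"], fetch=str.upper, sleep=waits.append)
print(pages)
print(len(waits), "pauses for", len(pages), "requests")
```

In a real scraper, `fetch` could be something like `lambda url: requests.get(url).text`, with the default `sleep` left in place.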