Horizontal Scaling in mongodb
Created on: Sep 1, 2024
We can do horizontal scaling of mongondb using sharding. Sharding is a method for distributing data across multiple machines.
There are few terms associated with sharding.
- Shard: A shard is a replica set that contains a subset of the cluster's data. Once sharding is enabled and MongoDB instance is unable to handle the large dataset, MongoDB splits the desired collections into multiple shards to achieve horizontal scaling.Each shard has a primary and one or more secondary shards. The primary shard is responsible for the writes and replicates the same to the secondary shards.
- mongos: The mongos instance acts as a query router for client applications, handling both read and write operations. It dispatches client requests to the relevant shards and aggregates the result from shards into a consistent client response. Clients connect to a mongos, not to individual shards.
- config servers: A config server replica set consists of multiple MongoDB replica set members. They are the authoritative source of sharding metadata. The sharding metadata reflects the state and organization of the sharded data. The metadata contains the list of sharded collections, routing information, etc.
Let's do the practice of clustering.
Docker network
docker network create mongo-net
Config Server
As per documentation we should have multiple instance of config server. We will create 2 config server first.
docker run -d --net mongo-net --name config-server-1 -p 27101:27017 mongo:7.0.14 mongod --port 27017 --configsvr --replSet rs_config docker run -d --net mongo-net --name config-server-2 -p 27102:27017 mongo:7.0.14 mongod --port 27017 --configsvr --replSet rs_config # enter in one of config server docker exec -it config-server-1 mongosh # Initialize Config Server Replica Set rs.initiate( { _id: "rs_config", version: 1, members: [ { _id: 0, host : "config-server-2:27017" }, { _id: 1, host : "config-server-1:27017" } ] } ) # check status of replica set rs.status()
Shard
Let's create two shard with three replica set each.
docker run -d --net mongo-net --name shard-1-Node-a -p 27111:27017 mongo:7.0.14 mongod --port 27017 --shardsvr --replSet rs-shard-1 docker run -d --net mongo-net --name shard-1-Node-b -p 27121:27017 mongo:7.0.14 mongod --port 27017 --shardsvr --replSet rs-shard-1 docker exec -it shard-1-Node-a mongosh rs.initiate( { _id: "rs-shard-1", version: 1, members: [ { _id: 0, host : "shard-1-Node-a:27017" }, { _id: 1, host : "shard-1-Node-b:27017" } ] } ) # check status of replica set rs.status()
Similarly create another shard
docker run -d --net mongo-net --name shard-2-Node-a -p 27112:27017 mongo:7.0.14 mongod --port 27017 --shardsvr --replSet rs-shard-2 docker run -d --net mongo-net --name shard-2-Node-b -p 27122:27017 mongo:7.0.14 mongod --port 27017 --shardsvr --replSet rs-shard-2 docker exec -it shard-2-Node-a mongosh rs.initiate( { _id: "rs-shard-2", version: 1, members: [ { _id: 0, host : "shard-2-Node-a:27017" }, { _id: 1, host : "shard-2-Node-b:27017" } ] } )
mongos and initialize shards
Let's create single instance of router for demo.
docker run -d --net mongo-net --name router-1 -p 27141:27017 mongo:7.0.14 mongos --port 27017 --configdb rs_config/config-server-1:27017,config-server-2:27017 --bind_ip_all sh.addShard("rs-shard-1/shard-1-Node-a:27017", "rs-shard-1/shard-1-Node-b:27017"); sh.addShard("rs-shard-2/shard-2-Node-a:27017", "rs-shard-2/shard-2-Node-b:27017"); # check status sh.status()
You will have seven container as below.
Let's create a sharded db and collection. Connect to mongos node and run below command.
docker exec -it router-1 mongosh sh.enableSharding("ecommerce") use ecommerce sh.shardCollection("ecommerce.product", { pid: 1 })
Insert some item in product collection.
db.product.insertOne({pid: 1, name: "product 1"}) db.product.insertOne({pid: 2, name: "product 2"}) db.product.insertOne({pid: 3, name: "product 3"})
Now let's try to check in each shard if our collection items is coming.
docker exec -it shard-1-Node-a mongosh show dbs
Surprisingly you will not find database in one of shard. This is because the balancer only begins balancing after the distribution of data for a sharded collection has reached certain thresholds. Let's run a mongodb script and populate some dummy data.
npm init npm install mongodb touch app.js
const { MongoClient } = require("mongodb"); const url = "mongodb://localhost:27141"; const client = new MongoClient(url); const dbName = "ecommerce"; const x = 3000000; function generateRandomProduct(index) { const categories = ["electronics", "books", "clothing", "sports", "home"]; return { pid: index, name: `Product ${index}`, price: (Math.random() * 100).toFixed(2), category: categories[Math.floor(Math.random() * categories.length)], stock: Math.floor(Math.random() * 500) + 1, }; } async function run() { try { await client.connect(); console.log("Connected successfully to MongoDB"); const db = client.db(dbName); const collection = db.collection("product"); const products = []; for (let i = 1; i <= x; i++) { products.push(generateRandomProduct(i)); } const insertManyResult = await collection.insertMany(products); console.log(`${insertManyResult.insertedCount} products were inserted`); } finally { await client.close(); } } run().catch(console.dir);
After that run script which will populate 3 million records in product collection.
node app.js
Now again check in each shard, you will find the collection.