Setting Up Auto Scaling on Tencent Cloud Accounts

Tencent Cloud / 2026-04-27 17:37:19

Introduction: Your App Needs a Growth Spurt (Not a Panic Attack)

Auto Scaling is one of those cloud features that sounds fancy until you actually need it. One minute you’re confidently sipping tea because traffic is “light.” The next minute your dashboard looks like someone spilled a spreadsheet on the screen, and you start thinking thoughts like: “Did we just get hacked?” (Spoiler: usually it’s just a viral post, not an alien invasion.)

This article walks you through setting up Auto Scaling on Tencent Cloud accounts. We’ll keep it practical and readable, with enough detail that you can implement it without feeling like you need a secret wizard certification. If you’ve ever wondered how to scale your compute resources automatically based on demand, you’re in the right place.

By the end, you should know how to:

  • Prepare the right Tencent Cloud resources and permissions
  • Create a scaling group
  • Attach instances or let Tencent create them
  • Define scaling policies using metrics and thresholds
  • Use health checks and cooldown periods to avoid “thrashing”
  • Troubleshoot common issues and estimate cost impact

What Auto Scaling Actually Does (In Human Language)

Auto Scaling monitors metrics (like CPU usage, request count, or custom CloudMonitor metrics). When demand crosses a threshold, it automatically adjusts the number of instances in a scaling group.

Think of it like a thermostat, except instead of adjusting your living room temperature, it adjusts your number of servers. When it’s cold (low load), it saves money by reducing instances. When it’s hot (high load), it adds instances to keep your application responsive.
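The thermostat idea can be sketched in a few lines of Python. This is purely illustrative logic, not a Tencent Cloud API; the function name, thresholds, and limits are all hypothetical placeholders:

```python
def scaling_decision(current_instances, avg_cpu, scale_out_at=65, scale_in_at=35,
                     min_instances=2, max_instances=10):
    """Decide the next instance count from one CPU sample (illustrative only)."""
    if avg_cpu > scale_out_at:
        desired = current_instances + 1      # too hot: add capacity
    elif avg_cpu < scale_in_at:
        desired = current_instances - 1      # too cold: save money
    else:
        desired = current_instances          # comfortable: do nothing
    # Clamp to the group's guardrails (min/max capacity).
    return max(min_instances, min(max_instances, desired))
```

Note the clamp at the end: no matter what the metric says, the group never leaves its configured bounds. Real services add evaluation periods and cooldowns on top of this, covered later.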

Prerequisites on Tencent Cloud: Before You Press the “Create” Button

Before setting up Auto Scaling, you need a few building blocks. Tencent Cloud’s exact menu names can vary depending on service evolution, region, and console updates, but the concepts remain stable.

1) Decide Your Target Architecture

Auto Scaling usually works best with a load balancer. Why? Because scaling is mostly about adding/removing compute capacity, while your load balancer distributes traffic across instances.

Typical setup:

  • Load Balancer (e.g., CLB) in front
  • Auto Scaling Group managing a set of instances (CVM)
  • Health checks to remove unhealthy instances
  • Application instances behind it

If you skip the load balancer, you can still do scaling, but it becomes harder to route traffic correctly. Also, your users may experience the kind of “who moved my cheese” behavior you didn’t plan for.

2) Ensure You Have a CVM-Based App That Can Scale Out

Your instances should be stateless or at least horizontally scalable. That means:

  • Session state should be stored externally (e.g., Redis) or handled by sticky sessions via load balancer if appropriate.
  • Files should be on shared storage (e.g., COS) or baked into the image.
  • Database should be a separate service (TencentDB/CDB or similar) that can handle concurrent connections.

If your app stores session data locally on instance disk, scaling out will work… but your users may feel like they’re being teleported to a different universe. In other words: possible, but not recommended.

3) Prepare a Launch Template or Image

Auto Scaling needs a way to launch instances consistently. Common approaches:

  • Use a preconfigured image (a public or custom CVM image)
  • Use a launch configuration/template that includes startup scripts

Make sure your initialization installs dependencies, configures the app, and joins the correct environment (region, config endpoints, etc.).

4) Review CAM Permissions (Tencent Cloud Account Settings)

Auto Scaling involves multiple actions: creating/terminating instances, updating load balancer backends, writing to monitoring, and so on. If you get an authorization error, it usually comes down to missing permissions.

In general, you’ll need permissions related to:

  • Auto Scaling resource operations
  • CVM instance lifecycle actions
  • Load balancer backend registration (or target group operations)
  • Monitoring and metrics access

If you’re using a dedicated sub-account or a role, double-check that it can manage the specific resources in the chosen region.

Baseline Setup: Choose the Right Scaling Strategy

Before you configure policies, you need to choose how the scaling group behaves.

1) Minimum, Maximum, and Desired Capacity

Every scaling group has limits, typically:

  • Min Instances: how low it can go
  • Max Instances: how high it can go
  • Desired Instances: how many you want at steady state

Choosing these values is like setting guardrails on a downhill bike trail. Too tight and you’ll fail to handle spikes. Too loose and your bill might take a sudden vacation.

2) Cooldown Periods to Prevent “Scaling Ping-Pong”

If your policy adds an instance, and the metric still shows high load for a moment, you don’t want the system to instantly add five more because the first new instance hasn’t warmed up yet.

Cooldown periods give your app time to start and stabilize. Without cooldown, you risk scaling up and down rapidly, often called thrashing. Thrashing sounds less like a technical term and more like a dance move. It is not fun for your infrastructure.

3) Scaling Type: Step vs. Target Tracking (Conceptually)

Different Auto Scaling systems offer different policy types. The idea is the same:

  • Step/Threshold Scaling: When metric > X, add Y instances. When metric < Z, remove W instances.
  • Target Tracking: Keep the metric near a target (like average CPU at 60%). The system adjusts gradually.

Target tracking often feels more “set and forget,” while step scaling gives you more direct control. For many teams, step scaling is easiest to start with.
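The difference between the two styles is easiest to see side by side. This is a conceptual sketch, not any vendor's actual policy engine; the target-tracking version assumes per-instance load is roughly constant, so capacity should scale in proportion to the metric:

```python
import math

def step_policy(instances, metric, out_at=65.0, out_step=2, in_at=35.0, in_step=1):
    """Step/threshold scaling: fixed-size adjustments at fixed boundaries."""
    if metric > out_at:
        return instances + out_step
    if metric < in_at:
        return instances - in_step
    return instances

def target_tracking_policy(instances, metric, target=60.0):
    """Target tracking: size capacity so the metric lands near the target.
    If load per instance is roughly constant, then
    new_count * target ~= current_count * metric."""
    return max(1, math.ceil(instances * metric / target))
```

Step scaling always moves by the same amount; target tracking moves further the further you are from the target, which is why it feels more “set and forget.”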

Creating an Auto Scaling Group on Tencent Cloud (Step-by-Step)

Now for the main event: setting up Auto Scaling on your Tencent Cloud account.

Exact UI labels may differ, but the flow generally looks like this.

Step 1: Open the Auto Scaling Console

Log in to Tencent Cloud Console, navigate to the Auto Scaling service, and create a new scaling group.

If you don’t see the service, use the search bar. Also verify that your account has the necessary permissions and that Auto Scaling is enabled for your region.

Step 2: Select Region and VPC/Network Settings

Auto Scaling instances need to live somewhere. Choose your VPC and subnet(s). If you’re using a load balancer, align subnet selections so the load balancer can reach the instances.

Common pitfall: you configure subnets that aren’t reachable from the load balancer. That’s not an Auto Scaling problem; that’s a networking “spot the mismatch” problem.

Step 3: Configure the Instance Launch Method

Select the launch configuration/template. You may choose:

  • An existing launch configuration
  • A template with startup scripts
  • An image with preinstalled application dependencies

Double-check:

  • Instance type (CPU/memory)
  • Security group (open required ports, allow health checks)
  • Disk type/size
  • Auto-assignment of public/Elastic IPs (if applicable)

Step 4: Attach the Load Balancer (Recommended)

If Tencent Cloud Auto Scaling integrates with load balancers via target groups/listeners, you’ll configure which load balancer and which backend target set to use.

Key items:

  • Listener/protocol/port
  • Health check settings
  • Target registration mode

If your app needs special health endpoints (e.g., /healthz), make sure they return success only when the application is truly ready. Otherwise, the system may register instances that are “alive but not ready,” and that’s how you get 502 errors wearing a disguise.
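A minimal readiness endpoint can be sketched with Python's standard library. The `/healthz` path, the `READY` flags, and port are hypothetical; the point is that the endpoint returns 200 only once every dependency has actually checked in:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical readiness flags; a real app would set these only after the
# database connection and config fetch actually succeed.
READY = {"db": False, "config": False}

def is_ready(state):
    """Return True only when every dependency has checked in."""
    return all(state.values())

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz" and is_ready(READY):
            self.send_response(200)
            body = b"ok"
        else:
            # 503 keeps the load balancer from routing traffic too early.
            self.send_response(503)
            body = b"not ready"
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve it (blocking):
# HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```

Keep the check cheap: it runs on every health probe, so no long database queries inside it.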

Step 5: Set Initial Capacity

Choose initial instance count and minimum/maximum. Many teams set min = 1 or 2, max based on expected peak load, and desired equal to min or slightly higher for faster readiness.

Also configure:

  • Instance warm-up time (if available)
  • Cooldown period (if not part of policies)

Step 6: Configure Health Checks and Termination Policies

Auto Scaling typically includes health check logic. You might configure:

  • Health check type (load balancer health check or platform health check)
  • Unhealthy thresholds (how many failures trigger replacement)
  • Grace period for termination (to avoid killing requests in the middle of a transaction)

Graceful termination matters if your application serves long requests. If you can, implement connection draining/termination hooks. Otherwise, users may see errors exactly when the cloud decides to be helpful.

Defining Scaling Policies: The “When Should We Scale?” Rules

Policies are the brain of Auto Scaling. Without policies, your scaling group is just a fancy collection of potential servers.

1) Pick the Right Metrics (CPU Is Not Always King)

Common metrics include:

  • CPU Utilization: easy and often useful
  • Memory Utilization: important for memory-heavy apps
  • Network In/Out: good for IO-bound services
  • Request count / QPS: best if available
  • Latency: great for user experience, if measurable
  • Custom metrics: like queue length, job backlog, error rate

Many teams start with CPU or request count, then refine. Here’s the truth: the best metric depends on your workload. CPU might be low while response times are terrible due to database bottlenecks. So, choose metrics aligned to what you want to protect.

2) Decide the Thresholds

If using threshold-based scaling, pick high and low boundaries.

Example starting point (not universal, but practical):

  • Scale out when average CPU > 65% for 2–3 minutes
  • Scale in when average CPU < 35% for 5 minutes

Why do scale-in thresholds differ? Because you want to avoid rapid oscillation. Systems that scale out aggressively but scale in slowly generally behave better.

3) Choose the Scaling Step Size

Step size is how many instances you add/remove per event.

Common approaches:

  • Small steps: add 1 instance at a time for safety
  • Larger steps: add multiple instances if you expect sudden spikes

If scaling takes time to boot your app, larger steps can reduce time-to-recovery. If boot time is fast and workload changes gradually, smaller steps are fine.

4) Set Cooldown and Evaluation Periods

Cooldown prevents immediate re-triggering. Evaluation periods smooth out temporary spikes. For example:

  • Trigger if CPU > 65% for 3 consecutive periods
  • Cooldown 300 seconds between scale actions

Without these, your Auto Scaling may respond to noise like a human hearing a single “ding” from the microwave and assuming the kitchen is on fire.
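The two mechanisms can be combined into one small state machine. This sketch is illustrative (the class name and defaults are made up); it fires only after three consecutive breaches, then refuses to fire again until the cooldown has elapsed:

```python
class ScaleOutTrigger:
    """Fire only after `periods` consecutive breaches, then honor a cooldown."""

    def __init__(self, threshold=65.0, periods=3, cooldown=300):
        self.threshold = threshold
        self.periods = periods
        self.cooldown = cooldown
        self.breaches = 0
        self.last_fired = float("-inf")   # never fired yet

    def observe(self, value, now):
        """Feed one metric sample at time `now` (seconds); True means scale out."""
        # A single sample below the threshold resets the consecutive count.
        self.breaches = self.breaches + 1 if value > self.threshold else 0
        in_cooldown = (now - self.last_fired) < self.cooldown
        if self.breaches >= self.periods and not in_cooldown:
            self.breaches = 0
            self.last_fired = now
            return True
        return False
```

With samples every 60 seconds, a sustained spike fires once at the third breach, then stays quiet for the 300-second cooldown even if the metric remains high.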

Common Auto Scaling Policy Patterns (Copy the Idea, Not the Numbers)

Here are a few patterns teams often use.

Pattern A: CPU-Based Scale Out/In

Use CPU for general responsiveness.

  • Scale out when CPU is high
  • Scale in when CPU stays low

Works well when CPU correlates strongly with load.

Pattern B: Request Count (QPS) Scaling

If your app is request-driven and CPU is not always proportional, scale based on request rate.

  • Scale out when requests per minute exceed threshold
  • Scale in when requests drop below threshold

This typically aligns better with user traffic patterns.

Pattern C: Queue Length Scaling

For background jobs (workers), scale using queue length or processing backlog. If you have a message queue, backlog growth is usually a more accurate signal than CPU.

  • Scale out when queue length > X
  • Scale in when queue length < Y
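For workers, you can often compute the desired pool size directly from the backlog rather than nudging it up and down. A hedged sketch, assuming each worker drains a known number of jobs per minute (the rate and limits below are placeholders):

```python
def worker_count_for_backlog(backlog, per_worker_rate=30, min_workers=1, max_workers=20):
    """Size the worker pool so the current backlog drains in about one minute,
    assuming each worker processes `per_worker_rate` jobs/minute (illustrative)."""
    needed = -(-backlog // per_worker_rate)   # ceiling division
    return max(min_workers, min(max_workers, needed))
```

Because the answer is computed, not stepped, this style converges quickly after a burst of enqueued jobs and naturally scales back to the minimum when the queue is empty.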

Integrating with Load Balancers: Health Checks and Real-World Reliability

Most Auto Scaling setups fail in the “almost working” stage due to health checks and traffic handling. Let’s address that.

1) Health Check Endpoint Design

A health endpoint should reflect readiness. Consider:

  • Return success only after dependencies are reachable (database connection, config fetch, etc.)
  • Keep it fast and reliable (no long queries)
  • Use separate readiness vs liveness semantics if your platform supports it

If health checks are too strict, instances may flap. If too lenient, you’ll send traffic to a half-initialized server and blame Auto Scaling (it’s innocent this time).

2) Graceful Shutdown / Connection Draining

When scaling in, instances are often terminated. If they’re serving requests, you want them to stop taking new requests while finishing existing ones.

Depending on your stack, you can implement:

  • Load balancer deregistration delay
  • Application-level draining
  • Stop accepting new connections on SIGTERM (or equivalent)

This reduces error spikes during scale-in events.
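The SIGTERM pattern can be sketched in Python. This is a minimal illustration, not a full server: on SIGTERM the process flips a flag, the listener stops accepting new work (so health checks fail and the load balancer deregisters the instance), and in-flight requests finish:

```python
import signal
import threading

shutting_down = threading.Event()

def handle_sigterm(signum, frame):
    """On SIGTERM (e.g. from a scale-in event), begin draining: stop taking
    new work but let in-flight requests finish before the process exits."""
    shutting_down.set()

signal.signal(signal.SIGTERM, handle_sigterm)

def accept_new_request():
    # The request loop / health endpoint should consult this before taking work.
    return not shutting_down.is_set()
```

Pair this with a load balancer deregistration delay at least as long as your longest typical request.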

3) Handling Warm-Up Time

Auto Scaling can launch instances quickly, but your application might need time to load caches or connect to services.

Use a warm-up period if available so that new instances aren’t judged unhealthy prematurely.

Testing Your Auto Scaling Setup (Before Production Does It for You)

Auto Scaling should not be discovered via customer complaints. Do load testing in staging or a controlled environment.

1) Use Synthetic Load

Test with tools that simulate traffic patterns. Measure:

  • How quickly instances scale out
  • How long it takes for new instances to become healthy
  • Whether error rates spike during scale-out

2) Validate Scale-In Behavior

After reducing load, confirm that scale-in triggers after appropriate cooldown and doesn’t terminate instances too aggressively.

Also watch whether your sessions or background jobs behave correctly across scaling events.

3) Confirm Monitoring and Alerts

Make sure metrics feed into Auto Scaling policies correctly. Then add alerts for:

  • Scaling events count (so you know when it’s happening)
  • Instance health ratio
  • Error rate and latency
  • Potential runaway scaling (high cost risk)

Troubleshooting: When Auto Scaling Behaves Like It Has Opinions

Let’s cover common issues and what to check.

Issue 1: No Scaling Happens

  • Verify the metric is being collected and is in the correct namespace.
  • Check thresholds and evaluation periods (maybe they’re never satisfied).
  • Confirm the scaling group is attached to the correct resources (load balancer targets, etc.).
  • Check instance limits (min/max might be set such that scaling can’t occur).

In other words: “Are we watching the right thing, at the right time, with the right settings?”

Issue 2: Instances Scale Out but Traffic Fails

  • Health checks might be failing (wrong endpoint, firewall rules, missing ports).
  • Startup scripts might not configure the app correctly.
  • Security group rules may block required inbound traffic or health checks.
  • Load balancer listeners might not point to the right target group.

Check instance logs and health check status. Auto Scaling can’t fix your application’s missing dependency. It can only multiply the problem slightly faster.

Issue 3: Scaling Is Too Aggressive (Bill Shock Included)

  • Thresholds may be too sensitive (e.g., CPU > 40% constantly triggers).
  • Cooldown might be too short.
  • Max instances might be too high for your risk tolerance.

Start conservative and adjust after you see real traffic patterns.

Issue 4: Flapping (Constant Scale Up and Down)

  • Use hysteresis: separate scale-in and scale-out thresholds.
  • Increase evaluation periods and cooldown.
  • Ensure your metric isn’t noisy; consider smoothing.

Flapping is like a car alarm with a cat living nearby. Reduce sensitivity and add delays.
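Smoothing is usually just a moving average over the last few samples. A minimal sketch (window size is a placeholder to tune against your sampling interval):

```python
from collections import deque

class SmoothedMetric:
    """Moving average over the last `window` samples to damp noisy metrics."""

    def __init__(self, window=5):
        self.samples = deque(maxlen=window)

    def add(self, value):
        """Record a sample and return the current smoothed value."""
        self.samples.append(value)
        return sum(self.samples) / len(self.samples)
```

A one-off spike now moves the smoothed value only partway toward the threshold, and it ages out of the window instead of triggering a scale event.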

Issue 5: Instances Launch but Never Become Healthy

  • Health check expectations don’t match your app (wrong port/path).
  • Startup time exceeds health check grace period.
  • External dependencies (database, config service) fail during boot.

Revisit health check configuration and startup script reliability.

Cost Considerations: Scaling Without Scaling Your Budget Anxiety

Auto Scaling reduces cost when demand drops, but it can also increase cost if policies are too aggressive or if your max capacity is unlimited (or simply too high).

1) Compute Cost Drivers

  • Instance hourly cost (and whether they’re spot/preemptible or regular)
  • Number of instances and how long they stay active
  • Load balancer and network egress costs

Also consider that scale-out events can create temporary overcapacity during boot and warm-up periods.
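A back-of-the-envelope estimate helps sanity-check your max capacity before real bills arrive. All numbers here are illustrative pay-as-you-go placeholders, not Tencent Cloud pricing:

```python
def monthly_compute_cost(hourly_rate, baseline_instances, peak_instances,
                         peak_hours_per_day, days=30):
    """Rough monthly estimate: the baseline runs 24h/day, while the extra
    peak capacity only runs during peak hours (illustrative model only)."""
    baseline = baseline_instances * 24 * days * hourly_rate
    burst = (peak_instances - baseline_instances) * peak_hours_per_day * days * hourly_rate
    return baseline + burst
```

For example, 2 always-on instances plus 4 extra for 4 peak hours a day at a hypothetical $0.10/hour comes to about $192/month, with the bursts accounting for only a quarter of it. That kind of split is the argument for Auto Scaling over permanently provisioning for peak.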

2) Set a Realistic Max Instances

Your max should reflect your capacity planning and budget. If you’re not sure, start with a conservative max and increase after observing traffic patterns.

3) Use Maintenance Windows and Deployment Strategy

During deployments, CPU/latency can spike. If your scaling policies are sensitive, deployments can trigger unnecessary scale events.

Consider:

  • Temporarily adjusting policies during deployments
  • Using deployment strategies that reduce sudden load (rolling updates, prewarming)

Operational Best Practices (So You Can Sleep Like It’s an Achievement)

Once Auto Scaling is running, the job is not “done.” It’s more like setting up a smart assistant and then checking if it’s learning the household rules.

1) Log Scaling Events

Track scale-out and scale-in events, including why they triggered (metric and threshold).

When something looks odd, you’ll know whether Auto Scaling acted correctly or if the inputs were wrong.

2) Keep Startup Scripts Idempotent

If startup scripts run multiple times or on instance replacement, they should not cause duplicate configuration or conflicting settings.
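Idempotency usually means "check, then act." A small sketch of the pattern (path and contents are placeholders): the step is safe to run on every boot or instance replacement because it only writes when something actually needs to change:

```python
import os

def ensure_config(path="/tmp/app.conf", contents="mode=production\n"):
    """Idempotent provisioning step: write the config only if it is missing
    or different, and report whether anything changed."""
    if os.path.exists(path):
        with open(path) as f:
            if f.read() == contents:
                return False          # already correct: no-op
    with open(path, "w") as f:
        f.write(contents)
    return True
```

Running it twice in a row changes nothing the second time, which is exactly the property you want when Auto Scaling replaces instances behind your back.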

3) Monitor Application-Level Metrics

Infrastructure metrics are useful, but user experience metrics are better. Keep an eye on:

  • Request latency
  • Error rate
  • Database connection saturation
  • Queue lag (for async processing)

Auto Scaling helps when the bottleneck is compute. If the bottleneck is the database, you may need database scaling or query optimization too.

A Practical Example Setup (Template for Your Own)

Let’s assemble a realistic “starter” setup. Numbers are illustrative; adjust based on your app.

Scenario

  • Web application behind a load balancer
  • Instances are stateless
  • Metrics available: CPU utilization and request rate

Example Scaling Group Configuration

  • Min instances: 2
  • Max instances: 10
  • Cooldown: 300 seconds
  • Health checks: via load balancer target health
  • Graceful termination: 30 seconds (or your app’s safe window)

Example Scaling Policies

  • Scale Out Policy: If average CPU > 65% for 3 minutes, add 2 instances.
  • Scale In Policy: If average CPU < 35% for 5 minutes, remove 1 instance.

If request rate metrics are available and more accurate for your workload, you can replace CPU thresholds with QPS thresholds.
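The whole starter setup above can be written down as data and sanity-checked before you touch the console. This encoding is hypothetical (field names are invented, not a Tencent Cloud schema), but the checks catch the classic misconfigurations: inverted min/max, no hysteresis between thresholds, and scale-in that reacts as fast as scale-out:

```python
STARTER_GROUP = {
    "min_instances": 2,
    "max_instances": 10,
    "cooldown_seconds": 300,
    "policies": [
        {"name": "scale-out", "metric": "cpu_avg", "threshold": 65,
         "sustained_minutes": 3, "adjustment": +2},
        {"name": "scale-in", "metric": "cpu_avg", "threshold": 35,
         "sustained_minutes": 5, "adjustment": -1},
    ],
}

def sanity_check(group):
    """Catch classic policy misconfigurations before they reach production."""
    out = next(p for p in group["policies"] if p["adjustment"] > 0)
    in_ = next(p for p in group["policies"] if p["adjustment"] < 0)
    assert group["min_instances"] < group["max_instances"]
    assert out["threshold"] > in_["threshold"], "need hysteresis between policies"
    assert in_["sustained_minutes"] >= out["sustained_minutes"], "scale in slower"
    return True
```

Keeping the configuration as reviewable data also makes it easy to diff changes over time, whatever tool ultimately applies it.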

Frequently Asked Questions

Does Auto Scaling work without a load balancer?

Technically you might manage instances without a load balancer, but routing traffic becomes harder. A load balancer provides centralized health checks and traffic distribution, which makes scaling safer and simpler.

Will Auto Scaling replace unhealthy instances automatically?

Often yes, depending on the health check integration. Instances that fail health checks can be marked unhealthy and replaced. Make sure your health endpoint and security rules are correct.

How do I prevent scaling actions during deployments?

You can coordinate deployments with scaling policies by pausing policies temporarily (if supported) or using deployment strategies that reduce traffic spikes.

What if CPU is low but users complain?

Then CPU is probably not the bottleneck. Check database performance, thread pools, cache hit rates, or latency metrics. Consider scaling based on latency, queue length, or request metrics instead.

Conclusion: Auto Scaling Is a Tool, Not a Magic Spell

Setting up Auto Scaling on Tencent Cloud accounts is absolutely doable, and once it’s configured well, it feels like having a backstage crew that adjusts the show’s props without telling you. Your job becomes monitoring and iterating rather than manually adding servers during spikes.

To succeed, remember the key principles:

  • Use a load balancer and reliable health checks
  • Choose metrics that represent real user pain, not just CPU theater
  • Set sensible min/max capacities and cooldown periods
  • Test in staging and validate scale-out and scale-in behaviors
  • Watch for flapping, authorization issues, and misconfigured startup scripts

Get those right, and Auto Scaling will handle growth spurts gracefully—without you having to do the server equivalent of sprinting up stairs every time traffic spikes.
