• How to handle 500,000 concurrent connections on Linux (C500K)


    An article found on the web. It does not go especially deep into the problem, but as an introductory overview it is quite good. The title does feel a bit like clickbait...

    Or see:
    http://gavin1992.gotoip2.com/papperdetails_tech.php?pid=15&tit=如何在Linux下实现50万并发%20C500K

     Original article:

     

    Linux Kernel Tuning for C500k

    By Jared "Lucky" Kuolt • September 29th, 2010

    Note: Concurrency, as defined in this article, is the same as it is for The C10k problem: concurrent clients (or sockets).

    At Urban Airship we recently published a blog post about scaling beyond 500,000 concurrent socket connections. Hitting these numbers was not a trivial exercise so we’re going to share what we’ve come across during our testing. This guide is specific to Linux and has some information related to Amazon EC2, but it is not EC2-centric. These principles should apply to just about any Linux platform.

    For our usage, squeezing out as many possible socket connections per server is valuable. Instead of running 100 servers with 10,000 connections each, we’d rather run 2 servers with 500,000 connections apiece. To do this we made the socket servers pretty much just socket servers. Any communication between the client and server is passed through a queue and processed by a worker. Having less for the socket server to do means less code, cpu-usage, and ram-usage.

    To get to these numbers we must consider the Linux kernel itself. A number of configurations needed tweaking. But first, an anecdote. (The story begins below...)

    The Kernel, OOM, LOWMEM, and You

    We first tested our code on a local Linux box that had Ubuntu 64-bit with 6GB of RAM, connecting with several Ubuntu VMs per client using bridged network adapters so we could ramp up our connections. We’d fire up the server and run our clients locally to see just how many connections we could hit. We noticed that we could hit 512,000 with our Java server not even breaking a sweat.

    The next step was to test on EC2. We first wanted to see what sort of numbers we could get on “Small” instances, which are 1.7GB 32-bit VMs. We also had to fire up a number of other EC2 instances to act as clients.

    We watched the numbers go up and up without a hitch until, seemingly randomly, the Java server fell over. It didn’t print any exceptions or die gracefully—it was killed.

    We tried the same process again to see if we could replicate the behavior. Killed again.

    Grepping through syslog, we found this line:

    Out of Memory: Killed process 2178 java

    The OOM-killer killed the Java process. We had been watching free RAM closely, so this was odd: we had at least 500MB free at the time of the kill.

    The next time we ran it we watched the contents of /proc/meminfo. What we noticed was a steady decline of the field “LowFree”, the amount of LOWMEM that is available. LOWMEM is the kernel-addressable RAM space used for kernel data. Data like socket buffers.

    As we increased the number of sockets each socket’s buffers increased the amount of LOWMEM used. Once LOWMEM was full the kernel (instead of simply panicking) found the user process responsible for the usage and promptly killed it so it could continue to function.
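    To watch the same counters yourself while ramping up connections, here is a minimal sketch (the LowTotal/LowFree fields only appear on 32-bit kernels with highmem; exact field names can vary by kernel version):

    # Watch kernel low memory and slab usage once per second while adding sockets.
    watch -n 1 'grep -E "^(LowTotal|LowFree|Slab)" /proc/meminfo'

    # Socket buffer memory can also be inspected directly; the TCP "mem" value
    # in this file is counted in pages (usually 4KB each).
    cat /proc/net/sockstat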

    On a standard EC2 Small, the configuration is such that the LOWMEM is around 717MB and the rest is “given” to the user. The kernel is smart about reallocating LOWMEM for the user, but not the other way around. The assumption is that the kernel will use very little ram, or at least a predictable finite amount, and the user should be allowed to go crazy. What we needed with our socket server was just the opposite. We needed the kernel to use all the ram it needed—our Java server rarely uses above a few hundred MB.

    (In short: on a 32-bit Linux machine the server process got killed after running for a while, because the kernel ran short of usable memory, i.e. low memory; only about 717MB was available to it. On 32-bit Linux, kernel space is limited to roughly 896MB.)

     

    (For an in-depth rundown, take a look at High Memory In The Linux Kernel)

    On a 32-bit system the kernel-addressable RAM space is 4GB. Making sure the proper space is reserved for the kernel is important. But on 64-bit (x86-64) Linux the kernel-addressable space is 64TB (terabytes). At the current state of computing this is effectively limitless, and as such you will not even see LowMem in /proc/meminfo because it is all LOWMEM.
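    A quick way to check which case applies to a given box (a simple sketch: it inspects the machine word size and looks for the low-memory fields):

    uname -m                        # x86_64 means a 64-bit kernel; i686/i386 means 32-bit
    grep -i '^Low' /proc/meminfo    # prints LowTotal/LowFree on 32-bit; prints nothing on x86-64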

    So we created some EC2 Large instances (each of which is 64-bit with 7.5GB of RAM) and ran our tests again, this time without any surprises. The sockets were added happily and the kernel took all the RAM it needed.

    Long story short, you can only scale to so many sockets on a 32-bit platform.

    Kernel Options

    (Kernel parameters that need tuning so the server can accept as many connections as possible.)

    Several kernel parameters allow tuning and tweaking of socket-related behavior. In /etc/sysctl.conf there are a few options we’ve modified.

    First is fs.file-max, the maximum file descriptor limit. The default is quite low so this should be adjusted. Be careful if you’re not ready to go super high.

    Second, we have the socket buffer parameters net.ipv4.tcp_rmem and net.ipv4.tcp_wmem. These are the buffers for reads and writes respectively. Each requires three integer inputs: min, default, and max. These each correspond to the number of bytes that may be buffered for a socket. Set these low with a tolerant max to reduce the amount of ram used for each socket.

    The relevant portions of our config look like this:

    fs.file-max = 999999

    net.ipv4.tcp_rmem = 4096 4096 16777216

    net.ipv4.tcp_wmem = 4096 4096 16777216

    Meaning that the kernel allows for 999,999 open file descriptors and each socket buffer has a minimum and default 4096-byte buffer, with a sensible max of 16MB.
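    Assuming these entries live in /etc/sysctl.conf, they can be loaded and verified without a reboot, for example:

    sysctl -p                                                  # reload /etc/sysctl.conf
    sysctl fs.file-max net.ipv4.tcp_rmem net.ipv4.tcp_wmem     # print the live values
    sysctl -w fs.file-max=999999                               # or set an individual key on the fly for testing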


    We also modified /etc/security/limits.conf to allow for 999,999 open file descriptors for all users.

    #<domain>      <type> <item>         <value>

    *               -       nofile         999999

    You may want to look at the manpage for more information.
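    The limits.conf change only takes effect for new login sessions (it is applied through PAM), so a sketch of how to verify it might look like this (pidof java assumes a single running Java server process):

    ulimit -n                                      # soft limit for the current shell; should now print 999999
    ulimit -Hn                                     # hard limit
    grep "open files" /proc/$(pidof java)/limits   # limit actually in effect for the running server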

    (The author tunes the system-wide maximum number of open files and the TCP read/write buffer sizes. If the server runs as a single process, you also need to raise the per-process open file limit, e.g. with ulimit -n.)
     

    Testing

    When testing, we were able to get about 64,000 connections per client by increasing the number of ephemeral ports allowed on both the client and the server.

    echo "1024 65535" > /proc/sys/net/ipv4/ip_local_port_range

    This effectively allows every ephemeral port above 1024 to be used, instead of the default range, which is much smaller (and typically more sane).
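    The same change can also be applied with sysctl, or made persistent by adding the equivalent line to /etc/sysctl.conf; a sketch:

    sysctl -w net.ipv4.ip_local_port_range="1024 65535"   # same effect as the echo above
    cat /proc/sys/net/ipv4/ip_local_port_range            # verify: prints 1024 and 65535
    # to persist across reboots, add to /etc/sysctl.conf:
    #   net.ipv4.ip_local_port_range = 1024 65535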

    (Widen the ephemeral port range; the range must start above 1024.)

    The 64k Connection Myth

    It’s a common misconception that you can only accept 64,000 connections per IP address and the only way around it is to add more IPs. This is absolutely false.

    The misconception begins with the premise that there are only so many ephemeral ports per IP. The truth is that the limit is based on the IP pair, or said another way, the client and server IPs together. A single client IP can connect to a server IP 64,000 times and so can another client IP.

    Were this myth true it would be a significant and easy-to-exploit DDoS vector.
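    One way to convince yourself of this on a live server is to count established connections per client address. The sketch below assumes IPv4 peers and uses ss, splitting the peer IP out of the address:port column:

    # Count ESTABLISHED TCP connections per client IP (IPv4 assumed).
    ss -tan | awk '$1 == "ESTAB" { split($5, a, ":"); print a[1] }' | sort | uniq -c | sort -rn | head

    Each client IP can contribute up to roughly 64,000 connections (one per ephemeral port), and every additional client IP gets its own budget, which is how a single listening port can hold far more than 64k connections in total.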

     (A common misconception.)

    Scaling for Everyone

    When we set out to establish half a million connections on a single server we were diving deep into water that wasn’t well documented. Sure, we know that C10k is relatively trivial, but how about an order of magnitude (and then some) above that?

    Fortunately we’ve been able to achieve success without too many serious problems. Hopefully our solutions can help save time for those out there looking to solve the same problems.

    ///////////////////////////////////////////

     

    Excerpts from some of the best comments:

    John Kalucki at 12:10 am on October 1, 2010

    If you lose connectivity to a large fraction of the internet, your sockets will back up and you might exhaust memory and cause the TCP stack to freeze. Setting the aggregate wmem demand possible to be only a fraction of RAM avoids this out-of-memory situation at the cost of throughput-per-socket. (Set tcp_wmem correctly, or a stretch of bad network conditions can lead to OOM.)

    We’ve experienced just this situation a number of times on the User Streams clusters of the Twitter Streaming API. A route flap, or a LB restart can cause an entire cluster of boxes to lock up if we don’t manage wmem correctly.

    You may also want to tune tcp_retries2 down from the default of 15, which is about 2 hours, to something more reasonable. This allows you to reap connections to devices that have been shut off, and thus increase your density of active users per server. Otherwise the number of dead connections per server can, in our use-case, reach as high as 30%. We tuned down to 5 or 6, and the ratio of stale established connections is now very low.

     

    (More on tcp_retries2, quoting the kernel documentation:

     How many times to retry before killing alive TCP connection. RFC1122 says that the limit should be longer than 100 sec. It is too small number. The default value of 15 corresponds to ~ 13 - 30 minutes, depending on RTO.

    See also the series of articles on TCP protocol stack performance tuning.)
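    A hedged sketch of the change John describes (the value 6 is the figure he reports using; pick what fits your clients' reconnect behavior):

    sysctl -w net.ipv4.tcp_retries2=6                        # default is 15
    echo "net.ipv4.tcp_retries2 = 6" >> /etc/sysctl.conf     # make it persistent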

  • Reposted from: https://www.cnblogs.com/yizhinantian/p/2005406.html