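Getting a client's IP address is harder than it looks. To understand where the various recommendations come from, weigh their trade-offs, and settle on the approach that best fits your needs, you need some familiarity with the HTTP protocol, web servers, proxy servers, client-side policies, and load balancing. This article collects what I learned while researching the problem and uses PHP for the demonstrations.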
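A few principles and facts to establish up front:
- There is no foolproof way to obtain the IP of the user who originally issued the request
- There is no single best solution, only the one that best fits your requirements
- REMOTE_ADDR is the only value that is guaranteed to exist and can be trusted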
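The complexity comes from two main sources:
- 1. The real user may sit behind one or more proxies, and how anonymous those proxies are is unknown
- 2. HTTP_* request headers can be forged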
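To make the second point concrete, here is a minimal sketch of how a request header name ends up as an HTTP_* key in $_SERVER. The helper name header_to_server_key is hypothetical and exists only for illustration; the mapping itself (uppercase, hyphens to underscores, HTTP_ prefix) is how PHP usually exposes request headers.

```php
<?php
// Hypothetical helper: maps an HTTP request header name to the key under
// which PHP typically exposes it in $_SERVER.
function header_to_server_key(string $headerName): string
{
    return 'HTTP_' . strtoupper(str_replace('-', '_', $headerName));
}

// A client can send any header it likes, for example with curl:
//   curl -H 'X-Forwarded-For: 1.2.3.4' http://216.148.28.125/checkip.php
// The script would then find that client-chosen value under this key:
echo header_to_server_key('X-Forwarded-For'), "\n";
```

Because the value is entirely chosen by the client, nothing read from such a key should be trusted without further checks.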
Conclusion and code
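If the IP is used for authorization, or is collected for anti-scraping purposes, use REMOTE_ADDR directly: it comes from the TCP connection itself and cannot be forged by setting a request header, and because IP addresses are a scarce resource, switching IPs (including switching proxy servers) is relatively costly for the user.
// In PHP, read the IP directly with: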
$ip = $_SERVER['REMOTE_ADDR'];
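As a rough illustration of the authorization case, the sketch below checks only the connection's peer address against an explicit allowlist and never looks at any HTTP_* header. The function name is_allowed_peer and the allowlist contents are assumptions made for this example, not part of any library.

```php
<?php
// Hypothetical helper: returns true only when the peer address of the
// connection is on an explicit allowlist. Request headers are never consulted.
function is_allowed_peer(array $server, array $allowlist): bool
{
    $ip = $server['REMOTE_ADDR'] ?? '';
    // Reject anything that is not a syntactically valid IP address.
    if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
        return false;
    }
    // Strict string comparison against the allowlist entries.
    return in_array($ip, $allowlist, true);
}
```

In a real script this might be called as is_allowed_peer($_SERVER, ['213.76.49.123']). A spoofed X-Forwarded-For header has no effect on the result, because the function does not read it.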
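If the user's IP is needed for business logic, such as serving personalized content based on location, or for logging and statistics, try to obtain the IP that originally issued the request. In these scenarios clients generally have no incentive to forge the forwarding headers (HTTP_X_FORWARDED_FOR and the like), so the following code usually suffices: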
function get_ip() {
    foreach (array('HTTP_CLIENT_IP', 'HTTP_X_FORWARDED_FOR', 'HTTP_X_FORWARDED', 'HTTP_X_CLUSTER_CLIENT_IP', 'HTTP_FORWARDED_FOR', 'HTTP_FORWARDED', 'REMOTE_ADDR') as $key) {
        if (array_key_exists($key, $_SERVER) === true) {
            foreach (array_map('trim', explode(',', $_SERVER[$key])) as $ip) {
                if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE) !== false) {
                    return $ip;
                }
            }
        }
    }
    // The loop above filters out private and reserved addresses, so it can come up empty in a local development environment or when the server running the code sits behind a reverse proxy; the line below is the fallback
    return $_SERVER['REMOTE_ADDR'];
}
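A few points worth noting:
- The lookup order is chosen to get as close as possible to the address of the user who originally issued the request
- REMOTE_ADDR always exists and is trustworthy, but since it may be the address of a proxy server, it serves only as the fallback
- HTTP_X_FORWARDED_FOR, HTTP_X_FORWARDED, HTTP_FORWARDED_FOR, and HTTP_FORWARDED are all attempts to obtain the address that originally issued the proxied request
- A request may pass through several proxies, whose addresses are stored in routing order as a comma-separated string, hence the explode call
- Values read from HTTP_* may be forged or malformed, so filter_var is used to discard invalid IP addresses as well as private and reserved ones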
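The explode and filter_var steps described above can be tried in isolation. The sketch below pulls that inner loop out into a hypothetical helper, first_public_ip (a name chosen here for illustration), and feeds it two sample header values built from addresses used in this article.

```php
<?php
// Hypothetical helper: returns the first syntactically valid, public IP in
// a comma-separated header value, or null when there is none.
function first_public_ip(string $headerValue): ?string
{
    $flags = FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE;
    foreach (array_map('trim', explode(',', $headerValue)) as $candidate) {
        if (filter_var($candidate, FILTER_VALIDATE_IP, $flags) !== false) {
            return $candidate;
        }
    }
    return null;
}

// The private 192.168.x.x entry is skipped and the public address is returned.
var_dump(first_public_ip('192.168.1.10, 213.76.49.123'));
// Neither entry is usable: one is not an IP, the other is a loopback address.
var_dump(first_public_ip('not-an-ip, 127.0.0.1'));
```

Note that passing this filter only shows the value is a well-formed public address; it says nothing about whether the client invented it.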
Investigation and verification
First, let's see which $_SERVER fields are usually relevant to our needs. Open the following PHP script in Chrome:
foreach ($_SERVER as $key => $value) {
    // Print one "KEY: value" pair per line; empty values print as "KEY:"
    echo ($value === '') ? sprintf("%s:\n", $key) : sprintf("%s: %s\n", $key, $value);
}
The IP-related entries in the output are:
# No proxy in use; this is my local IP
REMOTE_ADDR: 213.76.49.123
REMOTE_PORT: 7966
# This is the address of the server running the script
SERVER_PORT: 80
SERVER_ADDR: 216.148.28.125
SERVER_NAME: 216.148.28.125
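Next, let's see what values the fields that typically appear in IP-detection code take in several proxy setups.
The local test environment is as follows:
- The request target is 216.148.28.125
- The local address is 213.76.49.123
- The HTTP proxy is 112.205.64.85:1234
- The SOCKS proxy listens locally on 127.0.0.1:8010, and its remote end is also 112.205.64.85
The tests below use three settings: no proxy, the HTTP proxy, and the SOCKS proxy.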
Reading HTTP header fields from a PHP script
The script is as follows:
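// Fields that rarely carry a value in practice; non-standard, or used only by some cloud providers:
echo 'HTTP_CLIENT_IP: ' . ($_SERVER['HTTP_CLIENT_IP'] ?? '') . "\n";
echo 'HTTP_X_CLUSTER_CLIENT_IP: ' . ($_SERVER['HTTP_X_CLUSTER_CLIENT_IP'] ?? '') . "\n";
// CF-Connecting-IP is a CloudFlare custom header; X-Real-IP is a non-standard header often set by reverse proxies such as Nginx. See the reverse proxy section and the links at the end
echo 'HTTP_CF_CONNECTING_IP: ' . ($_SERVER['HTTP_CF_CONNECTING_IP'] ?? '') . "\n";
echo 'HTTP_X_REAL_IP: ' . ($_SERVER['HTTP_X_REAL_IP'] ?? '') . "\n";
// Header fields a proxy server may add:
echo 'HTTP_FORWARDED: ' . ($_SERVER['HTTP_FORWARDED'] ?? '') . "\n";
echo 'HTTP_FORWARDED_FOR: ' . ($_SERVER['HTTP_FORWARDED_FOR'] ?? '') . "\n";
echo 'HTTP_X_FORWARDED: ' . ($_SERVER['HTTP_X_FORWARDED'] ?? '') . "\n";
echo 'HTTP_X_FORWARDED_FOR: ' . ($_SERVER['HTTP_X_FORWARDED_FOR'] ?? '') . "\n";
// The field that always has a value: the address of the peer actually connected to the target server
echo 'REMOTE_ADDR: ' . $_SERVER['REMOTE_ADDR'] . "\n";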
Below are the request details and results for the three test scenarios:
# Direct connection:
Request URL: http://216.148.28.125/checkip.php
Request Method: GET
Status Code: 200 OK
Remote Address: 216.148.28.125:80
Referrer Policy: no-referrer-when-downgrade
HTTP_CF_CONNECTING_IP:
HTTP_CLIENT_IP:
HTTP_FORWARDED:
HTTP_FORWARDED_FOR:
HTTP_X_CLUSTER_CLIENT_IP:
HTTP_X_FORWARDED:
HTTP_X_FORWARDED_FOR:
HTTP_X_REAL_IP:
REMOTE_ADDR: 213.76.49.123
# Through the HTTP proxy:
Request URL: http://216.148.28.125/checkip.php
Request Method: GET
Status Code: 200 OK
Remote Address: 112.205.64.85:1234
Referrer Policy: no-referrer-when-downgrade
HTTP_CF_CONNECTING_IP:
HTTP_CLIENT_IP:
HTTP_FORWARDED:
HTTP_FORWARDED_FOR:
HTTP_X_CLUSTER_CLIENT_IP:
HTTP_X_FORWARDED:
HTTP_X_FORWARDED_FOR:
HTTP_X_REAL_IP:
REMOTE_ADDR: 112.205.64.85
# Through the local SOCKS proxy:
Request URL: http://216.148.28.125/checkip.php
Request Method: GET
Status Code: 200 OK
Remote Address: 127.0.0.1:8010
Referrer Policy: no-referrer-when-downgrade
HTTP_CF_CONNECTING_IP:
HTTP_CLIENT_IP:
HTTP_FORWARDED:
HTTP_FORWARDED_FOR:
HTTP_X_CLUSTER_CLIENT_IP:
HTTP_X_FORWARDED:
HTTP_X_FORWARDED_FOR:
HTTP_X_REAL_IP:
REMOTE_ADDR: 112.205.64.85
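A few patterns emerge:
- In all three scenarios only REMOTE_ADDR has a value
- When a proxy is used, Chrome's Remote Address field shows the proxy it connects to directly (the local end, in the SOCKS case), whereas the REMOTE_ADDR seen by the server-side script is the address of the remote proxy
- TinyProxy, which provides the HTTP proxy here, does not set the FORWARDED-related fields by default, so none of the four related fields has a value; a SOCKS proxy relays traffic below the HTTP layer and adds no headers at all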
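Accessing HttpBin from Chrome
Request HttpBin's /ip endpoint to see whether the target server can obtain the real IP address.
First, look up the IP addresses of httpbin.org; there are two A records: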
Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:19:22) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> socket.gethostbyname_ex("httpbin.org")
('httpbin.org', [], ['3.213.232.135', '52.20.9.77'])
Next are the request details and response bodies for the three requests:
# Direct connection:
Request URL: http://httpbin.org/ip
Request Method: GET
Status Code: 200 OK
Remote Address: 3.213.232.135:80
Referrer Policy: no-referrer-when-downgrade
{
"origin": "213.76.49.123, 213.76.49.123"
}
# Through the HTTP proxy:
Request URL: http://httpbin.org/ip
Request Method: GET
Status Code: 200 OK
Remote Address: 112.205.64.85:1234
Referrer Policy: no-referrer-when-downgrade
{
"origin": "112.205.64.85, 112.205.64.85"
}
The response headers carry the following additional field:
Via: 1.1 tinyproxy (tinyproxy/1.8.3)
# Through the local SOCKS proxy:
Request URL: http://httpbin.org/ip
Request Method: GET
Status Code: 200 OK
Remote Address: 127.0.0.1:8010
Referrer Policy: no-referrer-when-downgrade
{
"origin": "112.205.64.85, 112.205.64.85"
}
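In short:
- When a proxy is used, the target server never sees the real local address
- With the HTTP proxy provided by TinyProxy, the response headers include a Via field, which indicates that the target server can tell the user is going through a proxy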
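The test above only observed Via on the response. Assuming the proxy also adds Via to the request it forwards, as HTTP proxies are expected to do, a server-side script could look for it roughly as follows. The helper name via_header is hypothetical, and a SOCKS proxy or a proxy configured to strip the header would leave nothing to find.

```php
<?php
// Hypothetical helper: returns the Via request header if one is present,
// or null when the request carries none (direct connection, SOCKS proxy,
// or a proxy configured to hide itself).
function via_header(array $server): ?string
{
    $via = trim($server['HTTP_VIA'] ?? '');
    return $via === '' ? null : $via;
}
```

In a script, via_header($_SERVER) !== null would then be a hint, not proof, that the request passed through an HTTP proxy, since the header is as forgeable as any other HTTP_* value.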
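Reverse proxy
If a site sits behind CloudFlare, that is, uses CloudFlare as a reverse proxy, how do you get the real user IP so that visitor statistics are accurate?
If the material above makes sense, the following explanation should be easy to follow: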
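Because Cloudflare acts as a reverse proxy, once it is configured all connections to the origin web server will, as expected, come from Cloudflare IP addresses. This may (or may not) cause problems:
- If your web application uses the visitor's original IP as part of its logic, it will now use a Cloudflare IP address
- If you use the contents of your access logs, they will now contain a Cloudflare IP address as $remote_addr
However, Cloudflare follows industry standards and includes the visitor's IP address in the X-Forwarded-For header. We also add a CF-Connecting-IP header, which may also be used. Either of these headers can be used to restore the visitor's original IP for the web application or to include it in your logs.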
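One way to act on this without reopening the forgery problem is to honor CF-Connecting-IP only when the connection really comes from the reverse proxy. The sketch below is IPv4-only and assumes 64-bit PHP; the function names are hypothetical, and the trusted range in the usage note is a documentation placeholder, not a real Cloudflare range. The actual ranges would have to come from the provider's published list.

```php
<?php
// Returns true when $ip (IPv4 only) falls inside $cidr, e.g. "198.51.100.0/24".
function ipv4_in_cidr(string $ip, string $cidr): bool
{
    // A bare address without "/bits" is treated as a /32.
    [$subnet, $bits] = explode('/', $cidr, 2) + [1 => '32'];
    $ipLong = ip2long($ip);
    $subnetLong = ip2long($subnet);
    if ($ipLong === false || $subnetLong === false || !ctype_digit($bits) || (int) $bits > 32) {
        return false;
    }
    $bits = (int) $bits;
    $mask = $bits === 0 ? 0 : ((~0 << (32 - $bits)) & 0xFFFFFFFF);
    return ($ipLong & $mask) === ($subnetLong & $mask);
}

// Hypothetical helper: trusts CF-Connecting-IP only when the peer address
// belongs to one of the trusted proxy ranges; otherwise returns the peer.
function client_ip_behind_proxy(array $server, array $trustedCidrs): string
{
    $peer = $server['REMOTE_ADDR'] ?? '';
    foreach ($trustedCidrs as $cidr) {
        if (ipv4_in_cidr($peer, $cidr)) {
            $forwarded = trim($server['HTTP_CF_CONNECTING_IP'] ?? '');
            if (filter_var($forwarded, FILTER_VALIDATE_IP) !== false) {
                return $forwarded;
            }
            break; // Trusted proxy, but no usable header: fall back to the peer.
        }
    }
    return $peer;
}
```

A call might look like client_ip_behind_proxy($_SERVER, ['198.51.100.0/24']), with the placeholder range replaced by the proxy's real ones. A request that reaches the origin directly keeps its REMOTE_ADDR, no matter what headers it carries.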
Notes
REMOTE_ADDR might not contain the real IP of the TCP connection. This entirely depends on your SAPI. Ensure that your SAPI is properly configured such that $_SERVER['REMOTE_ADDR'] actually returns the IP of the TCP connection. Failing that might give rise to some serious vulnerabilities, for example, StackExchange used to grant admin access by checking REMOTE_ADDR to see if it matches "localhost", unfortunately the SAPI's config had a vulnerability (it takes HTTP_X_FORWARDED_FOR as input) which allows non-admins to gain admin access by altering the HTTP_X_FORWARDED_FOR header. Also see blog.ircmaxell.com/2012/11/anatomy-of-attack-how-i-hacked.html
-----
This is bad as HTTP_CLIENT_IP and HTTP_X_FORWARDED_FOR can be forged. Only REMOTE_ADDR cannot.
-----
HTTP_X_FORWARDED_FOR can have multiple ip like '1.1.1.1,2.2.2.2' and this functions don't handle it. read https://en.wikipedia.org/wiki/X-Forwarded-For
-----
Since it's possible to spoof the HTTP_X_FORWARDED_FOR header, it's a good idea to test it with filter_var(trim($addr), FILTER_VALIDATE_IP) to make sure you at least have a valid IP address before returning it.
-----
Proxies may send a HTTP_X_FORWARDED_FOR header but even that is optional.
Also keep in mind that visitors may share IP addresses; University networks, large companies and third-world/low-budget ISPs tend to share IPs over many users.
-----
Just one note. Third world ISP does opposite. They create dynamic ip for each login. So its multiple ips per user and not one ip for multiple user.
-----
The header "specification" can handle multiple proxies, the chain of ips will be comma separated in the header value.
-----
The client can set all HTTP header information (ie. $_SERVER['HTTP_…) to any arbitrary value it wants. As such it's far more reliable to use $_SERVER['REMOTE_ADDR'], as this cannot be set by the user.
-----
a) will $_SERVER['REMOTE_ADDR'] always exist if php is ran in web mode.
b) if $_SERVER['REMOTE_ADDR'] does always exist, will it always contain a properly syntaxed ip?
Yes, it is always present in web mode, and since the IP address is converted from its binary representation to the textual format you're seeing, it is always valid – there is no way to specify an invalid IP in the IP header.
One more thing: Don't assume any special format unless you absolutely must deal with IP addresses. For example, IPv6 addresses are longer and contain different characters. Basically, deal with IP addresses as an opaque string.
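When addresses do have to be compared, one possible approach that makes no assumption about textual format is to compare their packed binary form, since the same IPv6 address can be spelled in several ways. The helper name same_ip below is hypothetical; filter_var and inet_pton are standard PHP functions.

```php
<?php
// Hypothetical helper: compares two textual IP addresses by their packed
// binary form, so different spellings of the same address compare equal.
function same_ip(string $a, string $b): bool
{
    // Validate first so inet_pton is never handed a malformed string.
    if (filter_var($a, FILTER_VALIDATE_IP) === false
        || filter_var($b, FILTER_VALIDATE_IP) === false) {
        return false;
    }
    return inet_pton($a) === inet_pton($b);
}

// Compressed and fully expanded spellings of one IPv6 address.
var_dump(same_ip('2001:db8::1', '2001:0DB8:0:0:0:0:0:1'));
// Two different IPv4 addresses from the tests in this article.
var_dump(same_ip('213.76.49.123', '112.205.64.85'));
```

For storage and logging, keeping the string exactly as received remains the simpler option.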
References
- How to get the client IP address in PHP
- How to get Real IP from Visitor?
- Does $_SERVER['HTTP_CLIENT_IP'] actually exist?
- Getting the client IP: is HTTP_CLIENT_IP a hoax?
- How to restore the original visitor IP when using Nginx
- is it necessary to validate $_SERVER['REMOTE_ADDR']?
- How does Cloudflare handle HTTP Request headers?
-- EOF --
This article was last modified 5 years ago (2019-05-21)