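To check whether a web page exists in PHP, the simple approach is fopen / file_get_contents and so on. There are plenty of ways to do it, but all of them pull the entire HTML page back, so checking a large number of URLs gets slow.
The check can be made from the HTTP headers alone, without fetching the whole page (for details see: Hypertext Transfer Protocol -- HTTP/1.1).
Checking the HTTP header with fsockopen
A simple example follows (reposted from: PHP Server Side Scripting - Checking if page exists)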
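<?php
// Open a TCP connection to port 80 with a 10-second connect timeout.
if ($sock = fsockopen('something.net', 80, $errno, $errstr, 10))
{
    // HEAD asks for the headers only; the Host header is needed on name-based virtual hosts.
    fputs($sock, "HEAD /something.html HTTP/1.0\r\nHost: something.net\r\n\r\n");
    while (!feof($sock)) {
        echo fgets($sock);
    }
    fclose($sock);
}
?>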
This prints output like the following:
HTTP/1.1 200 OK
Date: Mon, 06 Oct 2008 15:45:27 GMT
Server: Apache/2.2.9
X-Powered-By: PHP/5.2.6-4
Set-Cookie: PHPSESSID=4e037868a4619d6b4d8c52d0d5c59035; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Vary: Accept-Encoding
Connection: close
Content-Type: text/html
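To act on a response like this in code, the status code has to be read from the first line. A minimal sketch follows; the helper name `status_code_from_response` is hypothetical and not part of the original example, and the sample string is shortened from the output shown here.

```php
<?php
// Hypothetical helper (not from the original post): extract the status code
// from the status line of a raw HTTP response.
function status_code_from_response($raw)
{
    // The status line looks like "HTTP/1.1 200 OK".
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', ltrim($raw), $m)) {
        return (int) $m[1];
    }
    return null; // not an HTTP response
}

$raw = "HTTP/1.1 200 OK\r\nDate: Mon, 06 Oct 2008 15:45:27 GMT\r\nConnection: close\r\n\r\n";
var_dump(status_code_from_response($raw)); // int(200)
```

Returning null for anything that does not start with a status line keeps "no answer" distinct from a real status code.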
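However, this approach still has plenty of problems, for example 302 redirects. The easier route is to let curl take care of these chores for us.
PHP + Curl + Content-Type check
For checking whether a page exists with PHP + Curl, see: How To Check If Page Exists With CURL | W-Shadow.com
That program checks status information such as 200 OK (status codes from 200 up to, but not including, 400 are treated as normal).
That program is basically good enough, but user input comes in all shapes and sizes, so some extra checks are needed. Here are a few problematic inputs picked at random: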
xxx@ooo.com # Email
http://xxx.ooo.com/abc.zip # compressed archive
Input like this has to be filtered out, so two more checks are added: whether the input is a well-formed URL, and whether the Content-Type is one we want.
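The first of those checks, whether the input is a normal web URL at all, could be sketched on its own as below. The helper name `looks_like_web_url` is hypothetical, and restricting the scheme to http/https is an assumption about what counts as a "normal" URL here.

```php
<?php
// Hypothetical helper: accept only absolute http/https URLs that have a host.
function looks_like_web_url($url)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false; // e.g. "xxx@ooo.com" parses as a bare path
    }
    return in_array(strtolower($parts['scheme']), array('http', 'https'), true);
}

var_dump(looks_like_web_url('xxx@ooo.com'));                // bool(false)
var_dump(looks_like_web_url('http://xxx.ooo.com/abc.zip')); // bool(true)
```

The archive URL passes this test because it is syntactically a valid URL; only the Content-Type check can rule it out.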
So the program is modified as follows (adapted from: How To Check If Page Exists With CURL):
<?php
function page_exists($url)
{
    $parts = parse_url($url);
    if (!$parts || !isset($parts['scheme'], $parts['host'])) {
        return false; /* the URL was seriously wrong, e.g. user@gmail.com */
    }
    if (!in_array(strtolower($parts['scheme']), array('http', 'https'))) {
        return false; /* only web URLs are checked */
    }
    if (isset($parts['user'])) {
        return false; /* URLs that carry a user name are rejected too */
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    /* set the user agent - might help, doesn't hurt */
    //curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)');
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; wowTreebot/1.0; +http://wowtree.com)');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    /* try to follow redirects */
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    /* timeout after the specified number of seconds. assuming that this script runs
       on a server, 20 seconds should be plenty of time to verify a valid URL. */
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    /* don't download the page, just the header (much faster in this case) */
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_HEADER, true);
    /* handle HTTPS links */
    if (strtolower($parts['scheme']) == 'https') {
        /* certificate verification is skipped because only existence is checked;
           CURLOPT_SSL_VERIFYHOST takes 0 or 2, the old value 1 is no longer supported */
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    }
    $response = curl_exec($ch);
    /* status code and content type of the final response, after any redirects */
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $type = (string) curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch);
    if ($response === false) {
        return false; /* connection failed or timed out */
    }
    /* allow content-type list */
    $content_type = false;
    /* strip parameters such as "; charset=UTF-8" before comparing */
    $pieces = explode(';', $type);
    $type = strtolower(trim($pieces[0]));
    switch ($type)
    {
        case 'application/atom+xml':
        case 'application/rdf+xml':
        //case 'application/x-sh':
        case 'application/xhtml+xml':
        case 'application/xml':
        case 'application/xml-dtd':
        case 'application/xml-external-parsed-entity':
        //case 'application/pdf':
        //case 'application/x-shockwave-flash':
            $content_type = true;
            break;
    }
    if (!$content_type && preg_match('/^(text|image)\//', $type)) {
        $content_type = true;
    }
    if (!$content_type) {
        return false;
    }
    /* see if code indicates success */
    return (($code >= 200) && ($code < 400));
}
// Test & usage:
// var_dump(page_exists('http://tw.yahoo.com'));
?>
Content-Type information
The Content-Type values used above can be looked up in the following files:
/etc/mime.types
/usr/share/doc/apache-common/examples/mime.types.gz
/usr/share/doc/apache2.2-common/examples/apache2/mime.types.gz # this is the recommended one to read
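Servers often send the Content-Type with parameters attached, such as "text/html; charset=UTF-8", so a value taken from a header usually needs normalizing before it is compared with entries from these files. A rough sketch, assuming a hypothetical helper `is_allowed_content_type` and a shortened allow-list:

```php
<?php
// Hypothetical helper: decide whether a Content-Type header value is acceptable.
function is_allowed_content_type($header_value)
{
    // Drop parameters such as "; charset=UTF-8" and normalize the case.
    $parts = explode(';', $header_value);
    $type = strtolower(trim($parts[0]));

    // Shortened allow-list for illustration only.
    $allowed = array(
        'application/atom+xml',
        'application/rdf+xml',
        'application/xhtml+xml',
        'application/xml',
    );
    if (in_array($type, $allowed, true)) {
        return true;
    }
    // Any text/* or image/* type is accepted as well.
    return preg_match('#^(text|image)/#', $type) === 1;
}

var_dump(is_allowed_content_type('text/html; charset=UTF-8')); // bool(true)
var_dump(is_allowed_content_type('application/zip'));          // bool(false)
```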
1. Checking the URL format:
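function checkUrl($weburl)
{
    // ereg() was removed in PHP 7, so preg_match() is used instead.
    // Matches a scheme plus host name only (no path); returns true when the format is valid.
    return preg_match('/^https?:\/\/[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*$/', $weburl) === 1;
}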
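As an alternative to a hand-written pattern, the format check can be left to PHP's filter extension, which also accepts URLs that carry a path or query string. This is only a sketch; the function name `is_http_url` is hypothetical, and limiting the scheme to http/https is an assumption.

```php
<?php
// Hypothetical alternative to checkUrl(): let filter_var() validate the syntax,
// then restrict the scheme to http or https.
function is_http_url($weburl)
{
    if (filter_var($weburl, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $scheme = strtolower((string) parse_url($weburl, PHP_URL_SCHEME));
    return $scheme === 'http' || $scheme === 'https';
}

var_dump(is_http_url('http://tw.yahoo.com')); // bool(true)
var_dump(is_http_url('xxx@ooo.com'));         // bool(false)
```

FILTER_VALIDATE_URL on its own also accepts schemes such as ftp, which is why the scheme is checked separately.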
2. Checking whether an HTTP address is valid
function url_exists($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_NOBODY, 1); // do not download the body
    curl_setopt($ch, CURLOPT_FAILONERROR, 1); // treat HTTP status codes >= 400 as failures
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    return curl_exec($ch) !== false;
}
Or
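function img_exists($url)
{
    // Read at most one byte; compare with false so a "0" byte or an empty file still counts as existing.
    return @file_get_contents($url, false, null, 0, 1) !== false;
}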
Or
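function url_exists($url)
{
    $head = @get_headers($url);
    if (!is_array($head)) {
        return false; // invalid URL or the host could not be reached
    }
    // get_headers() also returns an array for a 404 page, so look at the status line.
    return (bool) preg_match('#^HTTP/\S+\s+[23]\d\d#', $head[0]);
}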
Example:
$url='http://www.sendnet.cn';
var_dump(url_exists($url)); // var_dump shows false explicitly, while echo would print nothing
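When a URL redirects, get_headers() returns the headers of every response in the chain in a single array, so the first status line is not necessarily the final one. The sketch below picks the last status code from such an array. The helper name `final_status_code` and the sample data are hypothetical; a real caller would first confirm that get_headers() did not return false.

```php
<?php
// Hypothetical helper: find the status code of the last response in an
// array shaped like the return value of get_headers().
function final_status_code(array $headers)
{
    $code = null;
    foreach ($headers as $line) {
        // Each response in a redirect chain starts with its own status line.
        if (is_string($line) && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Sample data only; a real call would pass the array returned by get_headers($url).
$sample = array(
    'HTTP/1.1 301 Moved Permanently',
    'Location: http://www.example.com/new',
    'HTTP/1.1 404 Not Found',
    'Content-Type: text/html',
);
var_dump(final_status_code($sample)); // int(404)
```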