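Guzzle is a PHP HTTP client library (not a full framework) that makes it easy to send large numbers of HTTP requests and to build web service clients, which makes it an essential tool for PHP crawlers.
Installation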
composer.phar require guzzlehttp/guzzle:~6.0
Basic usage
Requesting a page and getting its HTML
// Load the Composer autoloader
require './vendor/autoload.php';
use GuzzleHttp\Client;
$client = new Client();
// Request the URL with GET
$response = $client->request('GET','http://www.baidu.com');
// Read the HTML from the response body
$html = $response->getBody()->getContents();
echo $html;
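Note that the result is the same markup you would get by right-clicking the page and choosing "View page source". Content that JavaScript loads afterwards is not included and cannot be fetched directly this way.
// Match a tag and the attribute values inside it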
preg_match('/<video[^>]*([\s\S]*?)<\/video>/', $html, $match);
preg_match('/src="(.*?)"/i', $match[0], $match1);
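The two preg_match calls above assume that both patterns match; if the page has no video tag, $match[0] is undefined. A minimal sketch of a guarded version follows. The function name extractVideoSrc and the sample HTML are hypothetical, chosen only for illustration, and the entity decoding is an added assumption about how the URL will be used.

```php
<?php
// Hypothetical helper: returns the src of the first <video> element, or null.
function extractVideoSrc(string $html): ?string
{
    // Same two-step match as above, but each step is checked before use.
    if (!preg_match('/<video[^>]*([\s\S]*?)<\/video>/', $html, $video)) {
        return null; // no <video> element in the page source
    }
    if (!preg_match('/src="(.*?)"/i', $video[0], $src)) {
        return null; // <video> found, but no src attribute inside it
    }
    // Attribute values in HTML source are entity-encoded (e.g. &amp;).
    return html_entity_decode($src[1]);
}

// Made-up sample markup, not taken from any real site.
$sample = '<div><video poster="p.jpg" src="https://example.com/v.mp4?a=1&amp;b=2"></video></div>';
var_dump(extractVideoSrc($sample));
var_dump(extractVideoSrc('<p>no video here</p>'));
```

Regular expressions are fragile against markup changes; for anything beyond a quick script, a real HTML parser such as PHP's DOMDocument is probably the safer choice.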
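Getting the target URL of a redirect
// With redirects disabled, the 3xx response itself is returned
$response = $client->get($url, [
'headers' => $headers,
'query' => $query,
'allow_redirects' => false,
]);
// The redirect target is in the Location header
$redirectUrl = $response->getHeaderLine('Location');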
Sending a POST request with a JSON body
With Guzzle 5 and 6, use the json request option:
use GuzzleHttp\Client;
$client = new Client();
$response = $client->post('url', [
'json' => ['foo' => 'bar']
]);
Reading JSON data from a Guzzle response
$client = new \GuzzleHttp\Client();
# Call an external API endpoint:
$response = $client->get('http://api.map.baidu.com/geocoder/v2/?callback=renderReverse&location=39.87186,116.479723&output=json&pois=1&ak=Your_AK');
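$data = json_decode($response->getBody());
If this fails with a non-string type error (for example under strict_types, because getBody() returns a stream object rather than a string), read the body contents explicitly:
$data = json_decode($response->getBody()->getContents());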
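The json_decode calls above return null without any message when the body is not valid JSON. A minimal sketch of a stricter decode step follows. The helper name decodeJsonBody and the sample payload are hypothetical and do not reflect the real response shape of any API.

```php
<?php
// Hypothetical helper: decode a JSON body into an array, or fail loudly.
function decodeJsonBody(string $body): array
{
    $data = json_decode($body, true); // true => associative arrays
    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
    }
    if (!is_array($data)) {
        throw new RuntimeException('Expected a JSON object or array');
    }
    return $data;
}

// Made-up payload for illustration only.
$result = decodeJsonBody('{"status":0,"result":{"formatted_address":"Beijing"}}');
echo $result['result']['formatted_address'], PHP_EOL;
```

With Guzzle, this would be called as decodeJsonBody((string) $response->getBody()); the string cast reads the whole stream from the beginning, whereas getContents() returns only what is left after the current read position.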
Examples
Kuaishou watermark-removal source code
public function kuaishou(){
//https://v.kuaishouapp.com/s/TIh1nhS9
$url=input('url');
$client = new Client();
$response = $client->get($url, [
'headers' => [
'User-Agent' => "Mozilla/5.0 (Linux; Android 8.0.0; Pixel 2 XL Build/OPD1.170816.004) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Mobile Safari/537.36"
],
'allow_redirects' => false,
]);
echo("<pre>");
$url=$response->getHeaders();
$url=$url['Location'][0];
var_dump($url);
$response = $client->get($url,['headers'=>[
'User-Agent' => "Mozilla/5.0 (Linux; Android 8.0.0; Pixel 2 XL Build/OPD1.170816.004) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Mobile Safari/537.36",
]]);
// echo("<pre>");
// var_dump($response->getBody());
$html=$response->getBody()->getContents();
preg_match('/<video[^>]*([\s\S]*?)<\/video>/', $html, $match);
preg_match('/src="(.*?)"/i', $match[0], $match1);
preg_match('/poster="(.*?)"/i', $match[0], $match2);
preg_match('/alt="(.*?)"/i', $match[0], $match3);
var_dump($match1);
var_dump($match2);
var_dump($match3);
}
Pipi Gaoxiao watermark-removal source code
public function pipigaoxiao(){
//https://h5.ippzone.com/pp/post/407385579526?zy_to=copy_link&share_count=1&m=4836f2984b97d8710f5f2740d70847fb&app=&type=post&did=e8bc2a68c20b0573&mid=1193950739543&pid=407385579526
$url=input('url');
preg_match('/pp\/post\/([0-9]+)/i', $url, $match);
$postId = intval($match[1]);
$client = new Client();
$response = $client->post('http://share.ippzone.com/ppapi/share/fetch_content', [
'json' =>[
'pid' => $postId,
'type' => 'post',
'mid' => '',
]
]);
echo("<pre>");
var_dump(json_decode($response->getBody()->getContents()));
}
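The function above reads $match[1] without checking that the URL actually contains a post ID. A small sketch of a guarded extraction step, using the same pattern, might look like this. The helper name extractPostId is hypothetical, and the int cast assumes 64-bit PHP.

```php
<?php
// Hypothetical helper: returns the numeric post ID from a share URL, or null.
function extractPostId(string $url): ?int
{
    // Same pattern as in pipigaoxiao(): digits following "pp/post/".
    if (!preg_match('/pp\/post\/([0-9]+)/i', $url, $match)) {
        return null; // not a recognisable post URL
    }
    // Assumes 64-bit PHP, where IDs of this size fit in an int.
    return (int) $match[1];
}

var_dump(extractPostId('https://h5.ippzone.com/pp/post/407385579526?zy_to=copy_link'));
var_dump(extractPostId('https://h5.ippzone.com/about'));
```

Returning null lets the caller reject a bad link before sending the POST request.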
Derivatives: