`perldoc LWP`

LWP has three main components:

Request Object
Response Object
UserAgent

Request Object

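In LWP, the Request Object is an instance of the HTTP::Request class. Despite the name, it is not limited to HTTP: LWP uses it for other URL schemes such as ftp:// and file:// as well. Its main attributes are method, uri, headers, and content.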
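
A minimal sketch of building an HTTP::Request by hand and reading back its four main attributes. It assumes libwww-perl (which provides HTTP::Request) is installed; the example.com URL, the form body, and the file:// path are placeholders, and nothing is actually sent.

```perl
use strict;
use warnings;
use HTTP::Request;

# Arguments: method, URI, headers (array ref of key/value pairs), content
my $req = HTTP::Request->new(
    POST => 'http://example.com/search',
    [ 'Content-Type' => 'application/x-www-form-urlencoded' ],
    'query=libwww-perl',
);

print $req->method, "\n";                 # POST
print $req->uri->host, "\n";              # example.com (uri returns a URI object)
print $req->header('Content-Type'), "\n"; # application/x-www-form-urlencoded
print $req->content, "\n";                # query=libwww-perl

# The same class describes non-HTTP requests too (placeholder path, not read here)
my $file_req = HTTP::Request->new(GET => 'file:///tmp/example.txt');
print $file_req->uri->scheme, "\n";       # file
```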

Response Object

In LWP, the Response Object is an instance of the HTTP::Response class. Its main attributes are code, message, headers, and content. The methods most often used to check the code attribute are is_success() and is_error().
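
A small sketch that constructs two HTTP::Response objects directly (normally LWP::UserAgent builds them for you) to show how code, message, is_success(), is_error(), and status_line relate. The status codes and body are arbitrary illustration values.

```perl
use strict;
use warnings;
use HTTP::Response;

# Arguments: code, message, headers (array ref), content
my $ok      = HTTP::Response->new(200, 'OK', [ 'Content-Type' => 'text/plain' ], 'hello');
my $missing = HTTP::Response->new(404, 'Not Found');

for my $res ($ok, $missing) {
    if ($res->is_success) {          # 2xx codes
        print 'success: ', $res->content, "\n";
    }
    elsif ($res->is_error) {         # 4xx and 5xx codes
        print 'error: ', $res->status_line, "\n";   # "404 Not Found"
    }
}
```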

User Agent

You hand a Request Object to the User Agent and get a Response Object back from it.

In LWP, the User Agent is an instance of the LWP::UserAgent class.

Its request() method takes a Request Object and returns a Response Object.

Commonly used attributes:

timeout
agent
from
parse_head
proxy, no_proxy
credentials
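
A sketch of setting the attributes listed above, assuming a recent libwww-perl. The contact address, proxy host, realm, and credentials are hypothetical placeholders; nothing here sends a request.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Attributes can be passed to new() or set later through accessor methods
my $ua = LWP::UserAgent->new(timeout => 10);         # seconds
$ua->agent('MyApp/0.1 ');         # trailing space: LWP appends its default "libwww-perl/x.y"
$ua->from('admin@example.com');   # hypothetical address, sent as the From header
$ua->parse_head(0);               # don't parse <head> of HTML responses for headers
$ua->proxy(['http', 'https'], 'http://proxy.example.com:8080/');   # hypothetical proxy
$ua->no_proxy('localhost');       # hosts that bypass the proxy
$ua->credentials('www.example.com:80', 'Example Realm', 'user', 'secret');   # hypothetical

print $ua->timeout, "\n";   # 10
print $ua->agent, "\n";     # e.g. "MyApp/0.1 libwww-perl/6.xx"
```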

Example

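use strict;
use warnings;

# Create a user agent object
use LWP::UserAgent;
use HTTP::Request;
my $ua = LWP::UserAgent->new;
$ua->agent("MyApp/0.1 ");   # the trailing space makes LWP append its default "libwww-perl/x.y"

# Create a request
# (search.cpan.org has since been retired; substitute an endpoint that is still live)
my $req = HTTP::Request->new(POST => 'http://search.cpan.org/search');
$req->content_type('application/x-www-form-urlencoded');
$req->content('query=libwww-perl&mode=dist');

# Pass request to the user agent and get a response back
my $res = $ua->request($req);

# Check the outcome of the response
if ($res->is_success) {
    print $res->decoded_content;   # body decoded according to Content-Encoding and charset
}
else {
    print $res->status_line, "\n";
}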
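
The same request/response flow can be tried without network access by pointing the request at a local file, since LWP also handles the file:// scheme. This sketch assumes File::Temp and URI::file (from the URI distribution) are available; the file contents are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTTP::Request;
use LWP::UserAgent;
use URI::file;

# Write a small local file so the example needs no network access
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "hello from a file\n";
close $fh;

# Build a file:// request and pass it through the same request() call
my $ua  = LWP::UserAgent->new(timeout => 10);
my $req = HTTP::Request->new(GET => URI::file->new_abs($path));
my $res = $ua->request($req);

if ($res->is_success) {
    print $res->content;
}
else {
    print $res->status_line, "\n";
}
```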

Available LWP-related modules:

    LWP::MemberMixin   -- Access to member variables of Perl5 classes
    LWP::UserAgent   -- WWW user agent class
        LWP::RobotUA   -- For developing robot applications
    LWP::Protocol          -- Interface to various protocol schemes
        LWP::Protocol::http  -- http:// access
        LWP::Protocol::file  -- file:// access
        LWP::Protocol::ftp   -- ftp:// access
        ...

    LWP::Authen::Basic -- Handle 401 and 407 responses
    LWP::Authen::Digest

    HTTP::Headers      -- MIME/RFC822 style header (used by HTTP::Message)
    HTTP::Message      -- HTTP style message
    HTTP::Request    -- HTTP request
    HTTP::Response   -- HTTP response
    HTTP::Daemon       -- An HTTP server class

    WWW::RobotRules    -- Parse robots.txt files
    WWW::RobotRules::AnyDBM_File -- Persistent RobotRules

    Net::HTTP          -- Low level HTTP client

The following modules provide various functions and definitions.

    LWP                -- The main module: library version number and documentation.
    LWP::MediaTypes    -- MIME types configuration (text/html etc.)
    LWP::Simple        -- Simplified procedural interface for common functions
    HTTP::Status       -- HTTP status code (200 OK etc)
    HTTP::Date         -- Date parsing module for HTTP date formats
    HTTP::Negotiate    -- HTTP content negotiation calculation
    File::Listing      -- Parse directory listings
    HTML::Form         -- Processing for <form>s in HTML documents
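
As a quick taste of two of the helper modules above, HTTP::Status maps status codes to reason phrases and HTTP::Date converts between epoch seconds and HTTP date strings. A minimal sketch, assuming libwww-perl's HTTP modules are installed:

```perl
use strict;
use warnings;
use HTTP::Status qw(:constants status_message is_error);
use HTTP::Date qw(time2str str2time);

# Status code constants and their reason phrases
print HTTP_NOT_FOUND, ' ', status_message(HTTP_NOT_FOUND), "\n";   # 404 Not Found
print is_error(HTTP_NOT_FOUND) ? "error\n" : "ok\n";               # error

# Epoch seconds <-> HTTP date format
my $stamp = time2str(0);
print $stamp, "\n";             # Thu, 01 Jan 1970 00:00:00 GMT
print str2time($stamp), "\n";   # 0
```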

Processing HTML with `Mojo::DOM`

See `perldoc Mojo::DOM` for the full documentation.

It parses HTML into a tree of nodes. There are 8 node types:

cdata
comment
doctype
pi
raw
root
tag
text

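A typical document, such as `<!DOCTYPE html><html><head><title>Hello</title></head><body>World!</body></html>`, is parsed into this tree:

root
|- doctype (html)
+- tag (html)
    |- tag (head)
    |  +- tag (title)
    |     +- raw (Hello)
    +- tag (body)
       +- text (World!)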

Every node in the tree is a Mojo::DOM object.
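
A short sketch, assuming Mojolicious is installed, that parses a small document matching the tree above and lists the node types in document order with descendant_nodes.

```perl
use strict;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new(
    '<!DOCTYPE html><html><head><title>Hello</title></head><body>World!</body></html>');

# descendant_nodes returns every node below the root in document order,
# each wrapped in its own Mojo::DOM object
my $types = $dom->descendant_nodes->map('type')->join(',');
print "$types\n";         # doctype,tag,tag,tag,raw,tag,text
print $dom->type, "\n";   # root
```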

Creating a Mojo::DOM object

my $dom = Mojo::DOM->new;
my $dom = Mojo::DOM->new('<foo bar="baz">I ♥ Mojolicious!</foo>');

Creating an object that contains a single tag (new_tag):

my $tag = Mojo::DOM->new_tag('div');
my $tag = $dom->new_tag('div');
my $tag = $dom->new_tag('div', id => 'foo', hidden => undef);
my $tag = $dom->new_tag('div', 'safe content');
my $tag = $dom->new_tag('div', id => 'foo', 'safe content');
my $tag = $dom->new_tag('div', data => {mojo => 'rocks'}, 'safe content');
my $tag = $dom->new_tag('div', id => 'foo', sub { 'unsafe content' });

Parsing HTML into a Mojo::DOM object (parse)

$dom = $dom->parse('<foo bar="baz">I ♥ Mojolicious!</foo>');

Getting the root node (root)

my $root = $dom->root;

Getting the text content of all descendant nodes of an element (all_text)

my $text = $dom->all_text;

# "foo\nbarbaz\n"
$dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->at('div')->all_text;

Finding ancestor elements (ancestors)

my $collection = $dom->ancestors;
my $collection = $dom->ancestors('div ~ p');
say $dom->ancestors->map('tag')->join("\n");

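Appending to a node's content (append_content)

$dom = $dom->append_content('<p>I ♥ Mojolicious!</p>');
$dom = $dom->append_content(Mojo::DOM->new);

(The difference from append: append inserts the fragment after the node itself, as its next sibling, while append_content adds it at the end of the node's content, as its last child.)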
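
A minimal sketch, assuming Mojolicious is installed, contrasting append and append_content on the same markup; the `<p>` and `<b>` fragments are arbitrary.

```perl
use strict;
use warnings;
use Mojo::DOM;

# append: the fragment becomes the next sibling of the selected node
my $dom1 = Mojo::DOM->new('<div><p>A</p></div>');
$dom1->at('p')->append('<p>B</p>');
print "$dom1\n";   # <div><p>A</p><p>B</p></div>

# append_content: the fragment becomes the last child inside the selected node
my $dom2 = Mojo::DOM->new('<div><p>A</p></div>');
$dom2->at('p')->append_content('<b>B</b>');
print "$dom2\n";   # <div><p>A<b>B</b></p></div>
```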

Finding the first element that matches a selector (at)

my $result = $dom->at('div ~ p');
my $result = $dom->at('svg|line', svg => 'http://www.w3.org/2000/svg');
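
A sketch of at() on a small hypothetical document, including the undef case, which is worth checking before chaining further method calls.

```perl
use strict;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new('<div><h1>Title</h1><p>One</p><p>Two</p></div>');

# at() returns only the first match, like find(...)->first
my $first = $dom->at('h1 ~ p');
print $first->text, "\n";   # One

# When nothing matches, at() returns undef, so check before chaining
my $missing = $dom->at('span');
print defined $missing ? $missing->text : 'no <span>', "\n";
```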

Getting, setting, and removing attributes (attr)

my $hash = $dom->attr;
my $foo  = $dom->attr('foo');
$dom     = $dom->attr({foo => 'bar'});
$dom     = $dom->attr(foo => 'bar');

This element's attributes.

# Remove an attribute
delete $dom->attr->{id};

# Attribute without value
$dom->attr(selected => undef);

# List id attributes
say $dom->find('*')->map(attr => 'id')->compact->join("\n");
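
A small sketch combining the three attr operations on a hypothetical link; the URL paths and the id value are placeholders.

```perl
use strict;
use warnings;
use Mojo::DOM;

my $dom  = Mojo::DOM->new('<a href="/docs" id="link">Docs</a>');
my $link = $dom->at('a');

my $old = $link->attr('href');   # read one attribute: "/docs"
$link->attr(href => '/api');     # set (or overwrite) one attribute
delete $link->attr->{id};        # attr() without arguments returns the attribute hash ref
print "$dom\n";                  # <a href="/api">Docs</a>
```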

Getting child nodes (child_nodes)

# Return a Mojo::Collection object containing all child nodes of this element as Mojo::DOM objects.
my $collection = $dom->child_nodes;

# "<p><b>123</b></p>"
$dom->parse('<p>Test<b>123</b></p>')->at('p')->child_nodes->first->remove;

# "<!DOCTYPE html>"
$dom->parse('<!DOCTYPE html><b>123</b>')->child_nodes->first;

# " Test "
$dom->parse('<b>123</b><!-- Test -->')->child_nodes->last->content;

Getting child elements, optionally filtered by a selector (children):

# Find all child elements of this element matching the CSS selector and return a Mojo::Collection object containing these elements as Mojo::DOM objects. All selectors from "SELECTORS" in Mojo::DOM::CSS are supported.

my $collection = $dom->children;
my $collection = $dom->children('div ~ p');

# Show tag name of random child element
say $dom->children->shuffle->first->tag;
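
The difference between child_nodes and children is easiest to see on mixed content. A minimal sketch, assuming Mojolicious is installed:

```perl
use strict;
use warnings;
use Mojo::DOM;

my $p = Mojo::DOM->new('<p>Test<b>123</b><!-- note --></p>')->at('p');

# child_nodes: every direct child, including text and comment nodes
my $all = $p->child_nodes->map('type')->join(',');
print "$all\n";     # text,tag,comment

# children: direct child elements only
my $elems = $p->children->map('tag')->join(',');
print "$elems\n";   # b
```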

Getting and setting a node's content (content)

# Return this node's content or replace it with HTML/XML fragment (for "root" and "tag" nodes) or raw content.
my $str = $dom->content;
$dom    = $dom->content('<p>I ♥ Mojolicious!</p>');
$dom    = $dom->content(Mojo::DOM->new);

# "<b>Test</b>"
$dom->parse('<div><b>Test</b></div>')->at('div')->content;

# "<div><h1>123</h1></div>"
$dom->parse('<div><h1>Test</h1></div>')->at('h1')->content('123')->root;

# "<p><i>123</i></p>"
$dom->parse('<p>Test</p>')->at('p')->content('<i>123</i>')->root;

# "<div><h1></h1></div>"
$dom->parse('<div><h1>Test</h1></div>')->at('h1')->content('')->root;

# " Test "
$dom->parse('<!-- Test --><br>')->child_nodes->first->content;

# "<div><!-- 123 -->456</div>"
$dom->parse('<div><!-- Test -->456</div>')
    ->at('div')->child_nodes->first->content(' 123 ')->root;

Getting all descendant nodes (descendant_nodes)

# Return a Mojo::Collection object containing all descendant nodes of this element as Mojo::DOM objects.
my $collection = $dom->descendant_nodes;

# "<p><b>123</b></p>"
$dom->parse('<p><!-- Test --><b>123<!-- 456 --></b></p>')
    ->descendant_nodes->grep(sub { $_->type eq 'comment' })
    ->map('remove')->first;

# "<p><b>test</b>test</p>"
$dom->parse('<p><b>123</b>456</p>')
    ->at('p')->descendant_nodes->grep(sub { $_->type eq 'text' })
    ->map(content => 'test')->first->root;

Finding all matching descendant elements with a selector (find)

# Find all descendant elements of this element matching the CSS selector and return a Mojo::Collection object containing these elements as Mojo::DOM objects. All selectors from "SELECTORS" in Mojo::DOM::CSS are supported.
my $collection = $dom->find('div ~ p');
my $collection = $dom->find('svg|line', svg => 'http://www.w3.org/2000/svg');

# Find a specific element and extract information
my $id = $dom->find('div')->[23]{id};

# Extract information from multiple elements
my @headers = $dom->find('h1, h2, h3')->map('text')->each;

# Count all the different tags
my $hash = $dom->find('*')->reduce(sub { $a->{$b->tag}++; $a }, {});

# Find elements with a class that contains dots
my @divs = $dom->find('div.foo\.bar')->each;
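
A typical scraping pattern with find(): collect each link's href and text. The markup below is a made-up sample.

```perl
use strict;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new(<<'HTML');
<ul>
  <li><a href="/a">First</a></li>
  <li><a href="/b">Second</a></li>
  <li><a>No link</a></li>
</ul>
HTML

# find() returns a Mojo::Collection; map() turns each element into [href, text]
my @links = $dom->find('a[href]')->map(sub { [ $_->attr('href'), $_->text ] })->each;
printf "%s -> %s\n", @$_ for @links;
```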

Finding the following sibling elements (following)

# Find all sibling elements after this node matching the CSS selector and return a Mojo::Collection object containing these elements as Mojo::DOM objects. All selectors from "SELECTORS" in Mojo::DOM::CSS are supported.
my $collection = $dom->following;
my $collection = $dom->following('div ~ p');

# List tags of sibling elements after this node
say $dom->following->map('tag')->join("\n");

Or, to get all following sibling nodes rather than just elements (following_nodes):

# Return a Mojo::Collection object containing all sibling nodes after this node as Mojo::DOM objects.
my $collection = $dom->following_nodes;

# "C"
$dom->parse('<p>A</p><!-- B -->C')->at('p')->following_nodes->last->content;

Getting the next sibling element (next):

# Return Mojo::DOM object for next sibling element, or "undef" if there are no more siblings.
my $sibling = $dom->next;

# "<h2>123</h2>"
$dom->parse('<div><h1>Test</h1><h2>123</h2></div>')->at('h1')->next;

Getting the next sibling node (next_node):

# Return Mojo::DOM object for next sibling node, or "undef" if there are no more siblings.
my $sibling = $dom->next_node;

# "456"
$dom->parse('<p><b>123</b><!-- Test -->456</p>')
    ->at('b')->next_node->next_node;

# " Test "
$dom->parse('<p><b>123</b><!-- Test -->456</p>')
    ->at('b')->next_node->content;

Getting the preceding sibling elements (preceding):

# Find all sibling elements before this node matching the CSS selector and return a Mojo::Collection object containing these elements as Mojo::DOM objects. All selectors from "SELECTORS" in Mojo::DOM::CSS are supported.
my $collection = $dom->preceding;
my $collection = $dom->preceding('div ~ p');

# List tags of sibling elements before this node
say $dom->preceding->map('tag')->join("\n");

Getting the preceding sibling nodes (preceding_nodes):

# Return a Mojo::Collection object containing all sibling nodes before this node as Mojo::DOM objects.
my $collection = $dom->preceding_nodes;

# "A"
$dom->parse('A<!-- B --><p>C</p>')->at('p')->preceding_nodes->first->content;

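(For a single sibling, previous and previous_node are the counterparts of next and next_node: they return the previous sibling element or node, or undef if there is none.)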
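
A minimal sketch of previous and previous_node on a hypothetical snippet that has a comment between two headings:

```perl
use strict;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new('<div><h1>Test</h1><!-- note --><h2>123</h2></div>');
my $h2  = $dom->at('h2');

print $h2->previous->tag, "\n";        # h1 (previous element, skips the comment)
print $h2->previous_node->type, "\n";  # comment (previous node of any type)
print defined $dom->at('h1')->previous ? "has one\n" : "none\n";   # none
```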

Inserting a fragment before a node (prepend)

# Prepend HTML/XML fragment to this node (for all node types other than "root").
$dom = $dom->prepend('<p>I ♥ Mojolicious!</p>');
$dom = $dom->prepend(Mojo::DOM->new);

# "<div><h1>Test</h1><h2>123</h2></div>"
$dom->parse('<div><h2>123</h2></div>')
    ->at('h2')->prepend('<h1>Test</h1>')->root;

# "<p>Test 123</p>"
$dom->parse('<p>123</p>')
    ->at('p')->child_nodes->first->prepend('Test ')->root;

Prepending to a node's content (prepend_content)

# Prepend HTML/XML fragment (for "root" and "tag" nodes) or raw content to this node's content.
$dom = $dom->prepend_content('<p>I ♥ Mojolicious!</p>');
$dom = $dom->prepend_content(Mojo::DOM->new);

# "<div><h2>Test123</h2></div>"
$dom->parse('<div><h2>123</h2></div>')
    ->at('h2')->prepend_content('Test')->root;

# "<!-- Test 123 --><br>"
$dom->parse('<!-- 123 --><br>')
    ->child_nodes->first->prepend_content(' Test')->root;

# "<p><i>123</i>Test</p>"
$dom->parse('<p>Test</p>')->at('p')->prepend_content('<i>123</i>')->root;

Removing a node (remove)

# Remove this node and return "root" (for "root" nodes) or "parent".
my $parent = $dom->remove;

# "<div></div>"
$dom->parse('<div><h1>Test</h1></div>')->at('h1')->remove;

# "<p><b>456</b></p>"
$dom->parse('<p>123<b>456</b></p>')
    ->at('p')->child_nodes->first->remove->root;
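
remove() combines naturally with find(), for example to strip every `<script>` element from a document. A sketch on made-up markup:

```perl
use strict;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new(
    '<div><script>track()</script><p>Hi</p><script>more()</script></div>');

# each() calls the callback once per element, with the element in $_
$dom->find('script')->each(sub { $_->remove });
print "$dom\n";   # <div><p>Hi</p></div>
```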

Replacing a node (replace)

# Replace this node with HTML/XML fragment and return "root" (for "root" nodes) or "parent".
my $parent = $dom->replace('<div>I ♥ Mojolicious!</div>');
my $parent = $dom->replace(Mojo::DOM->new);

# "<div><h2>123</h2></div>"
$dom->parse('<div><h1>Test</h1></div>')->at('h1')->replace('<h2>123</h2>');

# "<p><b>123</b></p>"
$dom->parse('<p>Test</p>')
    ->at('p')->child_nodes->[0]->replace('<b>123</b>')->root;

Getting the parent node (parent)

# Return Mojo::DOM object for parent of this node, or "undef" if this node has no parent.
my $parent = $dom->parent;

# "<b><i>Test</i></b>"
$dom->parse('<p><b><i>Test</i></b></p>')->at('i')->parent;

Checking whether an element matches a selector (matches)

# Check if this element matches the CSS selector. All selectors from "SELECTORS" in Mojo::DOM::CSS are supported.
my $bool = $dom->matches('div ~ p');
my $bool = $dom->matches('svg|line', svg => 'http://www.w3.org/2000/svg');

# True
$dom->parse('<p class="a">A</p>')->at('p')->matches('.a');
$dom->parse('<p class="a">A</p>')->at('p')->matches('p[class]');

# False
$dom->parse('<p class="a">A</p>')->at('p')->matches('.b');
$dom->parse('<p class="a">A</p>')->at('p')->matches('p[id]');

Getting a unique selector for an element (selector)

# Get a unique CSS selector for this element.
my $selector = $dom->selector;

# "ul:nth-child(1) > li:nth-child(2)"
$dom->parse('<ul><li>Test</li><li>123</li></ul>')->find('li')->last->selector;

# "p:nth-child(1) > b:nth-child(1) > i:nth-child(1)"
$dom->parse('<p><b><i>Test</i></b></p>')->at('i')->selector;

Removing an element but keeping its content (strip)

# Remove this element while preserving its content and return "parent".
my $parent = $dom->strip;

# "<div>Test</div>"
$dom->parse('<div><h1>Test</h1></div>')->at('h1')->strip;

Getting or setting an element's tag name (tag)

# This element's tag name.
my $tag = $dom->tag;
$dom    = $dom->tag('div');

# List tag names of child elements
say $dom->children->map('tag')->join("\n");

tap (alias for the tap method of Mojo::Base)

# Alias for "tap" in Mojo::Base.
$dom = $dom->tap(sub {...});
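
A small sketch of tap: the callback runs for its side effect (inside it, $_ is the current object) and the chain then continues on the same node. The class name is arbitrary.

```perl
use strict;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new('<p>old</p>');

# tap() runs the callback, then returns the same node so the chain can continue
$dom->at('p')->tap(sub { $_->attr(class => 'note') })->content('new');
print "$dom\n";   # <p class="note">new</p>
```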

Getting the text of this element only (text)

# Extract text content from this element only (not including child elements).
my $text = $dom->text;

# "bar"
$dom->parse("<div>foo<p>bar</p>baz</div>")->at('p')->text;

# "foo\nbaz\n"
$dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->at('div')->text;

Getting a node's type (type)

# This node's type, usually "cdata", "comment", "doctype", "pi", "raw", "root", "tag" or "text".
my $type = $dom->type;

# "cdata"
$dom->parse('<![CDATA[Test]]>')->child_nodes->first->type;

# "comment"
$dom->parse('<!-- Test -->')->child_nodes->first->type;

# "doctype"
$dom->parse('<!DOCTYPE html>')->child_nodes->first->type;

# "pi"
$dom->parse('<?xml version="1.0"?>')->child_nodes->first->type;

# "raw"
$dom->parse('<title>Test</title>')->at('title')->child_nodes->first->type;

# "root"
$dom->parse('<p>Test</p>')->type;

# "tag"
$dom->parse('<p>Test</p>')->at('p')->type;

# "text"
$dom->parse('<p>Test</p>')->at('p')->child_nodes->first->type;

Getting the value of a form element (val)

# Extract value from form element (such as "button", "input", "option", "select" and "textarea"), or return "undef" if this element has no value. In the case of "select" with "multiple" attribute, find "option" elements with "selected" attribute and return an array reference with all values, or "undef" if none could be found.
my $value = $dom->val;

# "a"
$dom->parse('<input name=test value=a>')->at('input')->val;

# "b"
$dom->parse('<textarea>b</textarea>')->at('textarea')->val;

# "c"
$dom->parse('<option value="c">Test</option>')->at('option')->val;

# "d"
$dom->parse('<select><option selected>d</option></select>')
    ->at('select')->val;

# "e"
$dom->parse('<select multiple><option selected>e</option></select>')
    ->at('select')->val->[0];

# "on"
$dom->parse('<input name=test type=checkbox>')->at('input')->val;
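
val() makes it easy to collect a form's current values. A sketch on a hypothetical form, assuming every field of interest carries a name attribute:

```perl
use strict;
use warnings;
use Mojo::DOM;

my $form = Mojo::DOM->new(<<'HTML');
<form>
  <input name="user" value="jie">
  <select name="lang"><option>C</option><option selected>Perl</option></select>
  <textarea name="bio">hi</textarea>
</form>
HTML

# Map each named field to its current value
my %data;
for my $field ($form->find('[name]')->each) {
    $data{ $field->attr('name') } = $field->val;
}
print "$_=$data{$_}\n" for sort keys %data;
```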

Wrapping a fragment around a node (wrap)

# Wrap HTML/XML fragment around this node (for all node types other than "root"), placing it as the last child of the first innermost element.
$dom = $dom->wrap('<div></div>');
$dom = $dom->wrap(Mojo::DOM->new);

# "<p>123<b>Test</b></p>"
$dom->parse('<b>Test</b>')->at('b')->wrap('<p>123</p>')->root;

# "<div><p><b>Test</b></p>123</div>"
$dom->parse('<b>Test</b>')->at('b')->wrap('<div><p></p>123</div>')->root;

# "<p><b>Test</b></p><p>123</p>"
$dom->parse('<b>Test</b>')->at('b')->wrap('<p></p><p>123</p>')->root;

# "<p><b>Test</b></p>"
$dom->parse('<p>Test</p>')->at('p')->child_nodes->first->wrap('<b>')->root;

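(The difference from wrap_content: wrap puts the fragment around the node itself, while wrap_content puts it around the node's content, so the node stays on the outside.)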
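
A minimal sketch contrasting wrap and wrap_content on the same `<p>` element, assuming Mojolicious is installed:

```perl
use strict;
use warnings;
use Mojo::DOM;

# wrap: the new element goes around the <p> itself
my $dom1 = Mojo::DOM->new('<p>Test</p>');
$dom1->at('p')->wrap('<div></div>');
print "$dom1\n";   # <div><p>Test</p></div>

# wrap_content: the new element goes around the <p>'s content
my $dom2 = Mojo::DOM->new('<p>Test</p>');
$dom2->at('p')->wrap_content('<b></b>');
print "$dom2\n";   # <p><b>Test</b></p>
```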

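Toggling XML mode (xml)

Disables HTML semantics in the parser and makes it case-sensitive; by default this is auto-detected from an XML declaration.

my $bool = $dom->xml;
$dom     = $dom->xml($bool);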
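
A sketch of the effect of XML mode, following the case-sensitivity behaviour described in the Mojo::DOM documentation: in HTML mode tag and attribute names are lowercased, while in XML mode they keep their case and selectors become case-sensitive.

```perl
use strict;
use warnings;
use Mojo::DOM;

# HTML mode (the default without an XML declaration): names are lowercased
my $html = Mojo::DOM->new('<P ID="greeting">Hi!</P>');
print $html->at('p[id]')->text, "\n";   # Hi!
print "$html\n";                        # <p id="greeting">Hi!</p>

# XML mode is switched on before parsing; names keep their case
my $xml = Mojo::DOM->new->xml(1)->parse('<P ID="greeting">Hi!</P>');
print $xml->at('P[ID]')->text, "\n";    # Hi!
print defined $xml->at('p[id]') ? "match\n" : "no match\n";   # no match
```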

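Filtering a collection with grep

It takes an anonymous subroutine as its argument; inside the callback, $_ is the current element:

my $img_collection = $dom->find('img')->grep(sub { ($_->attr('src') // '') =~ /http/ });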
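
A sketch on made-up markup showing the grep callback next to an equivalent CSS attribute selector; the `// ''` guards against `<img>` tags that have no src attribute.

```perl
use strict;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new('<img src="http://example.com/a.png"><img src="/local.png">'
    . '<img><img src="https://example.com/b.png">');

# grep() keeps the elements for which the callback returns true
my $remote = $dom->find('img')->grep(sub { ($_->attr('src') // '') =~ m{^https?://} });
print $remote->size, "\n";   # 2

# The same filter expressed as a CSS attribute selector
print $dom->find('img[src^="http"]')->size, "\n";   # 2
```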

`perldoc LWP::Simple`
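
LWP::Simple wraps the user agent in plain functions. A minimal sketch of get(), which returns the body on success and undef on failure; it again uses a local file:// URL (via File::Temp and URI::file) so it runs without network access, and the file contents are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use LWP::Simple qw(get);
use URI::file;

# Local file stands in for a web page
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "simple\n";
close $fh;

# get() returns the decoded body, or undef if the request failed
my $body = get(URI::file->new_abs($path)->as_string);
print defined $body ? $body : "request failed\n";
```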

Perl

Perl-LWP library

http://example.com/2023/10/26/Perl-LWP库/

Author: Jie

Published: October 26, 2023
