    LWP::MemberMixin    -- Access to member variables of Perl5 classes
    LWP::UserAgent      -- WWW user agent class
    LWP::RobotUA        -- For developing robot applications
    LWP::Protocol       -- Interface to various protocol schemes
    LWP::Protocol::http -- http:// access
    LWP::Protocol::file -- file:// access
    LWP::Protocol::ftp  -- ftp:// access
    ...
The following modules provide various functions and definitions.
    LWP             -- This file. Library version number and documentation.
    LWP::MediaTypes -- MIME types configuration (text/html etc.)
    LWP::Simple     -- Simplified procedural interface for common functions
    HTTP::Status    -- HTTP status code (200 OK etc)
    HTTP::Date      -- Date parsing module for HTTP date formats
    HTTP::Negotiate -- HTTP content negotiation calculation
    File::Listing   -- Parse directory listings
    HTML::Form      -- Processing for <form>s in HTML documents
Processing HTML with Mojo::DOM
The documentation is available via perldoc Mojo::DOM.
It parses HTML into a tree of nodes. There are 8 node types:
cdata
comment
doctype
pi
raw
root
tag
text
A typical structure looks like this:
    root
    |- doctype (html)
    +- tag (html)
       |- tag (head)
       |  +- tag (title)
       |     +- raw (Hello)
       +- tag (body)
          +- text (World!)
All nodes are Mojo::DOM objects.
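To see these node types in practice, here is a minimal sketch that parses a small document and lists the type of every node below the root. The markup is made up for illustration; `descendant_nodes` and `type` are regular Mojo::DOM methods.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Illustrative markup, not taken from any real page
my $dom = Mojo::DOM->new(
  '<!DOCTYPE html><html><head><title>Hello</title></head>'
    . '<body>World!</body></html>'
);

# The object returned by new() is the root node of the tree
my $root_type = $dom->type;

# Collect the node type of every descendant node
my @types = $dom->descendant_nodes->map('type')->each;

print "$root_type\n";
print "  $_\n" for @types;
```

Each entry in the printed list corresponds to one node of the tree, so the output is a quick way to check how Mojo::DOM has classified a given piece of markup.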
Creating a Mojo::DOM object
    use Mojo::DOM;

    # Empty object, or an object built from an HTML/XML string
    my $dom = Mojo::DOM->new;
    my $dom = Mojo::DOM->new('<foo bar="baz">I ♥ Mojolicious!</foo>');
Containing only a single tag:
    my $tag = Mojo::DOM->new_tag('div');
    my $tag = $dom->new_tag('div');
    my $tag = $dom->new_tag('div', id => 'foo', hidden => undef);
    my $tag = $dom->new_tag('div', 'safe content');
    my $tag = $dom->new_tag('div', id => 'foo', 'safe content');
    my $tag = $dom->new_tag('div', data => {mojo => 'rocks'}, 'safe content');
    my $tag = $dom->new_tag('div', id => 'foo', sub { 'unsafe content' });

    # " Test "
    $dom->parse('<b>123</b><!-- Test -->')->child_nodes->last->content;
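The "safe" and "unsafe" labels above refer to escaping. As a small sketch (the content strings are chosen only for illustration), plain string content passed to `new_tag` is XML-escaped, while content returned from a callback is inserted as-is:

```perl
use strict;
use warnings;
use Mojo::DOM;

# Plain string content is XML-escaped
my $safe = Mojo::DOM->new_tag('div', id => 'foo', '<b>bold</b>');

# Content returned by a callback is inserted unescaped
my $unsafe = Mojo::DOM->new_tag('div', id => 'foo', sub { '<b>bold</b>' });

print "$safe\n";      # <div id="foo">&lt;b&gt;bold&lt;/b&gt;</div>
print "$unsafe\n";    # <div id="foo"><b>bold</b></div>
```

The callback form is therefore only appropriate for markup you already trust.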
Getting child nodes via selectors:
    # Find all child elements of this element matching the CSS selector and
    # return a Mojo::Collection object containing these elements as Mojo::DOM
    # objects. All selectors from "SELECTORS" in Mojo::DOM::CSS are supported.
    my $collection = $dom->children;
    my $collection = $dom->children('div ~ p');

    # Show tag name of random child element
    say $dom->children->shuffle->first->tag;

    # Return this node's content or replace it with HTML/XML fragment (for
    # "root" and "tag" nodes) or raw content.
    my $str = $dom->content;
    $dom    = $dom->content('<p>I ♥ Mojolicious!</p>');
    $dom    = $dom->content(Mojo::DOM->new);

    # Find all descendant elements of this element matching the CSS selector
    # and return a Mojo::Collection object containing these elements as
    # Mojo::DOM objects. All selectors from "SELECTORS" in Mojo::DOM::CSS are
    # supported.
    my $collection = $dom->find('div ~ p');
    my $collection = $dom->find('svg|line', svg => 'http://www.w3.org/2000/svg');

    # Find a specific element and extract information
    my $id = $dom->find('div')->[23]{id};

    # Extract information from multiple elements
    my @headers = $dom->find('h1, h2, h3')->map('text')->each;

    # Count all the different tags
    my $hash = $dom->find('*')->reduce(sub { $a->{$b->tag}++; $a }, {});

    # Find elements with a class that contains dots
    my @divs = $dom->find('div.foo\.bar')->each;
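The difference between `children` (direct child elements only) and `find` (all descendant elements) is easiest to see side by side. The following is a minimal sketch on a hypothetical page fragment; the ids, classes, and text are assumptions made up for the example.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical page fragment used only for illustration
my $dom = Mojo::DOM->new(<<'HTML');
<div id="main">
  <h1>Title</h1>
  <p class="intro">First</p>
  <div><p>Nested</p></div>
  <h2>Section</h2>
</div>
HTML

my $main = $dom->at('#main');

# children() only considers direct child elements
my @child_tags = $main->children->map('tag')->each;

# find() considers every descendant element
my @paragraphs = $main->find('p')->map('text')->each;

# Selector lists match any of the listed selectors
my @headers = $dom->find('h1, h2')->map('text')->each;

print join(', ', @child_tags), "\n";    # h1, p, div, h2
print join(', ', @paragraphs), "\n";    # First, Nested
print join(', ', @headers),    "\n";    # Title, Section
```

Note that the nested paragraph shows up through `find` but not through `children`, because it is a grandchild of `#main`.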
Finding all following sibling elements
    # Find all sibling elements after this node matching the CSS selector and
    # return a Mojo::Collection object containing these elements as Mojo::DOM
    # objects. All selectors from "SELECTORS" in Mojo::DOM::CSS are supported.
    my $collection = $dom->following;
    my $collection = $dom->following('div ~ p');

    # List tags of sibling elements after this node
    say $dom->following->map('tag')->join("\n");
Or, to include every kind of node rather than only elements:
    # Return a Mojo::Collection object containing all sibling nodes after this
    # node as Mojo::DOM objects.
    my $collection = $dom->following_nodes;

    # "C"
    $dom->parse('<p>A</p><!-- B -->C')->at('p')->following_nodes->last->content;
Getting the next sibling element:
    # Return Mojo::DOM object for next sibling element, or "undef" if there are
    # no more siblings.
    my $sibling = $dom->next;

    # Return Mojo::DOM object for next sibling node, or "undef" if there are no
    # more siblings.
    my $sibling = $dom->next_node;

    # "456"
    $dom->parse('<p><b>123</b><!-- Test -->456</p>')
      ->at('b')->next_node->next_node;

    # " Test "
    $dom->parse('<p><b>123</b><!-- Test -->456</p>')
      ->at('b')->next_node->content;
Getting the preceding sibling elements:
    # Find all sibling elements before this node matching the CSS selector and
    # return a Mojo::Collection object containing these elements as Mojo::DOM
    # objects. All selectors from "SELECTORS" in Mojo::DOM::CSS are supported.
    my $collection = $dom->preceding;
    my $collection = $dom->preceding('div ~ p');

    # List tags of sibling elements before this node
    say $dom->preceding->map('tag')->join("\n");
Getting the preceding sibling nodes:
    # Return a Mojo::Collection object containing all sibling nodes before this
    # node as Mojo::DOM objects.
    my $collection = $dom->preceding_nodes;

    # "A"
    $dom->parse('A<!-- B --><p>C</p>')->at('p')->preceding_nodes->first->content;
(There are also previous and previous_node, which return only the sibling element or node immediately before this one.)
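The sibling methods can be compared on a single list. This is a small sketch with made-up markup: `next` and `previous` return one element (or undef), while `following` and `preceding` return collections in document order.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Illustrative list; the ids are assumptions for this example
my $dom = Mojo::DOM->new(
  '<ul><li id="a">A</li><li id="b">B</li><li id="c">C</li><li id="d">D</li></ul>'
);

my $item = $dom->at('#c');

# Single neighbours
my $next_text = $item->next->text;        # "D"
my $prev_text = $item->previous->text;    # "B"

# All siblings on either side, as Mojo::Collection objects
my $after  = $item->following->map('text')->join(',');    # "D"
my $before = $item->preceding->map('text')->join(',');    # "A,B"

# The first element has no previous sibling, so undef is returned
my $none = $dom->at('#a')->previous;

print "next=$next_text previous=$prev_text\n";
print "following=$after preceding=$before\n";
print "first item has no previous sibling\n" unless defined $none;
```

Because `next` and `previous` can return undef at either end of the list, it is worth checking the result before chaining further method calls on it.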
Prepending content to the HTML, and other modifications
    # Prepend HTML/XML fragment to this node (for all node types other than
    # "root").
    $dom = $dom->prepend('<p>I ♥ Mojolicious!</p>');
    $dom = $dom->prepend(Mojo::DOM->new);

    # Prepend HTML/XML fragment (for "root" and "tag" nodes) or raw content to
    # this node's content.
    $dom = $dom->prepend_content('<p>I ♥ Mojolicious!</p>');
    $dom = $dom->prepend_content(Mojo::DOM->new);

    # Replace this node with HTML/XML fragment and return "root" (for "root"
    # nodes) or "parent".
    my $parent = $dom->replace('<div>I ♥ Mojolicious!</div>');
    my $parent = $dom->replace(Mojo::DOM->new);

    # Check if this element matches the CSS selector. All selectors from
    # "SELECTORS" in Mojo::DOM::CSS are supported.
    my $bool = $dom->matches('div ~ p');
    my $bool = $dom->matches('svg|line', svg => 'http://www.w3.org/2000/svg');

    # Extract value from form element (such as "button", "input", "option",
    # "select" and "textarea"), or return "undef" if this element has no value.
    # In the case of "select" with "multiple" attribute, find "option" elements
    # with "selected" attribute and return an array reference with all values,
    # or "undef" if none could be found.
    my $value = $dom->val;

    # "a"
    $dom->parse('<input name=test value=a>')->at('input')->val;

    # Wrap HTML/XML fragment around this node (for all node types other than
    # "root"), placing it as the last child of the first innermost element.
    $dom = $dom->wrap('<div></div>');
    $dom = $dom->wrap(Mojo::DOM->new);
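As a rough sketch of how these modifying methods combine, the example below applies prepend, prepend_content, and wrap to one small document and prints the tree after each step. The markup and the `<section>` wrapper are assumptions chosen for illustration. All three methods change the tree in place, so the root object always reflects the latest state.

```perl
use strict;
use warnings;
use Mojo::DOM;

my $dom = Mojo::DOM->new('<div><h2>123</h2></div>');

# Does the element match before any changes are made?
my $matched = $dom->at('h2')->matches('div > h2') ? 1 : 0;

# prepend() adds a new sibling before the node
$dom->at('h2')->prepend('<h1>Test</h1>');
my $step1 = "$dom";    # <div><h1>Test</h1><h2>123</h2></div>

# prepend_content() adds to the beginning of the node's own content
$dom->at('h2')->prepend_content('Test');
my $step2 = "$dom";    # <div><h1>Test</h1><h2>Test123</h2></div>

# wrap() puts the node inside the given fragment
$dom->at('h2')->wrap('<section></section>');
my $step3 = "$dom";

print "$_\n" for $matched, $step1, $step2, $step3;
```

Stringifying the root after every call, as done here, is a simple way to confirm which part of the tree each method touched.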