Programming-the-Perl-DBI-Notes

第二章 基本的 Non-DBI Databases

考虑两个模块:

  • Storable
  • Data::Dumper

2.1 Storage Managers and Layers

2.2 查询语言和 Data Functions

基本的操作:

  • Fetching (select)
  • Storing (insert)
  • Updating (update)
  • Deleting (delete)

2.3 Standing Stones and the Sample Database

要接触的最简单的数据库: flat-file database

2.4 Flat-File Databases

Flat-File 数据库本质上是一个文件, 或一组文件 (其将数据组织成标准形式, 程序可以扫描, 然后获取需要的信息).

修改数据时, 实际上修改的是数据在内存中的拷贝, 最后将这组数据写回磁盘.

Flat-File database 是典型的 ASCII 文本文件, 每一行放一个 record, line termination 作为 record delimiter.

两种主要的 flat-file database:

  • 用一个 delimiter character 分割 fields 的文件
  • 用固定长度分割 fields 的文件

最常见的是用 delimiter character 分割的, 如 comma-separated value (CSV) 文件, 其用 , (逗号) 分隔 fields; 另一种常见的是用 : (冒号), tab, | 等分隔, 如 /etc/passwd 文件.

2.4.1 查询数据

判断命令行参数的代码可以写为:

1
2
3
4
5
die "Usage: scanmegadata <data file> <site name>\n" 
unless @ARGV == 2;

my $megalithFile = $ARGV[0];
my $siteName = $ARGV[1];

对于 delimiter character 的方式

存储的数据如:

1
Stonehenge:Wiltshire:SU 123 400:Stone Circle and Henge:The most famous stone circle

处理数据的代码如:

1
2
3
4
5
6
7
8
9
10
11
12
13
### Scan through all the entries for the desired site
while ( <MEGADATA> ) {
### Remove the newline that acts as a record delimiter
chop;
### Break up the record data into separate fields
( $name, $location, $mapref, $type, $description ) =
split( /:/, $_ );
### Test the sitename against the record's name
if ( $name eq $siteName ) {
$found = $.; # $. holds current line number in file
last;
}
}

这里有判断找到数据的行号的代码.

这种方式的缺陷: 若数据中含有 delimiter character 如 :, 则会产生错误的分割.

对于固定长度 fields 的方式

存储的数据如:

1
2
3
4
5
6
7
Field             Required Bytes
----- --------------
Name 64
Location 64
Map Reference 16
Type 32
Description 256

利用 unpack 来处理处理:

1
2
3
4
### Break up the record data into separate fields
### using the data sizes listed above
( $name, $location, $mapref, $type, $description ) =
unpack( "A64 A64 A16 A32 A256", $_ );

其缺点是需要确定长度. 且会使用很多无用的 space.

2.4.2 插入数据

将数据插入到一个 flat-file database 一般比较直接, 将新数据放到 data file 的末尾即可.

如:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
die "Usage: insertmegadata"
." <data file> <site name> <location> <map reference> <type> <description>\n"
unless @ARGV == 6;

my $megalithFile = $ARGV[0];
my $siteName = $ARGV[1];
my $siteLocation = $ARGV[2];
my $siteMapRef = $ARGV[3];
my $siteType = $ARGV[4];
my $siteDescription = $ARGV[5];

### Open the data file for concatenation, and die upon failure
open MEGADATA, ">>$megalithFile"
or die "Can't open $megalithFile for appending: $!\n";
### Create a new record
my $record = join( ":", $siteName, $siteLocation, $siteMapRef,
$siteType, $siteDescription );
### Insert the new record into the file
print MEGADATA "$record\n"
or die "Error writing to $megalithFile: $!\n";
### Close the megalith data file
close MEGADATA
or die "Error closing $megalithFile: $!";
print "Inserted record for $siteName\n";
exit;

利用 join 将数据和分隔符连接起来.

append 模式打开文件, 并将内容输出.

其缺点在于, 不能检测出新添加的数据是否已经存在.

对于 fixed-length datafile, 可以利用 pack() 函数来创建固定长度的 record.

2.4.3 更新数据

一个方法是将整个 database 读入内存, 在内存的 copy 中修改后写回.

另一个方法是一行一行读取 database, 修改后输出到一个临时文件, 最终替换掉原来的文件.

一般选用后者.

处理代码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
while ( <MEGADATA> ) {
### Quick pre-check for maximum performance:
### Skip the record if the site name doesn't appear as a field
next unless m/^\Q$siteName:/;
### Break up the record data into separate fields
### (we let $description carry the newline for us)
my ( $name, $location, $mapref, $type, $description ) =
split( /:/, $_ );
### Skip the record if the site name doesn't match. (Redundant after the
### reliable pre-check above but kept for consistency with other examples.)
next unless $siteName eq $name;
### We've found the record to update, so update the map ref value
$mapref = $siteMapRef;
### Construct an updated record
$_ = join( ":", $name, $location, $mapref, $type, $description );
}
continue {
### Write the record out to the temporary file
print TMPMEGADATA $_
or die "Error writing $tempFile: $!\n";
}

注意 next 的使用.

这里的 while ... continue 语句, 每一次执行完 while 循环体之后, 都会执行 continue 中的代码.

对于固定长度的类型, 其处理为:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
### Scan through all the records looking for the desired site
while ( <MEGADATA> ) {
### Quick pre-check for maximum performance:
### Skip the record if the site name doesn't appear at the start
next unless m/^\Q$siteName/;
### Skip the record if the extracted site name field doesn't match
next unless unpack( "A64", $_ ) eq $siteName;
### Perform in-place substitution to upate map reference field
substr( $_, 64+64, 16) = pack( "A16", $siteMapRef );
}
continue {
### Write the record out to the temporary file
print TMPMEGADATA $_
or die "Error writing $tempFile: $!\n";
}

2.4.4 删除数据

其和前面的操作类似, 只不过, 不将要删除的部分写回文件即可:

1
2
3
4
5
6
7
8
9
10
11
12
13
### Scan through all the entries for the desired site
while ( <MEGADATA> ) {
### Extract the site name (the first field) from the record
my ( $name ) = split( /:/, $_ );
### Test the sitename against the record's name
if ( $siteName eq $name ) {
### We've found the record to delete, so skip it and move to next record
next;
}
### Write the original record out to the temporary file
print TMPMEGADATA $_
or die "Error writing $tempFile: $!\n";
}

对于 fixed-length data file:

1
2
3
4
5
6
7
8
9
10
11
12
13
### Scan through all the entries for the desired site
while ( <MEGADATA> ) {
### Extract the site name (the first field) from the record
my ( $name ) = unpack( "A64", $_ );
if ( $siteName eq $name ) {
### We've found the record to delete, so skip it and move to next record
next;
}
### Write the original record out to the temporary file
print TMPMEGADATA $_
or die "Error writing $tempFile: $!\n";
}

2.5 将复杂的数据放入 Flat Files

2.5.1 Perl 的 Data::Dumper 模块

这里, 存储的数据类似于:

1
$fields=[name, location, mapref]

也就是说, 文件中存放的是 Perl 代码形式的内容.

处理如:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
use Data::Dumper;

### Scan through all the entries for the desired site
while ( <MEGADATA> ) {
### Quick pre-check for maximum performance:
### Skip the record if the site name doesn't appear
next unless m/\Q$siteName/;
### Evaluate perl record string to set $fields array reference
my $fields;
eval $_;
die if $@;
### Break up the record data into separate fields
my ( $name, $location, $mapref, $type, $description ) = @$fields;
### Skip the record if the extracted site name field doesn't match
next unless $siteName eq $name;
### We've found the record to update
### Create a new fields array with new map ref value
$fields = [ $name, $location, $siteMapRef, $type, $description ];
### Convert it into a line of perl code encoding our record string
$_ = Data::Dumper->new( [ $fields ], [ 'fields' ] )->Dump();
$_ .= "\n";
}

这里 eval $_ 的作用就是运行文件中的一行代码, 也就是赋值操作.

Data::Dumper->new( [ $fields ], [ 'fields' ] ) 这一行, 创建的内容为, 第一个参数为变量的值, 第二个参数为变量名.

Data::Dumper 产生的输出没有换行符, 可以通过设置两个包变量:

1
2
$Data::Dumper::Indent = 0; # don't use newlines to layout the output
$Data::Dumper::Useqq = 1; # use double quoted strings with "\n" escapes

可以这样写:

1
2
3
4
5
if ($ARGV[0] eq 'flat') {
$Data::Dumper::Indent = 0;
$Data::Dumper::Useqq = 1;
}
$Data::Dumper::Purity = 1;

2.5.2 Storable 模块

Storable 模块的特点是速度快. 其一般用于两个 Perl 脚本的交互, 需要将一个复杂的 Perl 数据结构传递给另一个脚本处理.

其可以将 Perl 的数据结构转换成一种字符串存储在文件中, 同样可以将这个字符串重新转换为 Perl 的数据结构.

也就是说, 存储的内容为转换为字符串的 Perl 数据结构.

主要利用两个函数:

  • freeze, 将 Perl 数据结构转换为字节数据流,以便存储到文件或通过网络传输
  • thaw, 将字节数据流转换为原始的 Perl 数据结构。

处理代码如:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
use Storable qw(freeze thaw);
use File::Slurp;
use Data::Dumper;

insert();
open my $fh, '<', '/tmp/test_storable'
or die "can not open file";

my $content = read_file('/tmp/test_storable');
my $get = thaw($content);
print Dumper($get);


sub insert {
my $popularity = 10;
my @peoples = ();
until ( $#peoples == $popularity ) {
push @peoples, { name=>rand_name(8), age=>20, hobbies=>['Linux', 'vim', 'Perl'] };
}
my $serialized_str = freeze(\@peoples);

open my $fh, '>', '/tmp/test_storable'
or die "Can not create file";

print {$fh} $serialized_str;
close $fh;
}

nfreeze() 代替 freeze() 可以保证数字的形式适合任意系统.

2.6 同时访问数据库以及 Locking

比如, 两个人同时删除 database 中的不同行, 他们都在内存中的 copy 中完成操作, 最后写回 disk, 但第一个写回的内容会被第二个写回的覆盖, 这就叫做 race condition.

一个解决方案是用系统的 file-locking mechanism. 即用 Perl 的 flock() (file lock) 函数. 可以用其来判断读取操作是否安全.

要使用 flock() 需:

1
use Fcntl ':flock';

flock() 允许两种 locking 的模式:

  • exclusive, 当一个脚本有 exclusive lock, 就只有这个脚本能 access 数据库的文件. 其他脚本想要访问需要等那个文件结束访问
  • shared (non-exclusive), 允许任意数量的文件同时访问 locked files

具体的工作其实就是请求一种类型的 lock, 如:

1
2
3
4
5
6
### Open the data file for reading, and die upon failure
open MEGADATA, $ARGV[0] or die "Can't open $ARGV[0]: $!\n";
print "Acquiring a shared lock...";
flock( MEGADATA, LOCK_SH )
or die "Unable to acquire shared lock: $!. Aborting";
print "Acquired lock. Ready to read database!\n\n";

当一个脚本请求到 exclusive lock 并开始工作时, 此时的 shared lock 请求就会失败.

当文件句柄关闭时, lock 就会被自动 release.

申请 exclusive lock 如:

1
2
3
4
5
6
7
### Open the data file for appending, and die upon failure
open MEGADATA, "+>>$ARGV[0]"
or die "Can't open $ARGV[0] for appending: $!\n";
print "Acquiring an exclusive lock...";
flock( MEGADATA, LOCK_EX )
or die "Unable to acquire exclusive lock: $!. Aborting";
print "Acquired lock. Ready to update database!\n\n";

2.7 DBM 文件以及 BerkeleyDatabase Manager

DBM files 是存储管理层, 允许将信息以 pairs of strings, a key, and a value 的形式存储在文件中.

DBM files 是二进制文件, 其存储的 strings, key 信息也是二进制.

DBM files 有几种不同的形式 (括号里为扩展):

  • ndbm (NDBM_File)
  • db (DB_File)
  • gdbm (GDBM_File)
  • sdbm (SDBM_File)
  • odbm (ODBM_File)

注意一个模块 AnyDBM_File.

这些扩展将一个磁盘上的 DBM file 和 Perl 的一个 hash variable 联系在一起, 对 hash 的 fetch values into and out of the hash 操作就像是 getting them on and off the disk.

最流行的扩展是 Berkeley Database Manager, 可以通过 Perl 的 DB_File 来访问.

2.7.1 创建一个新的 Database

形式如:

1
2
3
tie %hash,  'DB_File', $filename, $flags, $mode, $DB_HASH;
tie %hash, 'DB_File', $filename, $flags, $mode, $DB_BTREE;
tie @array, 'DB_File', $filename, $flags, $mode, $DB_RECNO;

最后一个参数用于决定 Berkeley DB 存储数据的方式:

  • DB_HASH, 默认行为, 用 key 计算出的 hash value 来存储
  • DB_BTREE, 用 balanced binary tree (B-tree) 存储. 其会对 keys 进行排序 (排序的方法可以自定义, 默认是 lexical order)
  • DB_RECNO, 其允许将键值对存储到 fixed-length 以及 variable-length 的 textual flat files, 且 key 值中包含行号 (数据库中的 record number).

filename 指的是 database file 的名称 (也就是存储数据的文件的名称).

自第三个参数开始的默认值为:

可以看出 $mode 是指明权限.

创建一个简单的 Berkeley DB 的示例:

1
2
3
4
5
6
7
8
9
10
11
#!/usr/bin/perl -w
#
# ch02/DBM/createdb: Creates a Berkeley DB
use strict;
use DB_File;

my %database;
tie %database, 'DB_File', "createdb.dat"
or die "Can't initialize database: $!\n";
untie %database;
exit;

2.7.2 Locking Strategies

flock() 处理 DBM files 的方法和 locking 普通的 Perl filehandles 有些不同. (毕竟创建 DBM file 时不会得到一个文件句柄)

使用 DB_File 模块的 fd() 来获取 DBM file 的文件描述符, 然后供 flock() 使用:

1
2
3
4
5
6
7
8
9
10
11
12
### Create the new database ...
$db = tie %database, 'DB_File', "megaliths.dat"
or die "Can't initialize database: $!\n";
### Acquire the file descriptor for the DBM file
my $fd = $db->fd();
### Do a careful open() of that descriptor to get a Perl filehandle
open DATAFILE, "+<&=$fd" or die "Can't safely open file: $!\n";
### And lock it before we start loading data ...
print "Acquiring an exclusive lock...";
flock( DATAFILE, LOCK_EX )
or die "Unable to acquire exclusive lock: $!. Aborting";
print "Acquired lock. Ready to update database!\n\n";

由于某些原因, 最好这样写:

1
2
3
use Fcntl;
# import O_EXLOCK, if available
$db = tie %database, 'DB_File', "megaliths.dat", O_EXLOCK;

2.7.3 插入以及获取值

操作一个 tied hash 即可.

如:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
#!/usr/bin/perl -w
#
# ch02/DBM/simpleinsert: Creates a Berkeley DB, inserts some test data
# and dumps it out again
use DB_File;
use Fcntl ':flock';

### Initialize the Berkeley DB
my %database;
my $db = tie %database, 'DB_File', "simpleinsert.dat",
O_CREAT | O_RDWR, 0666
or die "Can't initialize database: $!\n";

my $fd = $db->fd();
open DATAFILE, "+<&=$fd"
or die "Can't safely open file: $!\n";
print "Acquiring exclusive lock...";
flock( DATAFILE, LOCK_EX )
or die "Unable to acquire lock: $!. Aborting";
print "Acquired lock. Ready to update database!\n\n";

### Insert some data rows
$database{'Callanish I'} =
"This site, commonly known as the \"Stonehenge of the North\" is in the
form of a buckled Celtic cross.";
$database{'Avebury'} =
"Avebury is a vast, sprawling site that features, amongst other marvels,
the largest stone circle in Britain. The henge itself is so large,
it almost completely surrounds the village of Avebury.";
$database{'Lundin Links'} =
"Lundin Links is a megalithic curiosity, featuring 3 gnarled and
immensely tall monoliths arranged possibly in a 4-poster design.
Each monolith is over 5m tall.";

### Untie the database
undef $db;
untie %database;

### Close the file descriptor to release the lock
close DATAFILE;

### Retie the database to ensure we're reading the stored data
$db = tie %database, 'DB_File', "simpleinsert.dat", O_RDWR, 0444
or die "Can't initialize database: $!\n";

### Only need to lock in shared mode this time because we're not updating ...
$fd = $db->fd();
open DATAFILE, "+<&=$fd" or die "Can't safely open file: $!\n";
print "Acquiring shared lock...";
flock( DATAFILE, LOCK_SH )
or die "Unable to acquire lock: $!. Aborting";
print "Acquired lock. Ready to read database!\n\n";

### Dump the database
foreach my $key ( keys %database ) {
print "$key\n", ( "=" x ( length( $key ) + 1 ) ), "\n\n";
print "$database{$key}\n\n";
}
### Close the Berkeley DB
undef $db;
untie %database;
### Close the file descriptor to release the lock
close DATAFILE;
exit;

2.7.3.1 本地化 storage and retrieval

配合 join 使用, 存储带 delimiter character 的数据:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
### Insert some data rows
$database{'Callanish I'} =
join( ':', 'Callanish I', 'Callanish, Western Isles', 'NB 213 330',
'Stone Circle', 'Description of Callanish I' );
$database{'Avebury'} =
join( ':', 'Avebury', 'Wiltshire', 'SU 103 700',
'Stone Circle and Henge',
'Description of Avebury' );
$database{'Lundin Links'} =
join( ':', 'Lundin Links', 'Fife', 'NO 404 027', 'Standing Stones',
'Description of Lundin Links' );

### Dump the database
foreach my $key ( keys %database ) {
my ( $name, $location, $mapref, $type, $description ) =
split( /:/, $database{$key} );
print "$name\n", ( "=" x length( $name ) ), "\n\n";
print "Location: $location\n";
print "Map Reference: $mapref\n";
print "Description: $description\n\n";
}

存储 fixed-length 的数据:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
### The pack and unpack template.
$PACKFORMAT = 'A64 A64 A16 A32 A256';

### Insert some data rows
$database{'Callanish I'} =
pack( $PACKFORMAT, 'Callanish I', 'Callanish, Western Isles',
'NB 213 330', 'Stone Circle',
'Description of Callanish I' );
$database{'Avebury'} =
pack( $PACKFORMAT, 'Avebury', 'Wiltshire', 'SU 103 700',
'Stone Circle and Henge', 'Description of Avebury' );
$database{'Lundin Links'} =
pack( $PACKFORMAT, 'Lundin Links', 'Fife', 'NO 404 027',
'Standing Stones',
'Description of Lundin Links' );

### Dump the database
foreach my $key ( keys %database ) {
my ( $name, $location, $mapref, $type, $description ) =
unpack( $PACKFORMAT, $database{$key} );
print "$name\n", ( "=" x length( $name ) ), "\n\n";
print "Location: $location\n";
print "Map Reference: $mapref\n";
print "Description: $description\n\n";
}

2.7.3.2 Packing in Perl objects

这里利用面向对象的思想来创建, pack 和 unpack 数据:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
#!/usr/bin/perl -w
# ch02/DBM/Megalith.pm: A perl class encapsulating a megalith
package Megalith;
use strict;
use Carp;

### Creates a new megalith object and initializes the member fields.
sub new {
my $class = shift;
my ( $name, $location, $mapref, $type, $description ) = @_;
my $self = {};
bless $self => $class;

### If we only have one argument, assume we have a string
### containing all the field values in $name and unpack it
if ( @_ == 1 ) {
$self->unpack( $name );
}
else {
$self->{name} = $name;
$self->{location} = $location;
$self->{mapref} = $mapref;
$self->{type} = $type;
$self->{description} = $description;
}
return $self;
}

### Packs the current field values into a colon delimited record
### and returns it
sub pack {
my ( $self ) = @_;
my $record = join( ':', $self->{name}, $self->{location},
$self->{mapref}, $self->{type},
$self->{description} );
### Simple check that fields don't contain any colons
croak "Record field contains ':' delimiter character"
if $record =~ tr/:/:/ != 4;

return $record;
}

### Unpacks the given string into the member fields
sub unpack {
my ( $self, $packedString ) = @_;
### Naive split...Assumes no inter-field delimiters

my ( $name, $location, $mapref, $type, $description ) =
split( ':', $packedString, 5 );
$self->{name} = $name;
$self->{location} = $location;
$self->{mapref} = $mapref;
$self->{type} = $type;
$self->{description} = $description;
}

### Displays the megalith data
sub dump {
my ( $self ) = @_;

print "$self->{name} ( $self->{type} )\n",
"=" x ( length( $self->{name} ) +
length( $self->{type} ) + 5 ), "\n";
print "Location: $self->{location}\n";
print "Map Reference: $self->{mapref}\n";
print "Description: $self->{description}\n\n";
}
1;

使用如:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
$database{'Callanish I'} =
new Megalith( 'Callanish I',
'Western Isles',
'NB 213 330',
'Stone Circle',
'Description of Callanish I' )->pack();

### Dump the database
foreach $key ( keys %database ) {
### Unpack the record into a new megalith object
my $megalith = new Megalith( $database{$key} );

### And display the record
$megalith->dump( );
}

2.7.3.3 对象访问方法

获取对象的属性, 不直接访问如:

1
print "Megalith Name: $megalith->{name}\n";

而是用一个方法, 如 getName():

1
2
3
4
5
### Returns the name of the megalith
sub getName {
my ( $self ) = @_;
return $self->{name};
}

2.7.3.4 DBM 文件和 hashtables 的查询限制

一个不太完美的直观的解决方案为, 创建里个 secondary referential hashes, 其 key 为要查询的条目, 存储的值为原 hash 的引用:

1
2
3
4
5
6
7
8
9
10
11
12
13
### Build a referential hash based on the location of each monument
$locationDatabase{'Wiltshire'} = \$database{'Avebury'};
$locationDatabase{'Western Isles'} = \$database{'Callanish I'};
$locationDatabase{'Fife'} = \$database{'Lundin Links'};

### Dump the location database
foreach $key ( keys %locationDatabase ) {
### Unpack the record into a new megalith object
my $megalith = new Megalith( ${ $locationDatabase{$key} } );

### And display the record
$megalith->dump();
}

2.7.3.5 Chaining multiple values 到一个 hash 中

用 DBM file 的 DB_HASH 模式存储的最大问题在于 keys 必须为唯一的.

可以使用 DB_BTREE 模式存储来解决, 因为其存储形式为 array, 但同样可以用 hashtables 来操作数据库. 其缺点在于性能没有 hashtable 高.

如:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
#!/usr/bin/perl -w
#
# ch02/DBM/dupkey1: Creates a Berkeley DB with the DB_BTREE mechanism and
# allows for duplicate keys. We then insert some test
# object data with duplicate keys and dump the final
# database.
use DB_File;
use Fcntl ':flock';
use Megalith;

### Set Berkeley DB BTree mode to handle duplicate keys
$DB_BTREE->{'flags'} = R_DUP;

### Remove any existing database files
unlink 'dupkey2.dat';

### Open the database up
my %database;
my $db = tie %database, 'DB_File', "dupkey2.dat",
O_CREAT | O_RDWR, 0666, $DB_BTREE
or die "Can't initialize database: $!\n";

### Exclusively lock the database to ensure no one accesses it
my $fd = $db->fd( );
open DATAFILE, "+<&=$fd"
or die "Can't safely open file: $!\n";
print "Acquiring exclusive lock...";
flock( DATAFILE, LOCK_EX )
or die "Unable to acquire lock: $!. Aborting";
print "Acquired lock. Ready to update database!\n\n";

### Create, pack and insert some rows with duplicate keys
$database{'Wiltshire'} =
new Megalith( 'Avebury',
'Wiltshire',
'SU 103 700',
'Stone Circle and Henge',
'Largest stone circle in Britain' )->pack();
$database{'Wiltshire'} =
new Megalith( 'Stonehenge',
'Wiltshire',
'SU 123 400',
'Stone Circle and Henge',
'The most popularly known stone circle in the world' )->pack();
$database{'Wiltshire'} =
new Megalith( 'The Sanctuary',
'Wiltshire',
'SU 118 680',
'Stone Circle ( destroyed )',
'No description available' )->pack();

### Dump the database
foreach my $key ( keys %database ) {
### Unpack the record into a new megalith object
my $megalith = new Megalith( $database{$key} );

### And display the record
$megalith->dump( );
}

### Close the database
undef $db;
untie %database;

### Close the filehandle to release the lock
close DATAFILE;

exit;

这种方法的问题在于访问一条 record 之下的信息, 不能利用 hash 的键来访问, 因为其是数组, 但可以利用 DB_File 模块的 seq() 函数, 其工作方式为:

这个函数有三个参数:

  • hash key
  • hash value
  • flag, 确定返回的 chain 中的值

flag 的值可以为:

  • R_FIRST, 返回 chain 中的第一个 records
  • R_LAST, 返回 chain 中的最后一个 records
  • R_NEXT, 返回 chain 中的下一个 records
  • R_PREV, 返回 chain 中的前一个 records
  • R_CURSOR, 返回部分匹配的 key

如:

1
2
3
4
5
6
7
8
9
10
11
12
### Dump the database
my ($key, $value, $status) = ('', '', 0);
for ( $status = $db->seq( $key, $value, R_FIRST );
$status == 0;
$status = $db->seq( $key, $value, R_NEXT ) ) {

### Unpack the record into a new megalith object
my $megalith = new Megalith( $value );

### And display the record
$megalith->dump();
}

2.7.4 删除值

使用 Perl 的 delete 函数即可.

如:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
#!/usr/bin/perl -w
#
# ch02/DBM/delete: Creates a Berkeley DB, inserts some test data then
# deletes some of it
use strict;
use DB_File;

### Initialize the Berkeley DB
my %database;
tie %database, 'DB_File', "delete.dat"
or die "Can't initialize database: $!\n";

### Insert some data rows
$database{'Callanish I'} = "Western Isles";
$database{'Avebury'} = "Wiltshire";
$database{'Lundin Links'} = "Fife";

### Dump the database
print "Dumping the entire database...\n";
foreach my $key ( keys %database ) {
printf "%15s - %s\n", $key, $database{$key};
}
print "\n";

### Delete a row
delete $database{'Avebury'};

### Re-dump the database
print "Dumping the database after deletions...\n";
foreach my $key ( keys %database ) {
printf "%15s - %s\n", $key, $database{$key};
}

### Close the Berkeley DB
untie %database;
exit;

2.8 MLDBM 模块

MLDBM 模块用于快速将复杂的 Perl 数据结构写入 DBM files.

这个模块会自动将 Perl 数据结构序列化 (serializing)

以下代码用 DB_File 存储数据, 用 Data::Dumper 来显示数据:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
#!/usr/bin/perl -w
#
# ch02/mldbmtest: Demonstrates storing complex data structures in a DBM
# file using the MLDBM module.
use MLDBM qw( DB_File Data::Dumper );
use Fcntl;

### Remove the test file in case it exists already ...
unlink 'mldbmtest.dat';

tie my %database1, 'MLDBM', 'mldbmtest.dat', O_CREAT | O_RDWR, 0666
or die "Can't initialize MLDBM file: $!\n";

### Create some megalith records in the database
%database1 = (
'Avebury' => {
name => 'Avebury',
mapref => 'SU 103 700',
location => 'Wiltshire'
},
'Ring of Brodgar' => {
name => 'Ring of Brodgar',
mapref => 'HY 294 133',
location => 'Orkney'
}
);

### Untie and retie to show data is stored in the file
untie %database1;
tie my %database2, 'MLDBM', 'mldbmtest.dat', O_RDWR, 0666
or die "Can't initialize MLDBM file: $!\n";

### Dump out via Data::Dumper what's been stored ...
print Data::Dumper->Dump( [ \%database2 ] );

untie %database2;

exit;

第四章 Programming with the DBI

4.1 DBI 架构

其分为两个主要的部分:

  • DBI
  • drivers

DBI 定义了主要的编程接口, 方法等来调用相应的 drivers.

不同的 drivers 用于不同类型的 database 然后对数据库进行实际的操作.

如:

实际的流程为:

Drivers 也被称为 database drivers, 或 DBDs, 如, Oracle 对应 DBD::Oracle, Informix 对应 DBD::Informix.

4.2 Handles

DBI 主要定义了 3 种对象, 这些对象也被称为 handles (句柄):

4.2.1 Driver Handles

Driver handles 表示加载的 drivers, 其在 DBI 加载和初始化 driver 时被创建.

一个 driver handle 就代表一个加载的 driver.

driver handle 有两个重要的方法:

  • data_sources(), 列举可以连接的东西
  • connect(), 用于实际的连接

在一般的命名惯例中, driver handle 为 $drh.

一般不会去单独创建一个 $drh 来使用, 而是直接用 DBI, 如: DBI->connect()

一个程序中可以连接多个数据库, 也就是创建多个 driver handle, 这些对象都是相互独立的.

4.2.2 Database Handles

database handle 封装了对特定数据库的一个连接, 创建方法为:

1
$dbh = DBI->connect( $data_source, ... );

DBI 脚本同样允许创建多个连接.

Database handles 是相应 driver handle 的 children.

在 DBI 的命名惯例中, database handles 一般为 $dbh

4.2.3 Statement Handles

Statement handles 用于和数据库交互以及操作. 这些 handle 一般封装单独的 SQL 语句用于运行.

Statement handles 是对应 database handle 的 children.

一个 statement handle 中的数据不会被其他 statement handle 影响.

对于一个 database handle, 并没有限制其能够创建的 statement handles 的数量.

在 DBI 的命名惯例中, statement handles 一般为 $sth

4.3 Data Source Names

通过 DBI 连接数据库时, 需要告诉 DBI 哪里找到要连接的数据库.

这类数据被称为 data source name , 其以 dbi: 开头 (和 http: 类似), 后跟 driver 的名称, 如: dbi:Oracle:, 其他数据也用 : (冒号) 分隔. 不同数据库需要的数据不同.

DBI 提供两个方法来查询可用的 data source.

DBI->available_drivers() 返回一个 available drivers 的列表 (其会通过 @INC 查找 DBD 子目录下的 .pm 文件)

使用如:

1
2
3
4
5
use My::Utils;
use Data::Dumper;

my @list = DBI->available_drivers();
showarray(\@list);

返回值如:

1
2
3
4
5
6
7
8
9
10
CSV
DBM
ExampleP
File
Gofer
Mem
Proxy
SQLite
Sponge
mysql

DBI->data_sources()DBI->available_drivers() 的一个或多个返回值作为参数, 其罗列出 which data sources are known to the driver. (一般放在 eval {} 中, 因为其会实际加载对应的 driver, 如果不能加载和初始化, 则会 die)

使用如:

1
2
my @list = DBI->data_sources("CSV");
showarray(\@list);

返回值如:

1
2
3
4
5
6
7
8
9
DBI:CSV:f_dir=lua
DBI:CSV:f_dir=c
DBI:CSV:f_dir=.git
DBI:CSV:f_dir=shell
DBI:CSV:f_dir=perl
DBI:CSV:f_dir=python
DBI:CSV:f_dir=.
DBI:CSV:f_dir=c++
DBI:CSV:f_dir=data_files

也就是给出 data source name 的示例.

常见的 data source name 示例:

$data_source (不一定叫这个名字) 为空或者未定义时, 会检查 DBI_DSN 环境变量.

4.4 连接和断开

4.4.1 Connection

连接需要的数据一般为:

  1. data source name
  2. username
  3. password

connect() 连接如:

1
$dbh = DBI->connect( $data_source, $username, $password, \%attr );

\%attr 可忽略, 其包含 handle 的一些属性, 比如 RaiseError, PrintError 等.

若连接失败则返回 undef.

示例如:

1
2
3
4
5
6
7
use DBI;
# Load the DBI module
### Perform the connection using the Oracle driver

my $dbh = DBI->connect( "dbi:Oracle:archaeo", "username", "password" )
or die "Can't connect to Oracle database: $DBI::errstr\n";
exit;

error message 存放在变量 $DBI::errstr 中.

同时连接同一个数据库两次如:

1
2
3
4
5
6
7
8
use DBI;
# Load the DBI module
### Perform the connection using the Oracle driver
my $dbh1 = DBI->connect( "dbi:Oracle:archaeo", "username", "password" )
or die "Can't make 1st database connect: $DBI::errstr\n";
my $dbh2 = DBI->connect( "dbi:Oracle:archaeo", "username", "password" )
or die "Can't make 2nd database connect: $DBI::errstr\n";
exit;

同时连接两个不同的数据库如:

1
2
3
4
5
6
7
8
use DBI;
# Load the DBI module
### Perform the connection using the Oracle driver
my $dbh1 = DBI->connect( "dbi:Oracle:archaeo", "username", "password" )
or die "Can't connect to 1st Oracle database: $DBI::errstr\n";
my $dbh2 = DBI->connect( "dbi:Oracle:seconddb", "username", "password" )
or die "Can't connect to 2nd Oracle database: $DBI::errstr\n";
exit;

上述创建的两个 database handle 都是完全独立的, 且不共享任何信息 (不管连接的是同一个数据库还是不同的)

连接两个不同类型的数据库:

1
2
3
4
5
6
7
8
9
10
11
use DBI;
# Load the DBI module
### Perform the connection using the Oracle driver
my $dbh1 = DBI->connect( "dbi:Oracle:archaeo", "username", "password" )
or die "Can't connect to Oracle database: $DBI::errstr\n";
my $dbh2 = DBI->connect( "dbi:mSQL:seconddb", "username", "password" , {
PrintError => 0
} )
or die "Can't connect to mSQL database: $DBI::errstr\n";

exit;

4.4.2 Disconnection

让一个 database handle 和其数据库 disconnect:

1
$rc = $dbh->disconnect();

成功断开返回值则为 true, 不然为 false.

具体如:

1
2
3
4
5
6
7
8
9
10
11
12
use DBI;
# Load the DBI module
### Perform the connection using the Oracle driver
my $dbh = DBI->connect( "dbi:Oracle:archaeo", "username", "password" , {
PrintError => 0
} )
or die "Can't connect to Oracle database: $DBI::errstr\n";
### Now, disconnect from the database
$dbh->disconnect
or warn "Disconnection failed: $DBI::errstr\n";

exit;

如果没有 disable AutoCommit, disconnect 则会自动 commit.

4.5 Error Handling

4.5.1 Automatic Versus Manual Error Checking

4.5.1.1 Manual error checking

默认会 enable PrintError, 其让 DBI 显示基本的 error reporting.

若要手动报告错误, 可以:

1
2
3
4
5
6
7
8
9
### Attributes to pass to DBI->connect()
%attr = (
PrintError => 0,
RaiseError => 0
);
### Connect...
my $dbh = DBI->connect( "dbi:Oracle:archaeo", "username", "password" , \%attr );
### Re-enable warning-level automatic error reporting...
$dbh->{PrintError} = 1;

大部分 DBI 方法执行失败时会返回 undef, 一种检测方法可以写为:

1
2
3
# Try connecting to a database
my $dbh = DBI->connect( ... )
or die "Can't connect to database: $DBI::errstr!\n";

全部手动的示例:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
#!/usr/bin/perl -w
#
# ch04/error/ex1: Small example using manual error checking.
use DBI;
# Load the DBI module
### Perform the connection using the Oracle driver
my $dbh = DBI->connect( undef, "stones", "stones", {
PrintError => 0,
RaiseError => 0
} ) or die "Can't connect to the database: $DBI::errstr\n";

### Prepare a SQL statement for execution
my $sth = $dbh->prepare( "SELECT * FROM megaliths" )
or die "Can't prepare SQL statement: $DBI::errstr\n";

### Execute the statement in the database
$sth->execute
or die "Can't execute SQL statement: $DBI::errstr\n";

### Retrieve the returned rows of data
my @row;
while ( @row = $sth->fetchrow_array( ) ) {
print "Row: @row\n";
}
warn "Data fetching terminated early by error: $DBI::errstr\n" if $DBI::err;

### Disconnect from the database
$dbh->disconnect
or warn "Error disconnecting: $DBI::errstr\n";

exit;

4.5.1.2 Automatic error checking

PrintError 让 DBI 自动调用 warn().

RaiseError 让 DBI 自动调用 die().

可以用 $SIG{__WARN__}$SIG{__DIE__} 来改变行为.

通过一个 database handle, 可以这样设置:

1
2
$h->{PrintError} = 1;
$h->{RaiseError} = 1;

PrintErrorRaiseError 被同时设置时, 如果没有设置 $SIG{__DIE__}, 那么 warn() 则会被跳过 (防止报错信息打印两次)

一个使用 RaiseError 的例子:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
use DBI;

# Load the DBI module
my ($dbh, $sth, @row);

### Perform the connection using the Oracle driver
$dbh = DBI->connect( "dbi:Oracle:archaeo", "username", "password" , {
PrintError => 0, ### Don't report errors via warn( )
RaiseError => 1 ### Do report errors via die( )
} );

### Prepare a SQL statement for execution
$sth = $dbh->prepare( "SELECT * FROM megaliths" );

### Execute the statement in the database
$sth->execute( );

### Retrieve the returned rows of data
while ( @row = $sth->fetchrow_array( ) ) {
print "Row: @row\n";
}

### Disconnect from the database
$dbh->disconnect( );

exit;

4.5.1.3 混合 error checking

让 database 崩溃时进行重连:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
#!/usr/bin/perl -w
#
# ch04/error/mixed1: Example showing mixed error checking modes.
use DBI;
# Load the DBI module
### Attributes to pass to DBI->connect() to disable automatic
### error checking
my %attr = (
PrintError => 0,
RaiseError => 0,
);
### The program runs forever and ever and ever and ever ...
while ( 1 ) {
my $dbh;

### Attempt to connect to the database. If the connection
### fails, sleep and retry until it succeeds ...
until (
$dbh = DBI->connect( "dbi:Oracle:archaeo", "username", "password" , \%attr )
) {
warn "Can't connect: $DBI::errstr. Pausing before retrying.\n";
sleep( 5 * 60 );
}

eval {
### Catch _any_ kind of failures from the code within
### Enable auto-error checking on the database handle
$dbh->{RaiseError} = 1;

### Prepare a SQL statement for execution
my $sth = $dbh->prepare( "SELECT stock, value FROM current_values" );
while (1) {
### Execute the statement in the database
$sth->execute( );
### Retrieve the returned rows of data
while ( my @row = $sth->fetchrow_array() ) {
print "Row: @row\n";
}
}

### Pause for the stock market values to move
sleep 60;
};
warn "Monitoring aborted by error: $@\n" if $@;
### Short sleep here to avoid thrashing the database
sleep 5;
}

exit;

在连接时手动处理, 其他时候自动处理 (且放在 eval 块中).

4.5.2 Error Diagnostics

DBI 定义了几个错误诊断的方法, 可以用任意 driver, database, statement handles 来调用. 通过其返回值来判断:

1
2
3
$rv = $h->err();
$str = $h->errstr();
$str = $h->state();

这几个方法返回的内容和 handle 有关 (比如不同数据库的 handle).

$h->err() 返回数字 (错误码).

$h->errstr() 返回错误码对应的字符串信息.

$h->state() 返回格式为 standard SQLSTATE five-character error 的字符串.

另外三个变量也包含同样的信息:

1
2
3
$DBI::err
$DBI::errstr
$DBI::state

使用这几个方法的示例:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
use DBI;
# Load the DBI module
### Attributes to pass to DBI->connect() to disable automatic
### error checking

my %attr = (
PrintError => 0,
RaiseError => 0,
);

### Perform the connection using the Oracle driver
my $dbh = DBI->connect( "dbi:Oracle:archaeo", "username", "password" , \%attr )
or die "Can't connect to database: ", $DBI::errstr, "\n";

### Prepare a SQL statement for execution
my $sth = $dbh->prepare( "SELECT * FROM megaliths" )
or die "Can't prepare SQL statement: ", $dbh->errstr(), "\n";

### Execute the statement in the database
$sth->execute
or die "Can't execute SQL statement: ", $sth->errstr(), "\n";

### Retrieve the returned rows of data
while ( my @row = $sth->fetchrow_array() ) {
print "Row: @row\n";
}
warn "Problem in fetchrow_array(): ", $sth->errstr(), "\n"
if $sth->err();

### Disconnect from the database
$dbh->disconnect
or warn "Failed to disconnect: ", $dbh->errstr(), "\n";

exit;

4.6 Utility Methods and Functions

4.6.1 Database-Specific Quote Handling

最重要的 utility method 是 quote(), 其能够正确 quotes 和转义 SQL 语句.

将要处理的字符串传递给 quote(), 其返回值为处理后的字符串:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
use DBI;
### The string to quote
my $string = "Don't view in monochrome (it looks 'fuzzy')!";

### Connect to the database
my $dbh = DBI->connect( "dbi:Oracle:archaeo", "username", "password" , {
RaiseError => 1
} );

### Escape the string quotes ...
my $quotedString = $dbh->quote( $string );

### Use quoted string as a string literal in a SQL statement
my $sth = $dbh->prepare( "
SELECT *
FROM media
WHERE description = $quotedString
" );
$sth->execute();

exit;

如果 quote() 的参数为 undef , 则其返回 NULL 字符串.

4.6.2 Tracing DBI Execution

DBI->trace() 方法, 其会 enable tracing 所有的 DBI 操作, 其分为几个 tracing levels:

  • 0, disable tracing
  • 1, trace DBI method execution 显示返回值和错误
  • 2, 和 1 类似, 但其会包含 method entry with parameters
  • 3, 和 2 类似, 但包含更多 internal driver trace information
  • 4, 和 3 类似, 但包含更多细节

trace() 方法可以用两个参数的形式, 指明 tracing level 以及将 trace 信息存放在哪一个文件:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
use DBI;

### Remove any old trace files
unlink 'dbitrace.log' if -e 'dbitrace.log';

### Connect to a database
my $dbh = DBI->connect( "dbi:Oracle:archaeo", "username", "password" );

### Set the tracing level to 1 and prepare()
DBI->trace( 1 );
doPrepare();

### Set trace output to a file at level 2 and prepare()
DBI->trace( 2, 'dbitrace.log' );
doPrepare();

### Set the trace output back to STDERR at level 2 and prepare()
DBI->trace( 2, undef );
doPrepare();

exit;

### prepare a statement (invalid to demonstrate tracing)
sub doPrepare {
print "Preparing and executing statement\n";
my $sth = $dbh->prepare( "
SELECT * FROM megalith
" );
$sth->execute();
return;
}
exit;

trace() 可以仅在 handle 的层面上开启, 如在 database handle 上开启 level 1, 在一个 statement handle 上开启 level 2:

1
2
3
4
5
6
7
8
9
10
11
### Connect to a database...
my $dbh = DBI->connect( "dbi:Oracle:archaeo", "username", "password" );

### Trace the database handle to level 1 to the screen
$dbh->trace( 1 );

### Create a new statement
my $sth = ...;

### Trace the statement to level 2 to the file 'trace.lis'
$sth->trace( 2, 'trace.lis' );

tracing 也可以用环境变量 DBI_TRACE 来控制. 其和 DBI->trace() 作用相同. 其有三种设置方法:

4.6.3 Neat and Tidy Formatting

两个方法:

  • neat(), 作用在一个 scalar value
  • neat_list(), 作用在一个 list of scalar values

其可以控制输出的长度以及自动加上引号. (默认长度为 400, 其定义在 $DBI::neat_maxlen 中)

如:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
use DBI;
### Declare some strings to neatify
my $str1 = "Alligator's an extremely neat() and tidy person";
my $str2 = "Oh no\n he's not!";

### Neatify this first string to a maxlen of 40
print "String: " . DBI::neat( $str1, 40 ) . "\n";

### Neatify the second string to a default maxlen of 400
print "String: " . DBI::neat( $str2 ) . "\n";

### Neatify a number
print "Number: " . DBI::neat( 42 * 9 ) . "\n";

### Neatify an undef
print "Undef: " . DBI::neat( undef ) . "\n";

exit;

以及:

1
2
3
4
5
6
7
8
9
use DBI qw( neat_list );

### Declare some strings to neatify
my @list = ( 'String-a-string-a-string-a-string-a-string', 42, 0, '', undef );

### Neatify the strings into an array
print neat_list( \@list, 40, ", " ), "\n";

exit;

(这里的第三个参数应该是指定连接符号)

4.6.4 Numeric Testing

looks_like_number() 方法来判断一个值是否为数字.

其接受一个列表作为参数, 返回一个新列表, 其中包含对应的元素是否为数字的结果, true, falseundef.

如:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
use DBI;
### Declare a list of values
my @values = ( 333, 'Choronzon', 'Tim', undef, 'Alligator', 1234.34,
'Linda', 0x0F, '0x0F', 'Larry Wall' );

### Check to see which are numbers!
my @areNumbers = DBI::looks_like_number( @values );

for (my $i = 0; $i < @values; ++$i ) {
my $value = (defined $values[$i]) ? $values[$i] : "undef";

print "values[$i] -> $value ";

if ( defined $areNumbers[$i] ) {
if ( $areNumbers[$i] ) {
print "is a number!\n";
}
else {
print "is utterly unlike a number and should be quoted!\n";
}
}
else {
print "is undefined!\n";
}
}

exit;

第五章 和数据库交互

5.1 发出简单的查询

可以手动将数据全部 fetch, 也可以用 finish() 方法来清除.

5.1.1 准备 SQL Statements

prepare 其实就是语法检查和解析部分.

流程如:

传递给 prepare() 的 SQL 语句以字符串的形式传递给对应的 database handle, 其又将语句传递给 Database Engine 对语句进行解析, 如果语句有效 (即没有语法错误, 以及没有访问错误), 则将封装好的 statement handle 返回.

简单获取一个 statement handle:

1
2
3
4
5
6
7
8
9
use DBI;

### The database handle
my $dbh = DBI->connect( "dbi:Oracle:archaeo", "username", "password" );

### The statement handle
my $sth = $dbh->prepare( "SELECT id, name FROM megaliths" );

exit;

语句有问题则返回 undef.

对于有些 driver 来说, prepare 的步骤实际上到 execute() 时才真正进行.

5.1.1.1 构建 “on-the-fly” 的语句

如:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
### This variable is populated from the online form, somehow...
my $siteNameToQuery = $CGI->param( "SITE_NAME" );

### Take care to correctly quote it for use in an SQL statement
my $siteNameToQuery_quoted = $dbh->quote( $siteNameToQuery );

### Now interpolate the variable into the double-quoted SQL statement
$sth = $dbh->prepare( "
SELECT meg.name, st.site_type, meg.location, meg.mapref
FROM megaliths meg, site_types st
WHERE name = $siteNameToQuery_quoted
AND meg.site_type_id = st.id
" );
$sth->execute( );
@row = $sth->fetchrow_array( );
...

这里, 将从 CGI 获取的数据用 quote() 处理后内插到查询语句中.

另一种 动态生成语句 的方法:

1
2
3
4
5
6
7
8
9
### Now build the SQL statement
$statement = sprintf "SELECT %s FROM megaliths WHERE name = %s",
join(", ", @fields), $dbh->quote($siteNameToQuery);

### Perform the query
$sth = $dbh->prepare( $statement );
$sth->execute();
@row = $sth->fetchrow_array();
...

利用 sprintf(), join(), 和 quote() 配合, 根据需求查询多行.

5.1.2 运行 Select Statements

简单的如:

1
2
3
4
5
### Create the statement handle
my $sth = $dbh->prepare( "SELECT id, name FROM megaliths" );

### Execute the statement handle
$sth->execute();

若运行成功, execute() 返回 true, 失败则返回 undef.

execute() 返回成功之后, 数据库并没有实际执行 SELECT 操作, 而是设置了一个指针 to just above the first row of the table.

因此, 此时要查询的结果实际上还没有返回到 Perl 脚本, 只有当 真正 fetch 数据 时才会得到运行结果.

5.1.3 Fetching Data

从 SQL 查询得到的数据一般称为 result set (结果集, 由于关系数据库的机制). Perl 程序利用 result set 的内容需要遍历 each record 或 row. 而这种 row-by-row 获取 result set 数据的形式被称为一个 cursor .

Cursors 用于顺序获取操作, 即 records 是按照其在 result set 中的顺序获取的. 不能跳过一个 record 或者随机访问一个 record.

在一个 record 被 cursor 取出后, 无法被再次取出 (被 forgotten).

因此, 通用的取出数据的方法为: 用循环处理每一行, 直到没有行可取, 如:

1
2
3
4
5
6
while ( records to fetch from $sth ) {
### Fetch the current row from the cursor
@columns = get the column values;
### Print it out...
print "Fetched Row: @columns\n";
}

可以以多种 Perl 数据结构来取出数据, 如将一行以

  • list of values, 数组
  • a reference to an array of values, 数组引用
  • a reference to a hash of field/value pairs, 哈希引用

获取数组, 即使用 fetchrow_array 方法

最简单的方法是用 fetchrow_array(), 其返回一个数组, 包含一行中的所有字段. 也就是取一行数据.

如:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
### Prepare the SQL statement ( assuming $dbh exists )
$sth = $dbh->prepare( "
SELECT meg.name, st.site_type
FROM megaliths meg, site_types st
WHERE meg.site_type_id = st.id
" );

### Execute the SQL statement and generate a result set
$sth->execute();

### Fetch each row of result data from the database as a list
while ( ( $name, $type ) = $sth->fetchrow_array ) {
### Print out a wee message....
print "Megalithic site $name is a $type\n";
}

返回某一行中的字段也是按照存储的顺序.

( $name, $type ) = $sth->fetchrow_array 的形式也可以写为 @row = $sth->fetchrow_array.

当没有数据可以 fetch 时, fetchrow_array 会返回空列表, 此时 while 会视为 false, 然后退出循环.

获取数组引用, 即使用 fetchrow_arrayref() 方法

这个方法相对于 fetchrow_array 有性能上的优势, 其返回一个数组引用.

如:

1
2
3
4
5
6
7
### Fetch the rows of result data from the database
### as an array ref....
while ( $array_ref = $sth->fetchrow_arrayref ) {
### Print out a wee message....
print "Megalithic site $arrayref->[0] is a $array_ref->[1]\n";
}
die "Fetch failed due to $DBI::errstr" if $DBI::err;

每一次 fetch 数据时, 都使用同一个 ref, 因此以下处理无法得到想要的数据:

1
2
3
4
5
6
7
8
9
10
11
12
### The stash for rows...
my @stash;

### Fetch the row references and stash 'em!
while ( $array_ref = $sth->fetchrow_arrayref ) {
push @stash, $array_ref; # XXX WRONG!
}

### Dump the stash contents!
foreach $array_ref ( @stash ) {
print "Row: @$array_ref\n";
}

要想得到这种效果, 需要实际存储一份 copy, 如:

1
2
3
4
5
6
7
8
9
10
11
12
### The stash for rows...
my @stash;

### Fetch the row references and stash 'em!
while ( $array_ref = $sth->fetchrow_arrayref ) {
push @stash, [ @$array_ref ]; # Copy the array contents
}

### Dump the stash contents!
foreach $array_ref ( @stash ) {
print "Row: @$array_ref\n";
}

获取哈希引用, 即用 fetchrow_hashref 方法

使用如:

1
2
3
4
### Fetch rows into a hash reference
while ( $hash_ref = $sth->fetchrow_hashref ) {
print "Megalithic site $hash_ref->{name} is a $hash_ref->{site_type}\n";
}

注意 , 有些数据库会将字段名设置为 all uppercase or lowercase characters, 这种情况下就会访问失败.

可以通过传递给 fetchrow_hashref 属性值来设置, 用 NAME 表示默认情况, NAME_uc 将获取到的字段名设置为大写, NAME_lc 将获取到的字段名设置为小写, 如:

1
2
3
4
### Fetch rows into a hash reference with lowercase field names
while ( $hash_ref = $sth->fetchrow_hashref('NAME_lc') {
print "Megalithic site $hash_ref->{name} is a $hash_ref->{site_type}\n";
}

另一个需要注意的 是, 当你用 完全限定的字段名称 来查询时, 如:

1
SELECT megaliths.id ...

大多数的数据库都会只返回 id 作为字段名. 而当你有:

1
SELECT megaliths.id, media.id ...

时就会出错.

解决方法是 对列名起别名, 如:

1
SELECT megaliths.id AS meg_id, media.id AS med_id ...

5.1.3.1 fetch 和 print 的快速方法

DBI 的 dump_results() 方法能够获取并打印一个 statement handle 的 result set 中的所有行.

这个方法可以由一个 prepared and executed statement handle 来运行.

dump_results() 方法运行结束后, 其还会打印出获取到的数据的行数以及错误信息. 返回值 是获取到的行数.

使用如:

1
2
3
4
5
6
$sth = $dbh->prepare( "
SELECT name, mapref, location
FROM megaliths
" );
$sth->execute();
$rows = $sth->dump_results();

可以 自定义输出的格式 , 默认的参数为:

  • 第1个参数: Maximum Field Length - 35
  • 第2个参数: Line Separator - “\n”
  • 第3个参数: Field Separator - “,”
  • 第4个参数: Output file handle - STDOUT

设置如:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
### Prepare and execute the query
$sth = $dbh->prepare( "
SELECT name, location, mapref
FROM megaliths
" );
$sth->execute( );

### Open the output file
open FILE, ">results.lis" or die "Can't open results.lis: $!";

### Dump the formatted results to the file
$rows = $sth->dump_results( 80, '\n', ':', \*FILE );

### Close the output file
close FILE or die "Error closing result file: $!\n";

dump_results() 方法在内部使用 neat_list() 来设置格式.

5.1.4 提前完成一个 Data Fetch

也就是用 statement handle 执行 finish() 方法. (其只是 finish 掉了当前的 execution, 而不是句柄本身, 因此句柄还是可以再次 execute())

当你不需要用到一次 execution 的所有结果时, 又不想让 result set 占用一部分 temporary disk space, 就可以用 finish() 来释放这部分内存.

没有 fetch 完所有结果的 statement handle 称为 active statement handle .

5.1.5 释放 statement handles

Statement handle 也是 Perl 对象, 可以用 Perl 的 garbage collector 的机制来处理.

可以用 KidsActiveKids 这两个 database handle attributes 来追踪一个 database handle attributes 分配的 statement handle 数量.

一般利用 scope 来限制一个 statement handle 的生命周期来释放, 如:

1
2
3
4
5
if ($fetch_new_data) {
my $sth = $dbh->prepare( ... );
$sth->execute( );
$data = $sth->fetchall_arrayref();
}

if 块结束后, 这个 $sth 就会被释放.

5.2 执行 Non-SELECT Statements

这些操作如 inserting, deleting, 和 updating 数据一般不会要求返回数据.

这些操作可能都只会运行一次, 因此可以用 do() 方法, 如:

1
2
3
4
5
6
### Assuming a valid database handle exists....
### Delete the rows for Stonehenge!
$rows = $dbh->do( "
DELETE FROM megaliths
WHERE name = 'Stonehenge'
" );

其返回值 为影响到的行数, 或者 undef.

有些操作不返回行数, 成功执行则返回 -1.

如果没有影响任意行, 但也成功运行, 则返回 0E0, 表示 0.

5.3 给 Statements 绑定参数

有三种传递参数的方式:

  • placeholders
  • parameters
  • binding

绑定也需要用 placeholder, 以及 bind_param() 方法, 如:

1
2
3
4
5
6
$sth = $dbh->prepare( "
SELECT name, location
FROM megaliths
WHERE name = ?
" );
$sth->bind_param( 1, $siteName );

注意, bind_param 方法需要运行在 execute() 之前.

绑定多个值时:

1
2
3
4
5
6
7
8
9
10
$sth = $dbh->prepare( "
SELECT name, location
FROM megaliths
WHERE name = ?
AND mapref = ?
AND type LIKE ?
" );
$sth->bind_param( 1, "Avebury" );
$sth->bind_param( 2, $mapreference );
$sth->bind_param( 3, "%Stone Circle%" );

5.3.1 比较绑定和内插语句

对于有些数据库而言, 使用 bind values 可以极大提高数据库处理速度. 因为这些数据库有 Shared SQL Cache 机制, 会重复利用相似的 SQL 语句, 而不需要重新解析, 即使提供的是不同的值, 如:

1
2
3
SELECT name, location, mapref
FROM megaliths
WHERE name = <search_term>

这里提供不同的 <search_term>, 但其他部分相同, 则会被重用.

但有些情况下不支持 placeholder, 就需要用变量内插的方法.

5.3.2 绑定值和数据类型

bind_param 时指定传递数据的类型:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
use DBI qw(:sql_types);

$sth = $dbh->prepare( "
SELECT meg.name, meg.location, st.site_type, meg.mapref
FROM megaliths meg, site_types st
WHERE name = ?
AND id = ?
AND mapref = ?
AND meg.site_type_id = st.id
" );

### No need for a datatype for this value. It's a string.
$sth->bind_param( 1, "Avebury" );

### This one is obviously a number, so no type again
$sth->bind_param( 2, 21 );

### However, this one is a string but looks like a number
$sth->bind_param( 3, 123500, { TYPE => SQL_VARCHAR } );

### Alternative shorthand form of the previous statement
$sth->bind_param( 3, 123500, SQL_VARCHAR );

### All placeholders now have values bound, so we can execute
$sth->execute( );

5.3.3 绑定 Input 和 Output 参数

使用 bind_param_inout(), 其和 bind_param() 类似, 但传递的是一个引用, 而不是直接值, 其好处是可以直接得到更新后的值, 其可以传入一个参数来控制返回值的大小, 其使用如:

1
2
3
4
5
6
7
8
9
10
11
### The variables to be populated as return values...
my $ceiling;
my $floor;

$sth = $dbh->prepare( "BEGIN ceiling_floor( ?, ?, ? ); END;" );
$sth->bind_param( 1, 42.3 );
$sth->bind_param_inout( 2, \$ceiling, 50 );
$sth->bind_param_inout( 3, \$floor, 50 );
$sth->execute( );

print "Stored procedure returned $ceiling, $floor\n";

5.3.4 不使用 bind_param() 的绑定值

将值传给 execute(), 其会自动调用 bind_param() 如:

1
2
3
4
5
6
7
$sth = $dbh->prepare( "
SELECT name, location, mapref
FROM megaliths
WHERE name = ? OR description LIKE ?
" );
$sth->execute( "Avebury", "%largest stone circle%" );
...

这种情况下无法提供参数类型, 大部分情况下 driver 会做出正确的猜测.

如果在 execute() 之前调用了 bind_param() , 其参数类型会被使用 (bind_param() 提供的参数值不会被使用), 如:

1
2
3
4
5
6
7
8
$sth->prepare( "
SELECT name, location, mapref
FROM megaliths
WHERE id = ?
" );
$sth->bind_param( 1, 42, SQL_INTEGER );
$sth->execute( 123456 );
...

5.4 绑定输出列

fetch() 方法是 fetchall_arrayref() 方法的缩写.

只让 fetch() 返回指定的列, 避免额外的 copying, 使用 bind_col() 方法, 如:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
### Perl variables to store the field data in
my ( $name, $location, $type );
### Prepare and execute the SQL statement
$sth = $dbh->prepare( "
SELECT meg.name, meg.location, st.site_type
FROM megaliths meg, site_types st
WHERE meg.site_type_id = st.id
" );
$sth->execute( );

### Associate Perl variables with each output column
$sth->bind_col( 1, \$name );
$sth->bind_col( 2, \$location );
$sth->bind_col( 3, \$type );

### Fetch the data from the result set
while ( $sth->fetch ) {
print "$name is a $type located in $location\n";
}

bind_col() 的第一个参数是列数 (第几列的数据), 第二个参数是获取值的变量引用.

也可以使用 bind_columns() 方法来绑定多行:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
### Perl variables to store the field data in
my ( $name, $location, $type );

### Prepare and execute the SQL statement
$sth = $dbh->prepare( "
SELECT meg.name, meg.location, st.site_type
FROM megaliths meg, site_types st
WHERE meg.site_type_id = st.id
" );
$sth->execute( );

### Associate Perl variables with each output column
$sth->bind_columns( undef, \$name, \$location, \$type );

### Fetch the data from the result set
while ( $sth->fetch ) {
print "$name is a $type located in $location\n";
}

5.5 do() 方法和 prepare() 方法的对比

需要重复使用的 SQL 语句用 prepare() 比较高效, 如:

1
2
3
4
5
6
7
8
### Setup the statement for repeated execution
$sth = $dbh->prepare( "INSERT INTO megaliths ( name ) VALUES ( ? )" );

### Iterate through the various bits of data...
foreach $name ( qw( Stonehenge Avebury Castlerigg Sunhoney ) ) {
### ... and insert them into the table
$sth->execute( $name );
}

5.6 原子性抓取和批量获取

5.6.1 原子抓取 (Atomic Fetching)

当你要获取一行数据时, 可以用 selectrow_array()selectrow_arrayref() 方法, 其和 fetchrow_arrayfetchrow_arrayref 类似, 区别在于前者不需要单独 prepare 以及 execute, 且只返回一行数据.

selectrow_arrayselectrow_arrayref 直接通过 database handle 执行, 如:

1
2
3
4
5
### Assuming a valid $dbh exists...
( $name, $mapref ) =
$dbh->selectrow_array( "SELECT name, mapref
FROM megaliths" );
print "Megalith $name is located at $mapref\n";

5.6.2 批量抓取 (Batch Fetching)

Batch fetching 是通过一次 call 来获取 result set 中的全部数据.

通过 fetchall_arrayref()selectall_arrayref() 方法, 其用一个 preparedexecuted 的 statement handle 来调用.

可以给 fetchall_arrayref 参数来改变模式, 有三种模式:

  • 无参数
  • 数组切片的引用
  • 哈希切片的引用

5.6.2.1 无参数模式

此时返回一个数组引用, 包含 result set 中每一行数据的引用 (数组的元素都是引用).

示意图如:

示例代码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
#!/usr/bin/perl -w
#
# ch05/fetchall_arrayref/ex1: Complete example that connects to a database,
# executes a SQL statement, then fetches all the
# data rows out into a data structure. This
# structure is then traversed and printed.
use DBI;
### The database handle
my $dbh = DBI->connect( "dbi:Oracle:archaeo", "username", "password" , {
RaiseError => 1
});

### The statement handle
my $sth = $dbh->prepare( " SELECT name, location, mapref FROM megaliths " );

### Execute the statement
$sth->execute( );

### Fetch all the data into a Perl data structure
my $array_ref = $sth->fetchall_arrayref( );

### Traverse the data structure and dump each piece of data out
###
### For each row in the returned array reference ...
foreach my $row (@$array_ref) {
### Split the row up and print each field ...
my ( $name, $type, $location ) = @$row;
print "\tMegalithic site $name, found in $location, is a $type\n";
}
exit;

5.6.2.2 用数组切片引用作为参数

只返回所有行的特定列.

如查询中返回 5 列: name, site_type, location, mapref, 但指向要 namelocation 这 2 列.

若 SQL 如:

1
2
3
SELECT meg.name, st.site_type, meg.location, meg.mapref
FROM megaliths meg, site_types st
WHERE meg.site_type_id = st.id

返回的列和数组引用的对应关系为:

1
2
3
4
name       -> 0
site_type -> 1
location -> 2
mapref -> 3

fetchall_arrayref() 传递切片如:

1
2
3
4
5
6
7
8
### Retrieve the name and location fields...
$array_ref = $sth->fetchall_arrayref( [ 0, 2 ] );

### Retrieve the second last and last columns
$array_ref = $sth->fetchall_arrayref( [ -2, -1 ] );

### Fetch the first to third columns
$array_ref = $sth->fetchall_arrayref( [ 0 .. 2 ] );

5.6.2.3 用哈希切片引用为参数

此时的返回值同样是一个数组引用, 只不过其元素为哈希引用.

若查询为:

1
2
SELECT name, location, mapref
FROM megaliths

想要 namelocation 两列的哈希引用, 则:

1
2
### Store the name and location columns
$array_ref = $sth->fetchall_arrayref( { name => 1, location => 1 } );

一个完整的示例:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
#!/usr/bin/perl -w
#
# ch05/fetchall_arrayref/ex3: Complete example that connects to a database,
# executes a SQL statement, then fetches all the
# data rows out into a data structure. This
# structure is then traversed and printed.
use DBI;

### The database handle
my $dbh = DBI->connect( "dbi:Oracle:archaeo", "username", "password" , {
RaiseError => 1,
} );

### The statement handle
my $sth = $dbh->prepare( " SELECT name, location, mapref FROM megaliths " );

### Execute the statement
$sth->execute( );

### Fetch all the data into an array reference of hash references!
my $array_ref = $sth->fetchall_arrayref( { name => 1, location => 1 } );

### Traverse the data structure and dump each piece of data out
###
### For each row in the returned array reference.....
foreach my $row (@$array_ref) {
### Get the appropriate fields out the hashref and print...
print "\tMegalithic site $row->{name}, found in $row->{location}\n";
}
exit;

需要注意两点:

  • 若有同名列, 只返回一个, earlier entries 会被覆盖
  • 哈希引用中的所有键都是 lowercase

第六章 Advanced DBI

6.1 Handle 的属性和 Metadata

有些 attributes 是 database 或者 statement 独有的, 有些是共有的.

attribute values 可以视为 hash 的键值对来使用, 如下列使用 AutoCommit 属性;

1
2
3
4
5
6
7
8
### Set the database handle attribute "AutoCommit" to 1 (e.g., on)
$dbh->{AutoCommit} = 1;

### Fetch the current value of "AutoCommit" from the handle
$foo = $dbh->{AutoCommit};

### Print the current value of "AutoCommit" from the handle
print "AutoCommit: $dbh->{AutoCommit}\n";

AutoCommit 被设置为 0 时:

1
2
### Print the current value of "AutoCommit" from the handle
print "AutoCommit: $dbh->{AutoCommit}\n";

可能输出为:

1
AutoCommit:

因为此时 $dbh->{AutoCommit} 被视为空字符串来表示 false.

若访问不存在的 attributes 或者修改一个 read-only attribute, 都会调用 die 报错, 不管是否设置 RaiseError, 因此最好放在 eval { ... } 块中.

注意 , child handles 会继承一些 parent handles 的属性. 一个 statement handle 就是一个 database handle 的 child.

规则为:

  • 一个 statement handle 在被创建时会继承 parent database handle 的部分属性值
  • 如果一个 statement handle 的属性发生变化, 不会影响 parent database handle 以及其他 statement handle
  • 一个 database handle 的属性改变也不会影响以存在的 statement handle, 只会影响后续创建的 handle

6.1.1 给 DBI 方法传递属性

许多方法都接受一个 reference to a hash of attribute values 来改变其行为.

这些属性会由 DBD driver 来处理, 会被 DBI 忽视. (不是很明白)

6.1.2 Connecting with Attributes

给连接添加 attributes 的两种写法:

1
2
3
$dbh = DBI->connect( "dbi:Oracle:archaeo", "username", "password" , {
RaiseError => 1
});

或者:

1
$dbh = DBI->connect( "dbi:Oracle(RaiseError=>1):archaeo", '', '');

后者的优先级更高.

6.1.3 属性大小写的作用

属性名称的大小写特性决定了是谁定义的这个属性, 如:

  • 全 uppercase letters, 说明为 ISO SQL 或者 ODBC 定义的
  • 混合 uppercase 和 lowercase letters, 为 DBI 定义的
  • 全 lowercase letters, 由单个 database driver 定义的

6.1.4 共有属性

包括:

  • PrintError, 用 warn() 报错, 可以用 $SIG{__WARN__} 来捕获, 默认开启
  • RaiseError, 用 die() 报错, 可以用 $SIG{__DIE__} 来捕获, 其默认被关闭
  • ChopBlanks, 设置之后, 所有 SELECT 返回的 CHAR 类型行都会去掉 trailing blanks
  • LongReadLenLongTruncOk, 前者设定可获取数据的长度; 后者设置后, 当获取的数据超过 LongReadLen 的设置值, 则将其截断

6.1.5 Database Handle Attributes

6.3 Transactions, Locking 和 Isolation

SQL的transaction是指一组SQL语句的执行序列。在一个transaction中,要么所有SQL语句都被执行并且被提交(commit),这样所有的修改就会永久保存;要么所有SQL语句都被执行但是最终被回滚(rollback),这样所有的修改都不会被保存。Transaction保证了数据在修改的过程中的正确性和完整性,如果transaction执行过程中出现了问题,数据库可以通过回滚(rollback)操作将数据恢复到transaction执行前的状态。

此外,SQL的transaction还支持ACID原则来保证transaction的正确执行,ACID分别是:

  • Atomicity(原子性):transaction作为一个整体被执行,要么全部执行成功,要么全部失败回滚。
  • Consistency(一致性):transaction执行前后,数据库的完整性约束不会被破坏。
  • Isolation(隔离性):transaction执行过程中,对其他transaction和应用程序是隔离的,避免了数据并发访问冲突。
  • Durability(持久性):transaction一旦被提交,其结果就会永久保存在数据库中。

但并不是所有的数据库都支持 transaction processing.


Programming-the-Perl-DBI-Notes
http://example.com/2023/04/07/Programming-the-Perl-DBI-Notes/
作者
Jie
发布于
2023年4月7日
许可协议