Advanced PHP Programming

198 Chapter 7 Managing the Development Environment

One solution to specifying which class to use is to simply hard-code it in a file and keep different versions of that file in production and development. Keeping two copies is highly prone to error, though, especially when you’re executing merges between branches. A much better solution is to have the database library itself automatically detect whether it is running on the staging server or the production server, as follows:

switch ($_SERVER['HTTP_HOST']) {
    case 'www.example.com':
        class DB_Wrapper extends DB_Mysql_Prod {}
        break;
    case 'stage.example.com':
        class DB_Wrapper extends DB_Mysql_Prod {}
        break;
    case 'dev.example.com':
        class DB_Wrapper extends DB_Mysql_Test {}
        break;
    default:
        class DB_Wrapper extends DB_Mysql_Localhost {}
}

Now you simply need to use DB_Wrapper wherever you would specify a database by name, and the library itself will choose the correct implementation. You could alternatively incorporate this logic into a factory method for creating database access objects.
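Such a factory method might look like the following sketch. The empty class definitions are stand-ins for the real wrapper classes discussed in the text, included only so the example is self-contained; the hostnames mirror those in the switch above.

```php
<?php
// Stand-ins for the real database wrapper classes from the text.
class DB_Mysql_Prod      {}
class DB_Mysql_Test      {}
class DB_Mysql_Localhost {}

// Factory method: callers ask for "a database handle" and the factory
// picks the implementation based on which host served the request.
function db_connection_factory() {
    $host = isset($_SERVER['HTTP_HOST']) ? $_SERVER['HTTP_HOST'] : '';
    switch ($host) {
        case 'www.example.com':
        case 'stage.example.com':
            return new DB_Mysql_Prod();
        case 'dev.example.com':
            return new DB_Mysql_Test();
        default:
            return new DB_Mysql_Localhost();
    }
}
```

A page would then call $dbh = db_connection_factory(); instead of instantiating an environment-specific class by name.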

You might have noticed a flaw in this system: Because the code in the live environment is a particular point-in-time snapshot of the PROD branch, it can be difficult to revert to a previous consistent version without knowing the exact time it was committed and pushed. There are two possible solutions to this problem:

- You can create a separate branch for every production push.

- You can use symbolic tags to manage production pushes.

The former option is very common in the realm of shrink-wrapped software, where version releases occur relatively infrequently and may need to have different changes applied to different versions of the code. In this scheme, whenever the stage environment is ready to go live, a new branch (for example, VERSION_1_0_0) is created based on that point-in-time image. That version can then evolve independently from the main staging branch PROD, allowing bug fixes to be implemented in differing ways in that version and in the main tree.

I find this system largely unworkable for Web applications for a couple reasons:

- For better or for worse, Web applications often change rapidly, and CVS does not scale well to supporting hundreds of branches.

- Because you are not distributing your Web application code to others, there is much less concern with being able to apply different changes to different versions. Because you control all the dependent code, there is seldom more than one version of a library in use at one time.


The other solution is to use symbolic tags to mark releases. As discussed earlier in this chapter, in the section “Symbolic Tags,” using a symbolic tag is really just a way to assign a single marker to a collection of files in CVS. It associates a name with the then-current version of all the specified files, which in a nonbranching tree is a perfect way to take a snapshot of the repository. Symbolic tags are relatively inexpensive in CVS, so there is no problem with having hundreds of them. For regular updates of Web sites, I usually name my tags by the date on which they are made, so in one of my projects, the tag might be PROD_2004_01_23_01, signifying Tag 1 on January 23, 2004. More meaningful names are also useful if you are associating them with particular events, such as a new product launch.

Using symbolic tags works well if you do a production push once or twice a week. If your production environment requires more frequent code updates on a regular basis, you should consider doing the following:

- Moving content-only changes into a separate content management system (CMS) so that they are kept separate from code. Content often needs to be updated frequently, but the underlying code should be more stable than the content.

- Coordinating your development environment to consolidate syncs. Pushing code live too frequently makes it harder to effectively assure the quality of changes, which increases the frequency of production errors, which requires more frequent production pushes to fix, ad infinitum. This is largely a matter of discipline: there are few environments where code pushes cannot be restricted to at most once per day, if not once per week.

Note

One of the rules that I try to get clients to agree to is no production pushes after 3 p.m. and no pushes at all on Friday. Bugs will inevitably be present in code, and pushing code at the end of the day or before a weekend is an invitation to find a critical bug just as your developers have left the office. Daytime pushes mean that any unexpected errors can be tackled by a fresh set of developers who aren’t watching the clock, trying to figure out if they are going to get dinner on time.

Managing Packaging

Now that you have used change control systems to master your development cycle, you need to be able to distribute your production code. This book is not focused on producing commercially distributed code, so when I say that code needs to be distributed, I’m talking about the production code being moved from your development environment to the live servers that are actually serving the code.

Packaging is an essential step in ensuring that what is live in production is what is supposed to be live in production. I have seen many people opt to manually push changed files out to their Web servers on an individual basis. That is a recipe for failure.


These are just two of the things that can go wrong:

- It is very easy to lose track of which files you need to copy for a product launch. Debugging a missing include is usually easy, but debugging a non-updated include can be devilishly hard.

- In a multiserver environment, things get more complicated, and the list of potential problems expands. For example, if a single server is down, how do you ensure that it will receive all the incremental changes it needs when it comes back up? Even if all your machines stay up 100% of the time, human error makes it extremely easy to have subtle inconsistencies between machines.

Packaging is important not only for your PHP code but for the versions of all the support software you use as well. At a previous job I ran a large PHP server cluster (around 100 machines) that served a number of applications. Between PHP 4.0.2 and 4.0.3, there was a slight change in the semantics of pack(). This broke some core authentication routines on the site, which caused some significant and embarrassing downtime. Bugs happen, but a sitewide show-stopper like this should have been detected and addressed before it ever hit production. The following factors made this difficult to diagnose:

- Nobody read the 4.0.3 change log, so at first PHP itself was not even considered as a possible culprit.

- PHP versions across the cluster were inconsistent: some machines were running 4.0.1, others 4.0.2, still others 4.0.3. We did not have centralized logging running at that point, so it was extremely difficult to associate the errors with a specific machine. They appeared to be completely sporadic.

Like many problems, though, the factors that led to this one were really just symptoms of larger systemic problems. These were the real issues:

- We had no system for ensuring that Apache, PHP, and all supporting libraries were identical on all the production machines. As machines were repurposed, or as different administrators installed software on them, each developed its own personality. Production machines should not have personalities.

- Although we had separate trees for development and production code, we did not have a staging environment where we could make sure that the code we were about to run live would work on the production systems. Of course, without a solid system for making sure your systems are all identical, a staging environment is only marginally useful.

- Not tracking PHP upgrades in the same system as code changes made it difficult to correlate a break with a PHP upgrade. We wasted hours trying to track the problem to a code change. If the fact that PHP had just been upgraded on some of the machines the day before had been logged (preferably in the same change control system as our source code), the bug hunt would have gone much faster.


Solving the pack() Problem

We also took the entirely wrong route in solving our problem with pack(). Instead of fixing our code so that it would be safe across all versions, we chose to undo the semantics change in pack() itself (in the PHP source code). At the time, that seemed like a good idea: It kept us from having to clutter our code with special cases and preserved backward compatibility.

In the end, we could not have made a worse choice. By “fixing” the PHP source code, we had doomed ourselves to backporting that change any time we needed to do an upgrade of PHP. If the patch was forgotten, the authentication errors would mysteriously reoccur.

Unless you have a group of people dedicated to maintaining core infrastructure technologies in your company, you should stay away from making semantics-breaking changes in PHP on your live site.
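One way to catch semantics changes like this before they reach production is a small regression script that pins down the exact pack() behaviors your application relies on and is run against every candidate PHP build. This is a sketch, not from the text; the format codes shown are standard, but you would substitute the calls your own code actually depends on:

```php
<?php
// Pin down the exact byte strings your application depends on. Run this
// against any candidate PHP build before it goes anywhere near production:
// if pack() semantics drift between versions, the build fails here instead
// of breaking authentication on the live site.
function check_pack_semantics() {
    // 'N' packs a 32-bit big-endian ("network order") integer.
    // 'a' pads a fixed-width string with NUL bytes; 'A' pads with spaces.
    return bin2hex(pack('N', 1)) === '00000001'
        && bin2hex(pack('a8', 'foo')) === '666f6f0000000000'
        && pack('A8', 'foo') === 'foo     ';
}
```

A deploy script can refuse to install a PHP package on any machine where this check fails.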

Packaging and Pushing Code

Pushing code from a staging environment to a production environment isn’t hard. The most difficult part is versioning your releases, as you learned to do in the previous section by using CVS tags and branches. What’s left is mainly finding an efficient means of physically moving your files from staging to production.

There is one nuance to moving PHP files. PHP parses every file it needs to execute on every request. This has a number of deleterious effects on performance (which you will learn more about in Chapter 9, “External Performance Tunings”) and also makes it rather unsafe to change files in a running PHP instance. The problem is simple: If you have a file index.php that includes a library, such as the following:

# index.php
<?php
require_once 'hello.inc';
hello();
?>

# hello.inc
<?php
function hello() {
    print "Hello World\n";
}
?>

and then you change both of these files as follows:

# index.php
<?php
require_once 'hello.inc';
hello('George');
?>

# hello.inc
<?php
function hello($name) {
    print "Hello $name\n";
}
?>

if someone is requesting index.php just as the content push ensues, so that index.php is parsed before the push is complete and hello.inc is parsed after the push is complete, you will get an error because the prototypes will not match for a split second.

This is true in the best-case scenario where the pushed content is all updated instantaneously. If the push itself takes a few seconds or minutes to complete, a similar inconsistency can exist for that entire time period.

The best solution to this problem is to do the following:

1. Make sure your push method is quick.

2. Shut down your Web server during the period when the files are actually being updated.

The second step may seem drastic, but it is necessary if returning a page-in-error is never acceptable. If that is the case, you should probably be running a cluster of redundant machines and employ the no-downtime syncing methods detailed at the end of Chapter 15, “Building a Distributed Environment.”
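One widely used way to make the push itself nearly instantaneous (addressing step 1; this is not from the text, and the paths are hypothetical) is to sync the new code into a fresh release directory and then atomically repoint a symbolic link that the Web server’s document root follows:

```php
<?php
// Atomically repoint the 'current' symlink at a freshly synced release
// directory. The rename() of the temporary link over the old one is a
// single filesystem operation on POSIX systems, so no request ever sees
// a half-pushed tree. Paths are hypothetical.
function activate_release($release_dir, $current_link) {
    $tmp_link = $current_link . '.tmp';
    if (is_link($tmp_link) || file_exists($tmp_link)) {
        unlink($tmp_link);              // clear any leftover temp link
    }
    if (!symlink($release_dir, $tmp_link)) {
        return false;
    }
    // rename() over an existing symlink atomically replaces it.
    return rename($tmp_link, $current_link);
}
```

Individual files are still parsed at different instants, so the consistency caveat above still applies within a single request; the swap only shrinks the window to a single operation.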

Note

Chapter 9 also describes compiler caches that prevent reparsing of PHP files. All the compiler caches have built-in facilities to determine whether files have changed and to reparse them. This means that they suffer from the inconsistent include problem as well.

There are a few choices for moving code between staging and production:

- tar and ftp/scp

- PEAR package format

- cvs update

- rsync

- NFS

Using tar is a classic option, and it’s simple as well: you use tar to create an archive of your code, copy that file to the destination server, and unpack it. Using tar archives is a fine way to distribute software to remote sites (for example, if you are releasing or selling an application). There are two problems with using tar as the packaging tool in a Web environment, though:

- It alters files in place, which means you may experience momentarily corrupted reads for files larger than a disk block.

- It does not perform partial updates, so every push rewrites the entire code tree.


An interesting alternative to using tar for distributing applications is to use the PEAR package format. This does not address either of the problems with tar, but it does allow users to install and manage your package with the PEAR installer. The major benefit of using the PEAR package format is that it makes installation a snap (as you’ve seen in all the PEAR examples throughout this book). Details on using the PEAR installer are available at http://pear.php.net.

A tempting strategy for distributing code to Web servers is to have a CVS checkout on your production Web servers and use cvs update to update your checkout. This method addresses both of the problems with tar: it transfers only incremental changes, and it uses temporary files and atomic move operations to avoid the problem of updating files in place. The problem with using CVS to update production Web servers directly is that it requires the CVS metadata to be present on the destination system. You need to use Web server access controls to limit access to those files.

A better strategy is to use rsync. rsync is specifically designed to efficiently synchronize differences between directory trees: it transfers only incremental changes and uses temporary files to guarantee atomic file replacement. rsync also supports a robust limiting syntax, allowing you to add or remove classes of files from the data to be synchronized. This means that even if the source tree for the data is a CVS working directory, all the CVS metadata files can be omitted for the sync.
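A push script built on rsync might assemble a command like the following; the flags (-a, --delete, --exclude) are standard rsync options, while the host and path names are purely illustrative:

```php
<?php
// Build the rsync invocation for a production push. -a (archive mode)
// recurses and preserves permissions and timestamps, --delete removes
// files that no longer exist in staging, and --exclude=CVS/ keeps the
// CVS metadata directories out of the live tree. Hosts/paths are
// hypothetical examples.
function build_push_command($source, $dest_host, $dest_path) {
    return sprintf(
        "rsync -a --delete --exclude=CVS/ %s/ %s:%s/",
        escapeshellarg($source),
        $dest_host,
        escapeshellarg($dest_path)
    );
}
```

The script would then run this command once per destination server, or use a wrapper such as a for loop over the server list.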

Another popular method for distributing files to multiple servers is to serve them over NFS. NFS is very convenient for guaranteeing that all servers instantaneously get copies of updated files. Under low to moderate traffic, this method stands up quite well, but under higher throughput it can suffer from the latency inherent in NFS. The problem is that, as discussed earlier, PHP parses every file it runs, every time it executes it. This means that it can do significant disk I/O when reading its source files. When these files are served over NFS, the latency and traffic add up. Using a compiler cache can seriously minimize this problem.

A technique that I’ve used in the past to avoid overstressing NFS servers is to combine a couple of the methods we’ve just discussed. All my servers NFS-mount their code but do not directly access the NFS-mounted copy. Instead, each server uses rsync to copy the NFS-mounted files onto a local filesystem (preferably a memory-based filesystem such as Linux’s tmpfs or ramfs). A magic semaphore file is updated only when content is to be synced, and the script that runs rsync uses the changing timestamp on that file to know whether it should actually synchronize the directory trees. This keeps rsync from constantly running, which would be stressful to the NFS server.
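The semaphore check reduces to a timestamp comparison. A minimal sketch (file names are hypothetical) that a cron-driven sync script could use:

```php
<?php
// Decide whether a sync is needed by comparing the semaphore file's
// mtime against a stamp file written after the last successful sync.
// This runs periodically on each Web server; names are hypothetical.
function sync_needed($semaphore, $stamp) {
    if (!file_exists($semaphore)) {
        return false;               // nothing has ever been pushed
    }
    if (!file_exists($stamp)) {
        return true;                // this machine has never synced
    }
    return filemtime($semaphore) > filemtime($stamp);
}

function mark_synced($stamp) {
    touch($stamp);                  // record the time of this sync
}
```

The script calls sync_needed() first, runs rsync only when it returns true, and then calls mark_synced(), so rsync touches the NFS server only when a push has actually happened.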

Packaging Binaries

If you run a multiserver installation, you should also package all the software needed to run your application. This is an often-overlooked facet of PHP application management, especially in environments that have evolved from a single-machine setup.

Allowing divergent machine setups may seem benign. Most of the time your applications will run fine. The problems arise only occasionally, but they are insidious. No one suspects that the occasional failure on a site is due to a differing kernel version, or to an Apache module being compiled as a shared object on one system and statically linked on another, but stranger things happen.

When packaging my system binaries, I almost always use the native packaging format for the operating system I am running on. You can use tar archives or a master server image that can be transferred to hosts with rsync, but neither method incorporates the ease of use and manageability of Red Hat’s rpm or FreeBSD’s pkg format. In this section I use the term RPM loosely to refer to a packaged piece of software. If you prefer a different format, you can perform a mental substitution; none of the discussion is particular to the RPM format itself.

I recommend not using monolithic packages. You should keep a separate package for PHP, for Apache, and for any other major application you use. I find that this provides a bit more flexibility when you’re putting together a new server cluster.

The real value in using your system’s packaging system is that it makes it easy to guarantee that you are running identical software on every machine. I’ve used tar archives to distribute binaries before. They worked okay, but it was very easy to forget exactly which tarball I had installed. Worse still were the places where we installed everything from source on every machine. Despite intentional efforts to keep everything consistent, there were subtle differences across all the machines. In a large environment, that heterogeneity is unacceptable.

Packaging Apache

In general, the binaries in my Apache builds are standard across most machines I run. I like having Apache modules (including mod_php) be shared objects because I find the plug-and-play functionality that this provides extremely valuable. I also think that the performance penalty of running Apache modules as shared objects is completely exaggerated. I’ve never been able to reproduce any meaningful difference on production code.

Because I’m a bit of an Apache hacker, I often bundle some custom modules that are not distributed with Apache itself. These include things like mod_backhand, mod_log_spread, and customized versions of other modules. I recommend two Web server RPMs: one contains the Web server itself (minus the configuration file), built with mod_so and with all the standard modules built as shared objects; a second contains all the custom modules I use that aren’t distributed with the core of Apache. By separating these out, you can easily upgrade your Apache installation without having to track down and rebuild all your nonstandard modules, and vice versa. This works because the Apache Group does an excellent job of ensuring binary compatibility between versions. You usually do not need to rebuild your dynamically loadable modules when upgrading Apache.

With Apache built out in such a modular fashion, the configuration file is critical to making it perform the tasks that you want. Because the Apache server builds are generic and individual services are specific, you will want to package your configuration separately from your binaries. Because Apache is a critical part of my applications, I store my httpd.conf files in the same CVS repository as my application code and copy them into place. One rule of thumb for crafting sound Apache configurations is to use generic language in your configurations. A commonly overlooked feature of Apache configuration is that you can use locally resolvable hostnames instead of IP literals in your configuration file. This means that if every Web server needs to have the following configuration line:

Listen 10.0.0.N:8000

where N is different on every server, instead of hand-editing the httpd.conf file of every server, you can use a consistent alias in the /etc/hosts file of every server to label such addresses. For example, you can set an externalether alias in every host via the following:

10.0.0.1 externalether

Then you can render your httpd.conf Listen line as follows:

Listen externalether:8000

Because machine IP addresses should change less frequently than their Web server configurations, using aliases allows you to keep every httpd.conf file in a cluster of servers identical. Identical is good.

Also, you should not include modules you don’t need. Remember that you are crafting a configuration file for a particular service. If that service does not need mod_rewrite, do not load mod_rewrite.

Packaging PHP

The packaging rules for handling mod_php and any dependent libraries it has are similar to the Apache guidelines. Make a single master distribution that reflects the features and build requirements that every machine you run needs. Then bundle additional packages that provide custom or nonstandard functionality.

Remember that you can also load PHP extensions dynamically by building them shared and loading them with the following php.ini line:

extension = my_extension.so

An interesting (and oft-overlooked) configuration feature in PHP is config-dir support. If you build a PHP installation with the configure option

--with-config-file-scan-dir, as shown here:

./configure [ options ] --with-config-file-scan-dir=/path/to/configdir

then at startup, after your main php.ini file is parsed, PHP will scan the specified directory and automatically load any files that end with the extension .ini (in alphabetical order). In practical terms, this means that if you have standard configurations that go with an extension, you can write a config file specifically for that extension and bundle it with the extension itself. This provides an extremely easy way of keeping extension configuration with its extension and not scattered throughout the environment.
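For instance, an extension package might drop a file like the following into the scan directory; the extension name and directive shown here are hypothetical placeholders:

```ini
; my_extension.ini -- shipped in the same package as the extension itself
extension = my_extension.so
my_extension.cache_size = 1024
```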

Multiple ini Values

Keys can be repeated multiple times in a php.ini file, but the last seen key/value pair will be the one used.
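You can observe the same last-one-wins rule with parse_ini_string() (available in PHP 5.3 and later), which parses ini-format text the way the engine reads php.ini:

```php
<?php
// Duplicate keys in ini data: the last assignment wins, just as it does
// when php.ini itself is read at startup.
$ini = "memory_limit = 32M\nmemory_limit = 64M";
$settings = parse_ini_string($ini);
```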

Further Reading

Additional documentation on CVS can be found here:

- The main CVS project site, http://www.cvshome.org, has an abundance of information on using and developing with CVS. The Cederqvist, an online manual for CVS found on the site, is an excellent introductory tutorial.

- Open Source Development with CVS by Moshe Bar and Karl Fogel is a fine book on developing with CVS.

- The authoritative source for building packages with RPM is available on the Red Hat site, at http://rpm.redhat.com/RPM-HOWTO. If you’re running a different operating system, check out its documentation for details on how to build native packages.

- rsync’s options are detailed in your system’s man pages. More detailed examples and implementations are available at the rsync home page, http://samba.anu.edu.au/rsync.

8

Designing a Good API

What makes some code “good” and other code “bad”? If a piece of code functions properly and has no bugs, isn’t it good? Personally, I don’t think so. Almost no code exists in a vacuum. It will live on past its original application, and any gauge of quality must take that into account.

In my definition, good code must embody qualities like the following:

- It is easy to maintain.

- It is easy to reuse in other contexts.

- It has minimal external dependencies.

- It is adaptable to new problems.

- Its behavior is safe and predictable.

This list can be further distilled into the following three categories:

- It must be refactorable.

- It must be extensible.

- It must be written defensively.

Bottom-Up Versus Top-Down Design

Design is essential in software development. The subject of software design is both broad and deep, and I can hardly scratch the surface in this chapter. Fortunately, there are a number of good texts in the field, two of which are mentioned in the “Further Reading” section at the end of this chapter.

In the broadest generality, design can be broken into two categories: top-down and bottom-up.

Bottom-up design is characterized by writing code early in the design process. Basic low-level components are identified, and implementation begins on them; they are tied together as they are completed.
