PHP – Object References Vs. Object Copies (Pass by reference or pass by copy)
An important concept in programming is pointers. Even though we don’t directly use them in most modern languages the way we used to (think C/C++) they still come back to haunt us. In many modern languages they are referred to as references. References aren’t quite pointers, but they do resemble them enough to illustrate the concept (note that C/C++ has both pointers and references!).
What is the difference between a pointer and a reference? A pointer can be typed or untyped. A pointer can point to an arbitrary location in memory, and invalid pointers (pointing to the wrong spot) were one of the causes of segmentation faults and of blue screens in Microsoft Windows in the past. The pointer is very versatile, but also very dangerous.
A reference is similar to a pointer, but not quite. Think of a reference as a pointer to an existing variable (not just some arbitrary location in memory). See how it is similar? Some people get confused easily here.
In our case, we are only going to talk about references and how they work in PHP. References are important because they can affect the speed and functionality of a script. When a function in PHP is called, the parameters supplied are generally passed by copying the information. This means that if you change data passed into a function, when you leave the function the data will not have changed. Look at this example.
<?php
$a = "Testing...";
print "Variable = $a...<BR>";
// $str is a local copy of whatever the caller passes in.
function ChangeString($str) {
    $str = "String changed...";
    print ">>> Local variable = $str...<BR>";
}
ChangeString($a);
print "Variable = $a...<BR>";
?>
This code shows the following output.
Variable = Testing......
>>> Local variable = String changed......
Variable = Testing......
Notice that the function modifies the string passed to it, but once we are done with the function and return to the main script, the variable is untouched. This is because when we pass the variable to the function, PHP actually makes a new copy. This copy exists only within the scope of the function (local scope). Some might think that this difference in scope is what protects the original. That is not the case. While scope is an important concept, it is not the reason the variable doesn’t get changed; the fact that it is copied is. Let’s illustrate the point with some more examples.
<?php
$a = "Testing...";
print "Variable = $a...<BR>";
// The & makes $str a reference to the caller's variable.
function ChangeString(&$str) {
    $str = "String changed...";
    print ">>> Local variable = $str...<BR>";
}
ChangeString($a);
print "Variable = $a...<BR>";
?>
This is the SAME script, with just one minor change. In the definition of the function, we add the & character in front of the parameter. This character means get a reference to the passed variable. In essence, we are now pointing to the original variable instead of making a copy. Here is the output now.
Variable = Testing......
>>> Local variable = String changed......
Variable = String changed......
Since we are now pointing to the original variable rather than making a copy, changing either one changes both, because they both point to the same memory. Here is the same idea without the function call, using simple variables. Note that both cases (copying and referencing) are shown in the same script; most of the script consists of print statements that show you what’s going on.
<?php
print "The copy example...<BR>";
$copy_a = "A value";
print ">>> A = $copy_a...<BR>";
$copy_b = $copy_a;
print ">>> B = $copy_b...<BR>";
$copy_b = "B Value";
print ">>>>> After changing copy_b...<BR>";
print ">>> A = $copy_a...<BR>";
print ">>> B = $copy_b...<BR>";
print "<BR>----------------------------------<BR>";
print "The reference example...<BR>";
$ref_a = "A value";
print ">>> A = $ref_a...<BR>";
$ref_b =& $ref_a;
print ">>> B = $ref_b...<BR>";
$ref_b = "B value";
print ">>>>> After changing ref_b...<BR>";
print ">>> A = $ref_a...<BR>";
print ">>> B = $ref_b...<BR>";
?>
The output looks like this:
The copy example...
>>> A = A value...
>>> B = A value...
>>>>> After changing copy_b...
>>> A = A value...
>>> B = B Value...
----------------------------------
The reference example...
>>> A = A value...
>>> B = A value...
>>>>> After changing ref_b...
>>> A = B value...
>>> B = B value...
When simply assigning a variable to another variable, you are actually copying that value. When you use the & sign, you are pointing the second variable to the first so that they both point to the same value or memory location. This means that changing one will change the other.
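The same rule applies to arrays, which PHP also copies on plain assignment. The following is a minimal sketch rather than anything from the original scripts; the variable names are made up for illustration. It puts an array copy next to an array reference, and also shows that calling unset() on a reference removes only that name, not the value it pointed to.

```php
<?php
// Hypothetical variable names, chosen only for this illustration.
$list_a = array(1, 2, 3);

// Plain assignment: $list_copy is an independent copy of $list_a.
$list_copy = $list_a;
$list_copy[] = 4;

// Reference assignment: $list_ref is another name for $list_a.
$list_ref =& $list_a;
$list_ref[] = 5;

print "list_a has " . count($list_a) . " items<BR>";       // 1, 2, 3, 5
print "list_copy has " . count($list_copy) . " items<BR>"; // 1, 2, 3, 4

// unset() removes only the name $list_ref; $list_a keeps its value.
unset($list_ref);
print "list_a still has " . count($list_a) . " items<BR>";
```

Adding to the copy leaves the original alone, while adding through the reference shows up in the original.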
Some languages copy everything unless specifically told to reference. Others copy base data types (integer, float, string, etc.) and automatically reference object types (meaning you can assign variables to your objects as if you were copying, but in the background the language references them for you). Generally you just get used to how your language works and use it accordingly.
It looks like PHP changed how they do things between versions 4 and 5. In version 4, you have to explicitly reference everything if you don’t want to make copies, objects included. In PHP 5, objects are no longer copied when you assign them or pass them to a function. Strictly speaking, what gets copied is a small handle to the object, so both variables end up working with the same object, and you use the clone keyword when you really do want a separate copy. This created quite a headache when we tried to run a script written for PHP 5 on a server that had PHP 4 installed. The script basically had to be rewritten, with lots of & symbols introduced. Why did PHP change things, you ask? Because generally you want objects passed by reference. When adding the extra object-oriented support in PHP 5, they decided to conform with most other languages in this respect. It was a good choice, especially since most people don’t have to worry about running new scripts on old servers. Old scripts that specifically use references should still work in the newer PHP versions.
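To make the PHP 5 behavior concrete, here is a small sketch. The Page class and the RenamePage function are hypothetical names invented for this example, and the code assumes PHP 5 or later. The function parameter has no &, yet the caller's object is still modified, while clone produces an independent copy.

```php
<?php
// Hypothetical class, used only for this illustration (PHP 5+ syntax).
class Page {
    public $title = "Original";
}

// No & on the parameter: the function still works on the caller's object,
// because what it receives is a handle to that same object.
function RenamePage($page) {
    $page->title = "Renamed";
}

$first = new Page();
$second = $first;       // both variables refer to the same object
$third = clone $first;  // clone makes an independent copy

RenamePage($first);

print "first = {$first->title}<BR>";   // Renamed
print "second = {$second->title}<BR>"; // Renamed
print "third = {$third->title}<BR>";   // Original
```

Only the cloned object keeps the original title, since it was copied before the rename.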
Earlier I mentioned performance. Every time a value is actually copied, a new chunk of memory has to be allocated. (PHP puts off the real copy until one of the two variables is modified, but once that happens the memory has to come from somewhere.) For base types (integers, etc.) this isn’t as big a deal. For large values that hold lots of data, this can create a significant amount of overhead. If you need a copy, then there is no getting around it, but sometimes you can speed things up quite a bit by using references.
On one project I was working on, a string variable held the whole contents of the current page (think 20-30K!). We sent this string to a processing function. At first, we used the normal method of passing in a copy, and then setting the string to the returned value:
<?php
$str = "lots of data";
$str = ProcessString($str);
// Works on a copy and hands the result back to the caller.
function ProcessString($s) {
    $s .= " more data";
    return $s;
}
?>
This worked fine. The data from the $str variable was copied when passed to the function. The function would process the string, then pass back the processed string. When the resulting string is returned, a copy is made again. So, our $str variable will hold a copy of a copy when we are done (or even a copy of a copy of a copy, depending on whether copies are made inside the processing function!). You can see how this will add up.
By using a reference, you can modify the variable that is passed in directly. You ONLY want to do this when it is OK to modify the variable directly (not all the time!). By avoiding the copy where appropriate, you can speed up your code. The same code using a reference would look like the following. Notice that you don’t need to return a value now, since the value being passed can be directly modified.
<?php
$str = "lots of data";
ProcessString($str);
// Works directly on the caller's variable, so nothing is returned.
function ProcessString(&$s) {
    $s .= " more data";
    // return $s; (no longer needed)
}
?>
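As a quick check that the two styles really are interchangeable for the caller, the sketch below runs both side by side. The names ProcessStringByValue and ProcessStringByReference are hypothetical, introduced only so that both versions can live in one script.

```php
<?php
// Hypothetical function names, chosen so both versions fit in one script.
function ProcessStringByValue($s) {
    $s .= " more data";
    return $s;          // the caller must assign the result
}

function ProcessStringByReference(&$s) {
    $s .= " more data"; // modifies the caller's variable directly
}

$by_value = "lots of data";
$by_value = ProcessStringByValue($by_value);

$by_reference = "lots of data";
ProcessStringByReference($by_reference);

print "By value: $by_value<BR>";
print "By reference: $by_reference<BR>";
```

Both variables end up holding the same text. One thing to keep in mind with the reference version is that it needs a real variable to bind to: passing a literal such as ProcessStringByReference("text") is an error. Whether the reference version is actually faster depends on the PHP version and on the data, so it is worth timing both in your own script before committing to one.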
Ray Pulsipher
Owner
Computer Magic And Software Design