PHP provides many functions that convert an input string from one encoded form to another. For example, urlencode() and urldecode() convert the special characters in a URL to and from percent-encoded (%XX hexadecimal) form.
Another function, nl2br(), which we saw while discussing PHP line breaks, inserts an HTML line break (<br />) before each newline in a string.
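As a quick illustration of these two conversions, here is a minimal sketch. The sample values are made up for this example and do not come from any particular application.

```php
<?php
// Hypothetical sample value containing characters that are special in a URL.
$query_value = "PHP & MySQL/tips?";

// urlencode() turns spaces into + and other special characters into %XX.
$encoded = urlencode($query_value);
echo $encoded . "\n";            // PHP+%26+MySQL%2Ftips%3F

// urldecode() reverses the conversion.
$decoded = urldecode($encoded);
echo $decoded . "\n";            // PHP & MySQL/tips?

// nl2br() inserts <br /> before each newline; the newline itself is kept.
$with_breaks = nl2br("first line\nsecond line");
echo $with_breaks . "\n";
```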
Similarly, the PHP function htmlentities() converts the special characters in an input string into HTML character entities. This conversion stops the browser from interpreting characters such as <, > and quotes as markup when the string is embedded in an HTML page, where they could otherwise break the page or cut an attribute value short. It does not make a string safe for a database query or a URL; those contexts need their own mechanisms, such as prepared statements and urlencode().
For example, if user input containing a double quote is written into a double-quoted HTML attribute, the attribute value ends at that quote and the rest of the input is parsed as markup.
In such a situation, htmlentities() is used so that the special characters in user input are displayed as text instead of being parsed.
Before going into the details of the PHP htmlentities() function, let us look at HTML character entities. An entity starts with an ampersand (&), followed by either the name or the number that identifies it, and ends with a semicolon (;).
When the entity number is used, the entity starts with &, followed by # and the entity number.
Each character in the entity table has a corresponding character entity with a unique name and number. We can specify an entity by name in the form &name; for example, the character < can be written as &lt;.
With the entity number, the same < character can be written as &#60;.
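To check that the named and the numeric form refer to the same character, a short sketch like the following can be used. It relies only on html_entity_decode() and htmlentities(); the variable names are illustrative.

```php
<?php
// Decode the named and the numeric entity for the less-than sign.
$by_name   = html_entity_decode("&lt;");
$by_number = html_entity_decode("&#60;");

// Both forms decode to the same single character.
var_dump($by_name === "<");      // bool(true)
var_dump($by_number === "<");    // bool(true)

// Going the other way, htmlentities() produces the named form.
$entity = htmlentities("<");
echo $entity . "\n";             // &lt;
```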
Now, let us look at the basic syntax of the PHP htmlentities() function and some related PHP functions.
This function accepts four arguments, of which only the first is required, as shown in the following syntax.
string htmlentities ( string $input_string, int $flag, string $character_encoding, bool $double_encode )
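The arguments of htmlentities() are as follows. $input_string is the string to be converted. $flag is a bitmask of constants that controls how quotes, invalid byte sequences and the document type are handled. $character_encoding names the character set used for the conversion, for example UTF-8. $double_encode, when set to false, leaves entities that are already present in the input untouched; by default everything is encoded again.
The PHP constants that can be combined as the value of the flag parameter are ENT_COMPAT (convert double quotes only), ENT_QUOTES (convert both double and single quotes), ENT_NOQUOTES (convert neither), ENT_IGNORE (silently drop invalid byte sequences), ENT_SUBSTITUTE (replace invalid byte sequences with the Unicode replacement character) and ENT_DISALLOWED (replace code points that are invalid for the given document type).
Apart from these flag constants, a further set of constants selects the type and version of the markup language: ENT_HTML401, ENT_XML1, ENT_XHTML and ENT_HTML5.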
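The effect of the quote-related flags and of the double-encode argument can be seen in a small sketch such as the one below. The sample strings are invented for illustration, and the encoding is passed explicitly as UTF-8 so that the result does not depend on the server configuration.

```php
<?php
// Hypothetical sample string with both kinds of quotes.
$sample = "Tom's \"quote\" & <b>";

// ENT_COMPAT: double quotes are converted, single quotes are kept.
$compat = htmlentities($sample, ENT_COMPAT, "UTF-8");
echo $compat . "\n";     // Tom's &quot;quote&quot; &amp; &lt;b&gt;

// ENT_QUOTES: both double and single quotes are converted.
$quotes = htmlentities($sample, ENT_QUOTES, "UTF-8");
echo $quotes . "\n";     // Tom&#039;s &quot;quote&quot; &amp; &lt;b&gt;

// ENT_NOQUOTES: neither kind of quote is converted.
$noquotes = htmlentities($sample, ENT_NOQUOTES, "UTF-8");
echo $noquotes . "\n";   // Tom's "quote" &amp; &lt;b&gt;

// Fourth argument: whether existing entities are encoded a second time.
$already = "&lt;b&gt; & more";
$twice = htmlentities($already, ENT_QUOTES, "UTF-8", true);
$once  = htmlentities($already, ENT_QUOTES, "UTF-8", false);
echo $twice . "\n";      // &amp;lt;b&amp;gt; &amp; more
echo $once . "\n";       // &lt;b&gt; &amp; more
```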
<?php
$input_string = "PHP 'character string conversion' functions <i>htmlentities()</i>";
$output = htmlentities($input_string);
echo "<b>Original Character String</b><br/>";
echo $input_string . "<br/><br/>";
echo "<b>After Conversion</b><br/>";
echo $output;
?>
In the above program, the input string includes single quotes and less-than and greater-than symbols, which the browser can parse as markup.
So, if we print this input to the browser before passing it to htmlentities(), the browser parses the HTML tags <i></i> and displays the text htmlentities() in italics.
In contrast, the result of htmlentities() shows the <i></i> tags literally in the browser, because < and > have been converted to &lt; and &gt;. Whether the single quotes are converted depends on the PHP version: before PHP 8.1 the default flag is ENT_COMPAT, which leaves single quotes as they are, while from PHP 8.1 onward the default includes ENT_QUOTES, so each single quote becomes &#039;. Either way, the browser display shows the quotes unchanged.
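To get the same result for the sample string on every PHP version, the flag can be passed explicitly instead of relying on the default. The following is a minimal sketch of that idea; the choice of UTF-8 as the encoding is an assumption.

```php
<?php
$input_string = "PHP 'character string conversion' functions <i>htmlentities()</i>";

// Explicit ENT_COMPAT: single quotes stay as they are.
$keep_single = htmlentities($input_string, ENT_COMPAT, "UTF-8");
echo $keep_single . "\n";
// PHP 'character string conversion' functions &lt;i&gt;htmlentities()&lt;/i&gt;

// Explicit ENT_QUOTES: single quotes become &#039;.
$convert_single = htmlentities($input_string, ENT_QUOTES, "UTF-8");
echo $convert_single . "\n";
// PHP &#039;character string conversion&#039; functions &lt;i&gt;htmlentities()&lt;/i&gt;
```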
The encoded string can be converted back to its original form with the PHP function html_entity_decode(). So, when we pass the value of the variable $output from the above program to html_entity_decode(), we get the original input string back.
For example, we can add the following lines to check the output returned by the html_entity_decode() function.
<?php
$decoded_output = html_entity_decode($output);
echo "<b>After Decode html entity</b><br/>";
echo $decoded_output;
?>
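The round trip can also be checked in a self-contained way. The sketch below assumes that the same flags and encoding are passed to both functions, which keeps the result independent of the PHP version's defaults.

```php
<?php
$input_string = "PHP 'character string conversion' functions <i>htmlentities()</i>";

// Use identical flags and encoding for encoding and decoding.
$flags = ENT_QUOTES | ENT_HTML401;

$encoded = htmlentities($input_string, $flags, "UTF-8");
$decoded = html_entity_decode($encoded, $flags, "UTF-8");

// The decoded string matches the original input exactly.
var_dump($decoded === $input_string);   // bool(true)
```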
Like htmlentities(), PHP provides another function, htmlspecialchars(), which serves the same purpose of converting special characters into HTML entities.
The difference is that htmlspecialchars() converts only a limited set of special characters, namely less than (<), greater than (>), double quotes ("), ampersand (&) and, when ENT_QUOTES is in effect, single quotes ('), whereas htmlentities() converts every character that has a named HTML entity.
For example, let us compare these two functions with the following PHP program, which converts an input string that includes the copyright symbol (©).
<?php
$input_string = "HTML © symbol";
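echo "htmlspecialchars() returns<br/><br/>";
// Convert only the small set of HTML special characters.
$output = htmlspecialchars($input_string);
echo $output . "<br/><br/>";
echo "htmlentities() returns<br/><br/>";
// Convert every character that has a named HTML entity.
$output = htmlentities($input_string);
echo $output;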
?>
When we run the above program and view the source of the browser output, we can see that htmlentities() converts the © symbol to &copy;, whereas htmlspecialchars() leaves the character unchanged as ©.
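The difference can also be confirmed without a browser. The sketch below assumes PHP 7 or later, because it writes the copyright symbol with the "\u{00A9}" escape so that the result does not depend on how the source file is saved, and it passes UTF-8 explicitly.

```php
<?php
// "\u{00A9}" is the copyright symbol, written as a Unicode escape (PHP 7+).
$input_string = "HTML \u{00A9} symbol";

$special  = htmlspecialchars($input_string, ENT_QUOTES, "UTF-8");
$entities = htmlentities($input_string, ENT_QUOTES, "UTF-8");

// htmlspecialchars() has nothing to convert in this string.
var_dump($special === $input_string);          // bool(true)

// htmlentities() replaces the symbol with its named entity.
var_dump($entities === "HTML &copy; symbol");  // bool(true)
```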