[nycphp-talk] filter_input misconceptions
Gary Mort
garyamort at gmail.com
Thu May 22 16:44:23 EDT 2014
There seem to be some misconceptions about the filter_* API. Recently I
was contacted by a colleague when his website went off kilter. All of
a sudden all the variables had extra HTML encoding characters in
them... and since they were encoded a second time when displayed,
they ended up with even more.
This was on a server I had worked on a few months previously, and this
was not happening then. So I took a look at the configuration and
discovered that filter.default had been changed. It turned out that PHP
on the server had recently been upgraded from the CentOS repositories
and the default settings changed.
http://us3.php.net/manual/en/filter.configuration.php#ini.filter.default
Looking into it a bit more, I experimented with different options for
the various settings which control the creation of the superglobal
variables, after which I decided that in the future it is better to use
$var = filter_input(INPUT_GET, 'myvar', FILTER_UNSAFE_RAW);
than $_GET [though of course, if you are in a framework which provides
an interface to the HTTP variables, it is better to use the framework to
be consistent].
A note on FILTER_UNSAFE_RAW: this is not a security decision, it is a
stability decision.
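
As a minimal sketch of that habit (rawGet() is a hypothetical helper name,
not part of PHP), a small wrapper keeps the filter choice in one place and
makes the "missing" and "filter failed" cases explicit:

```php
<?php
// Hypothetical helper: read one query-string parameter with no sanitizing,
// independent of whatever filter.default the server happens to use.
function rawGet(string $name, ?string $default = null): ?string
{
    $value = filter_input(INPUT_GET, $name, FILTER_UNSAFE_RAW);

    // filter_input() returns null when the parameter is absent and false
    // when the filter fails (for example, an array value such as ?myvar[]=1).
    return is_string($value) ? $value : $default;
}

$myvar = rawGet('myvar', '');
```

The value is still untrusted; it has only been made predictable, so any
escaping for HTML, SQL, and so on still has to happen at the point of use.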
First off, if you use $_GET then you are ALSO using the filter API. All
superglobal variables are populated by passing them through the default
filter. By design, the default filter is FILTER_UNSAFE_RAW; however,
there is no way to change this from within your PHP code. It can only be
set before the PHP script executes [either in the php.ini file or, with
Apache, by setting a custom ini value in .htaccess].
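
A quick way to see this for yourself is the sketch below. It assumes the
filter extension is loaded and relies on ini_set() returning false for a
setting that cannot be changed at run time:

```php
<?php
// Read the current default filter, try to change it from inside the
// script, then read it again.
$before = ini_get('filter.default');
$result = ini_set('filter.default', 'unsafe_raw');
$after  = ini_get('filter.default');

// The attempt is refused, so the value set by the server still applies.
var_dump($before, $result, $after);
```

With Apache and mod_php, the .htaccess equivalent would be a line such as
"php_value filter.default unsafe_raw", assuming the host allows overrides.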
Presuming my colleague gave me the correct information on where the
upgrade came from, it seems that the latest CentOS PHP packages instead
use FILTER_SANITIZE_FULL_SPECIAL_CHARS.
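
To illustrate the double encoding my colleague saw, here is a small sketch
that uses filter_var() to stand in for what the server-side default filter
would do to an incoming value (the sample string is made up):

```php
<?php
// What the visitor actually sent.
$sent = '<b>Tom & Jerry</b>';

// What ends up in $_GET when filter.default is full_special_chars.
$stored = filter_var($sent, FILTER_SANITIZE_FULL_SPECIAL_CHARS);

// What the template prints when it escapes on output, as it should.
$printed = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

echo $stored, "\n";   // encoded once
echo $printed, "\n";  // encoded twice: every "&" has become "&amp;"
```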
Since you [and by you I mean anyone publishing PHP code who can't
control the configuration of the server it will be executed on] can't
get away from it being used, the best you can do is force it to do
exactly what you want. I.e., if you want raw data, use filter_input with
FILTER_UNSAFE_RAW so you are sure to get what you expect, and not
something set by the server.
In addition, the global variables $_GET, $_SERVER, $_ENV, $_SESSION,
$_COOKIE, $_REQUEST, and $_POST simply can't be trusted. Only
filter_input will give you access to the true data for 4 out of those 7
variables. It does not give you access to $_SESSION, and while it does
give you access to INPUT_POST, there are a couple of edge cases where it
will not provide the post data.
The php.ini settings filter.default, track_vars, and variables_order can
all change what is stored in the superglobals.
http://us3.php.net/manual/en/filter.configuration.php#ini.filter.default
http://us3.php.net/manual/en/ini.core.php#ini.track-vars
http://us3.php.net/manual/en/ini.core.php#ini.variables-order
Without doing detailed checking of the various combinations from inside
the code, there is no way to tell which of those variables actually have
data and what filtering has already been applied to them. In addition,
$_SERVER may or may not include both $_SERVER and $_ENV
variables [running Google App Engine's dev server they are combined;
on the actual production server they are not].
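
The kind of checking I mean looks roughly like this. It is only a sketch:
describeSuperglobals() is a hypothetical helper, and it covers just
variables_order and filter.default, not every combination:

```php
<?php
// Hypothetical helper: report which superglobals the given
// variables_order string asks PHP to populate.
function describeSuperglobals(string $variablesOrder): array
{
    $letters = [
        'E' => '$_ENV',
        'G' => '$_GET',
        'P' => '$_POST',
        'C' => '$_COOKIE',
        'S' => '$_SERVER',
    ];

    $report = [];
    foreach ($letters as $letter => $name) {
        // A superglobal is only populated if its letter is listed.
        $report[$name] = stripos($variablesOrder, $letter) !== false;
    }
    return $report;
}

$populated     = describeSuperglobals((string) ini_get('variables_order'));
$defaultFilter = ini_get('filter.default');

var_dump($populated, $defaultFilter);
```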
With auto_globals_jit, the $_SERVER and $_ENV variables may or may not
even be available.
http://us3.php.net/manual/en/ini.core.php#ini.auto-globals-jit
Thanks to request_order, the order of variables in $_REQUEST can be
anything, so if I want a specific combination of possibilities, I
retrieve that combination rather than hope that request_order was not
changed from the default 'GPC'.
http://us3.php.net/manual/en/ini.core.php#ini.request-order
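
Retrieving a specific combination can be sketched like this, where
firstInput() is a hypothetical helper and the first source that actually
has the variable wins:

```php
<?php
// Hypothetical helper: look a name up in an explicit, ordered list of
// input sources instead of trusting how $_REQUEST was assembled.
function firstInput(string $name, array $sources)
{
    foreach ($sources as $type) {
        if (filter_has_var($type, $name)) {
            return filter_input($type, $name, FILTER_UNSAFE_RAW);
        }
    }
    return null; // not present in any of the requested sources
}

// POST takes precedence over GET; cookies are deliberately ignored.
$id = firstInput('id', [INPUT_POST, INPUT_GET]);
```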
$_POST is even more interesting. If you disable $_POST, for example by
setting variables_order to 'EGCS' to exclude posted data, then
filter_input will not return any data for posted variables.
auto_globals_jit provides an even odder edge case. With just-in-time
creation enabled, $_POST will not be created when PHP is running unless
you try to access something from the array. This also affects whatever
internal structure filter_input() uses, and filter_input() does not
trigger the creation of post variables. So if you call filter_input()
after calling something like isset($_POST['anynonexistentvariable']), it
works. If you call it before, it does not.
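
Based purely on the behaviour I observed above, a defensive wrapper could
look like the following sketch (postValue() is a hypothetical helper, and
the "touch" is only assumed to be enough on your build):

```php
<?php
// Hypothetical helper: reference $_POST first so that it exists before
// filter_input() is asked for posted data.
function postValue(string $name)
{
    isset($_POST[$name]); // result is ignored; the access is the point

    return filter_input(INPUT_POST, $name, FILTER_UNSAFE_RAW);
}

$comment = postValue('comment');
```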
To safely deal with post data you need to parse it from php://input ...
and php://input is itself inconsistent: depending on compile options it
may be possible to read it multiple times, or it may be deleted after
being read.
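
A rough sketch of that approach is below. rawRequestBody() and
parseFormBody() are hypothetical names. The body is read once and cached
so it does not matter whether the stream can be re-read, and the parsing
assumes an application/x-www-form-urlencoded body:

```php
<?php
// Hypothetical helper: read the request body a single time and keep it.
function rawRequestBody(): string
{
    static $body = null;
    if ($body === null) {
        $read = file_get_contents('php://input');
        $body = ($read === false) ? '' : $read;
    }
    return $body;
}

// Hypothetical helper: decode a urlencoded body into an array.
function parseFormBody(string $raw): array
{
    $fields = [];
    parse_str($raw, $fields);
    return $fields;
}

// There is no request body on the command line, so skip the read there.
$post = (PHP_SAPI === 'cli') ? [] : parseFormBody(rawRequestBody());
```

Note that parse_str() applies PHP's usual name mangling, and this does not
help with multipart/form-data uploads, which php://input does not expose.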