[nycphp-talk] filter_input misconceptions

Gary Mort garyamort at gmail.com
Thu May 22 16:44:23 EDT 2014


It seems there are some misconceptions about the filter_* API. Recently I 
was contacted by a colleague when his website went off kilter. All of a 
sudden all the variables had extra HTML encoding characters in them, and 
since they were encoded a second time when displayed, they ended up with 
even more.

This was on a server I had worked on a few months previously, and it was 
not happening then. So I took a look at the configuration and discovered 
that filter.default had been changed. It turned out that PHP on the 
server had recently been upgraded from the CentOS repositories and the 
default settings changed.
http://us3.php.net/manual/en/filter.configuration.php#ini.filter.default
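
A minimal sketch of the double-encoding symptom described above. It 
assumes the server-side default filter can be simulated with 
filter_var(), and the sample string and variable names are made up for 
illustration:

```php
<?php
// Hypothetical submitted value.
$submitted = 'Tom & Jerry';

// Roughly what the superglobal would hold if filter.default were
// full_special_chars (simulated here with filter_var; an assumption).
$fromSuperglobal = filter_var($submitted, FILTER_SANITIZE_FULL_SPECIAL_CHARS);

// The application then escapes on output, exactly as it did before
// the server configuration changed.
$rendered = htmlspecialchars($fromSuperglobal, ENT_QUOTES, 'UTF-8');

echo $fromSuperglobal, "\n"; // Tom &amp; Jerry
echo $rendered, "\n";        // Tom &amp;amp; Jerry
```

The application code never changed; only the data it received did.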

Looking into it a bit more, I experimented with different options for 
the various settings which control the creation of the superglobal 
variables. After that I decided that in the future it is better to use
  $var = filter_input(INPUT_GET, 'myvar', FILTER_UNSAFE_RAW);
than $_GET [though of course, if you are in a framework which provides 
an interface to the HTTP variables, it is better to use the framework to 
be consistent].
Note that choosing FILTER_UNSAFE_RAW is not a security decision, it is a 
stability decision.

First off, if you use $_GET then you are ALSO using the filter API. All 
of the input superglobals are populated by passing the data through the 
default filter. By design, the default filter is FILTER_UNSAFE_RAW; 
however, there is no way to change it from within your PHP code. It 
can only be set before the PHP script executes [either in the php.ini 
file or, with Apache, as a custom ini value in .htaccess].

Presuming my colleague gave me the correct information on where the 
upgrade came from, it seems that the latest CentOS PHP packages instead 
use FILTER_SANITIZE_FULL_SPECIAL_CHARS as the default.

Since you [and by you I mean anyone publishing PHP code where they can't 
control the server configuration it will be executed on] can't get away 
from it being used, the best you can do is force it to do exactly what 
you want. That is, if you want raw data, use filter_input with 
FILTER_UNSAFE_RAW so you are sure to get what you expect to get, and 
not something set by the server.
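
As a rough sketch of that approach, a small wrapper could look like the 
following. The name raw_get() and the 'page' parameter are hypothetical, 
not part of PHP or of any framework:

```php
<?php
/**
 * Hypothetical helper: fetch a query-string value exactly as submitted,
 * regardless of the server's filter.default setting.
 */
function raw_get(string $name, ?string $default = null): ?string
{
    $value = filter_input(INPUT_GET, $name, FILTER_UNSAFE_RAW);

    // null: the variable was not sent; false: the filter failed
    // (for example, an array was sent where a scalar was expected).
    if ($value === null || $value === false) {
        return $default;
    }
    return $value;
}

// Usage: read ?page=... with a fallback when it is absent.
$page = raw_get('page', '1');
```

The value is still untrusted input; it just has not been altered by 
whatever default the server happens to use.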

In addition, the superglobals $_GET, $_SERVER, $_ENV, $_SESSION, 
$_COOKIE, $_REQUEST, and $_POST simply can't be trusted. Only 
filter_input will give you access to the true data for 4 out of those 7 
variables. It does not give you access to $_SESSION, and while it does 
give you access to INPUT_POST, there are a couple of edge cases where it 
will not provide the post data.

The php.ini settings filter.default, track_vars, and variables_order can 
all change what is stored in the superglobals.
http://us3.php.net/manual/en/filter.configuration.php#ini.filter.default
http://us3.php.net/manual/en/ini.core.php#ini.track-vars
http://us3.php.net/manual/en/ini.core.php#ini.variables-order
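
One way to at least see what the server is doing is to read these 
settings at runtime. This is only a sketch: the function names are 
hypothetical, and the list also includes request_order and 
auto_globals_jit, which affect the superglobals as well.

```php
<?php
/**
 * Hypothetical helper: collect the ini settings that influence how
 * the superglobals are populated, e.g. for logging at startup.
 */
function input_settings(): array
{
    $names = ['filter.default', 'variables_order', 'request_order', 'auto_globals_jit'];
    $report = [];
    foreach ($names as $name) {
        // ini_get() returns false when the directive does not exist.
        $report[$name] = ini_get($name);
    }
    return $report;
}

/** Hypothetical check: is the default filter leaving input untouched? */
function default_filter_is_raw(): bool
{
    return ini_get('filter.default') === 'unsafe_raw';
}
```

Reading the settings does not let you change them, but it makes a 
surprise like the one above much quicker to diagnose.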

Without doing detailed checking of the various combinations from inside 
the code, there is no way to tell which of those variables actually have 
data and what filtering has already been done to them. In addition, 
$_SERVER may or may not include both $_SERVER and $_ENV variables 
[running Google App Engine's dev server they will be combined; on the 
actual production server they are not].

With auto_globals_jit the $_SERVER and $_ENV variables may or may not 
even be available.
http://us3.php.net/manual/en/ini.core.php#ini.auto-globals-jit

Thanks to request_order, the order of variables in $_REQUEST can be 
anything - so if I want a specific combination of possibilities, I 
retrieve that combination rather than hope that request_order was not 
changed from the default 'GPC'.
http://us3.php.net/manual/en/ini.core.php#ini.request-order
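
A sketch of retrieving a specific combination explicitly instead of 
relying on $_REQUEST. The helper name raw_request() and its default 
precedence (POST before GET) are assumptions chosen for illustration:

```php
<?php
/**
 * Hypothetical helper: look a name up in an explicit list of input
 * sources; the first source that has it wins.
 */
function raw_request(string $name, array $sources = [INPUT_POST, INPUT_GET]): ?string
{
    foreach ($sources as $source) {
        if (!filter_has_var($source, $name)) {
            continue;
        }
        $value = filter_input($source, $name, FILTER_UNSAFE_RAW);
        if (is_string($value)) {
            return $value;
        }
    }
    return null; // not present in any of the requested sources
}

// Usage: cookie first, then query string, never POST.
$theme = raw_request('theme', [INPUT_COOKIE, INPUT_GET]);
```

The precedence is now stated in the code that depends on it, rather than 
in a server setting.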

$_POST is even more interesting. If you disable $_POST by setting 
variables_order to 'EGCS' to exclude posted data, then filter_input will 
not return any data for posted variables. auto_globals_jit provides an 
even odder edge case. With just-in-time creation enabled, $_POST will 
not be created when PHP is running unless you try to access something 
from the array. This also affects whatever internal structure 
filter_input() uses, and filter_input() does not trigger the creation of 
the post variables. So if you call filter_input() after calling 
something like isset($_POST['anynonexistentvariable']), it works. If you 
call it before, it does not.

To safely deal with post you need to parse the post data from 
php://input, and even php://input is inconsistent in that, depending on 
compile options, it may be possible to read it multiple times, or it may 
be deleted after being read.
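
A minimal sketch of that approach, assuming the request body is 
application/x-www-form-urlencoded (a multipart/form-data body would need 
different handling). Both function names are hypothetical. The body is 
read once and cached, so the read-once behaviour does not matter:

```php
<?php
/** Hypothetical helper: decode a urlencoded body into an array. */
function parse_urlencoded_body(string $body): array
{
    $fields = [];
    parse_str($body, $fields);
    return $fields;
}

/**
 * Hypothetical helper: read php://input a single time and keep the
 * result, since a second read is not guaranteed to return the data.
 */
function raw_request_body(): string
{
    static $cached = null;
    if ($cached === null) {
        $read = file_get_contents('php://input');
        $cached = ($read === false) ? '' : $read;
    }
    return $cached;
}

// Usage inside a web request:
// $post = parse_urlencoded_body(raw_request_body());
```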





