Don't Validate And Format In A Single Function

30th June 2018

I wanted to impart a piece of advice to do with validation and formatting of user input, although I've never seen anyone suggest it. I guess it would come under the single responsibility principle so it might seem obvious to some people. There can be reasons why this might at least seem like a good idea at the time.

Essentially, if you want to validate that something is correct, don't format it at the same time. These two actions should be done in separate functions or even classes. I hope to demonstrate that using a single function validate and format anything is a bad idea. I'll mainly be using PHP to demonstrate this, but the principle should be pretty much the same in any language.

Take the following function called isValid(). This is an arbitrary and simple example but shows validation and formatting in use in a single function.

  1. function isValid(&$string) {
  2. $string = trim($string);
  3. if (strlen($string) < 8) {
  4. return false;
  5. }
  6. $string = strtoupper($string);
  7. return true;
  8. }

The $string parameter passed to this function is passed by reference and as such anything we do to the string throughout the function changes the original string passed to it. If the string is less than 8 characters long then the function return false. Otherwise the string is uppercased and the function returns true. In either situation the string is trimmed so there is a good chance that the string will always be different when finished.

Using the above function we might implement something like this.

  1. $string = 'abcdefghijk ';
  2. if (isValid($string) == true) {
  3. echo 'string is valid';
  4. }
  5. else {
  6. echo 'string is invalid';
  7. }

After this function has completed our string is now trimmed and is in uppercase, but there is no indication that this has actually happened. This can be confusing.

I can understand why this sort of thing appears to be the right way to do this. At face value it seems to make sense that if we have validated our string then it should be formatted as well. In fact, validation code will naturally contain much of the same fine detail that formatting code does, so it can feel the right thing to do.

Unfortunately, with a single function that validates and formats the string we can get ourselves into situations where the validation function is used to format the string and not actually perform any validation. This is where confusing code like this starts to happen.

  1. $string = getEnteredString();
  2.  
  3. isValid($string);
  4.  
  5. saveStringToDatabase($string);

In this example a function called getEnteredString() is used to get some pre-validated user entered string before saving it to the database. Before that happens there is a call to isValid(), which is only used to format the string. This looks incorrect, and you would need to look closely at isValid() to figure out why it was being used in this context. People not familiar with the codebase will see this as a mistake and start to fix a non-existent bug. I appreciate that the function probably won't be called isValid(), but it is still confusing as to what this function should be doing in this context.

The best way to approach this is to remove all formatting code from the isValid() function and only use it for validation.

  1. function isValid($string) {
  2. if (strlen(trim($string)) < 8) {
  3. return false;
  4. }
  5. return true;
  6. }

Then, in a separate function we can format the string.

  1. function formatString($string) {
  2. return strtoupper(trim($string));
  3. }

The two functions can then be used like this.

  1. if (isValid($string) == true) {
  2. echo 'string is valid';
  3. $string = formatString($string);
  4. }
  5. else {
  6. echo 'string is invalid';
  7. }

In this example, if the validation passes then the string can be formatted separately.

Although it might look like we have written more code the separation of concerns here makes both testing and maintenance much easier. Being able to accurately predict the exact output of the function based on known inputs means testing can be simpler to perform.

If everything is placed into the same function then tweaking the validation becomes a lot more complex as the formatting part also needs to be taken into account and verified. I have come across this situation a few times and attempting to change a validation and formatting function becomes quite sticky. You can find yourself in the situation where you have a large single function trying to do everything and the complexity can be very difficult to understand.

Add new comment

The content of this field is kept private and will not be shown publicly.