Thursday, 14 March 2013

Cache is King

One of the things that I have noticed again and again is that most developers will write fantastically complex solutions. Then, when the system comes under heavy load, they start to look at what tricks they can implement to relieve the pressure on their servers.

Mostly you will see caching of some form implemented in one or more places in the pages. So what can you cache? Normally you see Dynamic caching implemented. Dynamic caching is when you cache the results of your page for a fixed period, say 30 minutes, an hour, or even only 5 minutes, depending on the needs of each page. With Dynamic caching we still have the problem that once the file is too old we rebuild the complete page with the new content. The problem is that on most web sites 90%+ of the content could and should be cached as Static content.

The difference between static and dynamic caching is that static content is recached only when the underlying data changes. For example, the menu of a web site can be cached as Static information, even if you need to cache many versions of it.

So that covers Static and Dynamic, but what other types of caching are there? Data caching is when you cache some data, normally the results of a recordset, though it could be as simple as the result of a complex function.

So why would you choose between static and dynamic caching? Surely they are the same thing? Well, no. With Dynamic caching you are interested in the age of the cache, so you have to stat the file to get its age before comparing it to the current timestamp. This is more expensive than the Static cache, which doesn't care about the age of a file: it just loads it every single time, and some other process recaches the files when the data changes.
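
As a rough sketch of that difference, the two read paths might look like this in PHP. The function names and the one-file-per-cache-entry layout are my own assumptions for illustration, not code from any particular project.

```php
<?php
// Hypothetical helpers: names and file layout are illustrative assumptions.

// Dynamic cache: the file only counts while it is younger than $maxAge seconds.
function readDynamicCache(string $file, int $maxAge): ?string
{
    if (!is_file($file)) {
        return null; // never cached
    }
    // filemtime() is the stat call, the extra cost the static cache avoids.
    if (time() - filemtime($file) > $maxAge) {
        return null; // too old, the caller must rebuild the page
    }
    return file_get_contents($file);
}

// Static cache: no age check at all, just load whatever is there.
function readStaticCache(string $file): ?string
{
    $content = @file_get_contents($file);
    return $content === false ? null : $content;
}

// Used by whichever process rebuilds the content.
function writeCache(string $file, string $content): void
{
    // LOCK_EX stops two writers from interleaving their output.
    file_put_contents($file, $content, LOCK_EX);
}
```

The static reader never returns "too old"; keeping the file fresh is entirely the job of whatever code changes the data.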

By using a mix of these three types of caching you can make pages that load fast and efficiently. All content on a page can be considered a rectangle. You, the developer, decide if the content in that rectangle is something that changes often (Dynamic) or something that changes infrequently (Static). But what about Data caching?

Let's say that your web page displays a list of news articles. The IDs of the latest 10 articles could be cached using Data caching, so that when you recache the page via Dynamic caching you load the IDs, loop through that small data set and load the statically cached news articles.

Notice that in this example the number of database calls for this page has reduced to zero.
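
A minimal sketch of that rebuild step could look like the following. The file names (latest_ids.php, article_<id>.html) and the function name are assumptions I have made up for the example.

```php
<?php
// Hypothetical layout (assumed for this sketch):
//   $dir/latest_ids.php     returns an array of article IDs (data cache)
//   $dir/article_<id>.html  holds one pre-rendered article (static cache)
function buildNewsPage(string $dir): string
{
    // Data cache: a plain PHP file that returns the list of IDs.
    $ids = require $dir . '/latest_ids.php';

    $html = '';
    foreach ($ids as $id) {
        // Static cache: each article was rendered when it last changed.
        $html .= file_get_contents($dir . '/article_' . (int) $id . '.html');
    }

    // No database call was needed to assemble the page.
    return $html;
}
```

The returned string is what you would then store in the Dynamic cache for the page.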

Let me introduce you to the PHP Programmers project Edify:

https://github.com/phpprogrammers/Edify

In this project you will find a section for Caching, with a factory class that takes a driver for the type of caching that you want to do.

There are three types of caching drivers: Dynamic, Statics & Requires. As you can probably tell, Requires is the data caching. This is because we cache the data inside PHP tags, so the fastest way to load it is to require the file. This means that we do not have to convert from text to data through some complex means.
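
To show the general idea of caching data inside PHP tags, here is a small sketch. It is not Edify's actual API; the function names are hypothetical and it only assumes the data is an array of plain values.

```php
<?php
// Illustrative only: NOT Edify's API, just the general technique.

// Write the data out as a PHP file that returns the array.
function saveDataCache(string $file, array $data): void
{
    // var_export(..., true) returns the array as valid PHP source code.
    $php = '<?php return ' . var_export($data, true) . ';' . PHP_EOL;
    file_put_contents($file, $php, LOCK_EX);
}

// Loading is just a require: PHP parses the file and hands back the array.
function loadDataCache(string $file): array
{
    return require $file;
}
```

Because the cache file is ordinary PHP, there is no unserialize or JSON decoding step when it is loaded.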

Saturday, 9 March 2013

Why you should use Echo rather than Print and how

PHP developers have always had two different constructs for output, print and echo, which in my mind were the same thing. I never really thought about which one was better to use. I always preferred print, as I associated echo with a command line command. I don't know if I ever used echo in my last job (6 years), but in my new job we were drawing up our programming standards documentation, because we have inherited a code base that's over 10 years old. As part of that we were discussing the usage of echo for printing content. It turns out that echo can take a number of parameters. So what does that really mean?

First you need to understand a few points about concatenation. When you concat two strings together you end up with 3 areas of memory being used to store the information; with 3 strings you get 5 separate areas of memory allocated. Let's use an example.


$hello = "hello";
$world = "world";
print  $hello . " " . $world;


OK, so both strings are 5 bytes long, and when we concat them together with a space the final string passed to print takes up 11 bytes of allocated memory. But, and here's the kicker, a 1 byte string and a 6 byte string are also created and held in memory. So let's break this down and explain where they come from.

If the variable $hello put its string at memory positions 1-5 it would use 5 bytes in a row. The second variable $world would then be at memory positions 6-10, using the next 5 bytes of available memory (the malloc function). Now for the creation of the final string that is printed. Take the first part of the statement: 5 bytes + 1 byte = a new 6 byte block of data. Then we merge that with the 5 bytes of the variable $world, so now we have the following allocated:


  1. 5 bytes holding the string "hello"
  2. 5 bytes holding the string "world" (total 10 bytes allocated)
  3. 1 byte holding the space (total 11 bytes allocated)
  4. 6 bytes holding the string "hello " (total 17 bytes allocated)
  5. 11 bytes holding the string "hello world" (total 28 bytes allocated)


It's important to note that we have only allocated 28 bytes here, which is nothing with today's memory. But think about doing this with more than two variables, where your variables are hundreds if not thousands of bytes long. If you have 10,000 visitors hitting the page at the same time, the hello world example alone would use 280,000 bytes of memory. If each of the two variables held 100 bytes, the memory usage for 10,000 simultaneous visitors would be 100+100+1+101+201 = 503 bytes * 10,000 = 5,030,000 bytes. As you can see, you are suddenly talking about a lot of memory allocation going on.
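
The arithmetic above can be checked with a few lines of PHP. This only reproduces the simplified accounting used in this post (every string and every intermediate result counted once); it does not measure PHP's real allocator, and the function name is made up for the example.

```php
<?php
// Models the simplified accounting from the post, not real PHP internals.
// $lengths is the list of string lengths joined with "." from left to right.
function concatAllocation(array $lengths): int
{
    $lengths = array_values($lengths);
    $total = array_sum($lengths); // the original strings themselves
    $running = 0;
    foreach ($lengths as $i => $len) {
        $running += $len;
        if ($i > 0) {
            $total += $running;   // each "." produces a new, longer string
        }
    }
    return $total;
}

echo concatAllocation([5, 1, 5]), PHP_EOL;             // 28
echo concatAllocation([100, 1, 100]) * 10000, PHP_EOL; // 5030000
```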

The temporary strings are only released once the page finishes executing the print statement. Your memory usage then drops back to the 10 bytes allocated by the defined variables, as nothing is now using (pointing to) the information allocated at memory positions 11-28.

If you just change the print statement to an echo statement you get no benefit, as you still have to allocate the same memory to build the final string before it is output.

But REMEMBER that I said echo can take a list of parameters.

It turns out you can just replace the period (concat) with a comma, so the code


print $hello . " " . $world;
changes to
echo $hello, " ", $world;


With this the script only has to allocate a single extra byte of memory (the space), and echo just outputs each string it is given. This means that it only allocates 11 bytes. With our 10,000 visitors we are allocating 110,000 bytes, and with our 100 byte strings we are now only allocating 100+100+1 = 201 bytes * 10,000 = 2,010,000 bytes. As you can see, memory allocation has decreased significantly.
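
If you want to convince yourself that the two forms produce exactly the same output, a quick check with output buffering is enough. This sketch only compares the output; it does not measure memory.

```php
<?php
$hello = "hello";
$world = "world";

// Capture what the concatenated print statement sends to the browser.
ob_start();
print $hello . " " . $world;
$viaPrint = ob_get_clean();

// Capture what the comma-separated echo sends.
ob_start();
echo $hello, " ", $world;
$viaEcho = ob_get_clean();

var_dump($viaPrint === $viaEcho); // bool(true)
```

Note that the comma form is only available with echo. print accepts a single expression, so print $hello, " ", $world; is a parse error.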

Remember that PHP is not just a web scripting language but can be run as a command line program as well.

It's important for new PHPers to know that PHP is a scripting language that is used predominantly in web development, but that you can also write command line programs in PHP. You must understand how both of these processes work. A web server is typically a dumb program. You ask it for a web page and it knows how to serve the contents of a file. It doesn't know anything about the internals of the file; it just knows how to open and serve the two types of files that exist: text files and binary files. So how does a PHP file get processed? Actually this is quite simple. The web server is still just as dumb about serving those two types of files, but now it checks the extension of the requested file and passes it to an external program, which returns a stream of data that could be binary or text based.

OK, so now our web server knows how to serve a text file, a binary file, and a data stream from an external program. (Note: if you created a hello world program in any language, you could configure a web server to call that program whenever a file with an extension of .hello was requested.)

There are different restrictions when you are working with PHP in these two environments. The first environment is web development: most PHP development is done using a web server to serve the page. The second is when you want to run a script on the command line.

WEBSERVER:::
  1. Your page execution will usually be limited to a maximum of 30 seconds. Typically you will have scripts that execute in 1-3 seconds, with 5-10 seconds considered a long execution time. The 30 second barrier is there to stop infinite loops killing your web server. In PHP it comes from the max_execution_time setting in php.ini, which defaults to 30 seconds, and web servers usually enforce their own request timeouts as well. To lengthen the time you need to change that configuration. After the time limit the script is terminated.
  2. A web server might sandbox the PHP process and limit what it can do.
  3. Session information: under a web server PHP can track a visitor across requests (normally with a session cookie), allowing you to have persistent information across page requests.
  4. The web server passes PHP extra settings, which appear in the $_SERVER variable, relating to the request and to the machine that is requesting the page.
COMMAND LINE:::
  1. There is no timeout on your scripts, as they are now considered to have free rein on the machine, based on the permissions of the user that is executing the script.
  2. The session environment does not exist, as there is only a single instance.
  3. You can write a PHP script to parse a huge data file if you want, but Perl would probably do that job faster for you.
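
A script can find out which of the two environments it is running in. The sketch below uses the built-in PHP_SAPI constant (which is 'cli' on the command line) and reads the max_execution_time ini setting, where 0 means no limit. The function name is my own.

```php
<?php
// Reports where the script is running and what time limit applies.
function describeEnvironment(): string
{
    // max_execution_time is 0 (no limit) by default on the command line.
    $limit = (int) ini_get('max_execution_time');

    $where = PHP_SAPI === 'cli'
        ? 'command line'
        : 'web server (' . PHP_SAPI . ')';

    return sprintf(
        '%s, time limit: %s',
        $where,
        $limit === 0 ? 'none' : $limit . 's'
    );
}

echo describeEnvironment(), PHP_EOL;
```
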
I've often needed to batch process a lot of files, and have used PHP to loop through a directory and create a bat file that runs each set of commands on each file as needed. I'm sure there were better ways of doing it, but none quicker, as I knew PHP and knew I could use the directory functions to build the bat file really quickly.
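
Something along these lines is all it takes. This is a reconstruction of the idea rather than the script I actually used; the function name and the command template are placeholders.

```php
<?php
// $commandTemplate is a sprintf pattern with one %s for the quoted file path,
// for example 'process.exe %s' (process.exe is a made-up command).
function buildBatchFile(string $dir, string $commandTemplate): string
{
    $lines = ['@echo off'];

    // scandir() lists the directory in alphabetical order.
    foreach (scandir($dir) as $entry) {
        $path = $dir . DIRECTORY_SEPARATOR . $entry;
        if (is_file($path)) { // skips ".", ".." and sub-directories
            $lines[] = sprintf($commandTemplate, '"' . $path . '"');
        }
    }

    // Batch files use Windows line endings.
    return implode("\r\n", $lines) . "\r\n";
}

// Usage (paths are examples only):
// file_put_contents('run.bat', buildBatchFile('C:\\data', 'process.exe %s'));
```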

Thursday, 7 March 2013

Authenticating users using an intranet on a public website.


In the past I have had to authenticate members of staff on a website that sits on an externally facing server.

The first thing to know is that it's possible for a server to have more than one network card. For the example that we are going to discuss, NIC#1 is used by visitors to access the server and NIC#2 is the card that people inside the company's network travel through when visiting the site. Since we know that staff will be on an internal network in the IP range 10.*.*.*, we can use the PHP server variables to detect the visitor's IP, work out that they are a member of staff and let them in. But wait, that's probably rather insecure, as a hacker could probably fake the IP address.

So how do we allow staff to log in with single sign-on (SSO), i.e. the account they used to log on to their computer? It turns out it's quite simple.

You need a web server inside the corporate network with a page that requires NTLM user authentication. This tells you that the user has been authenticated via a trusted system.

So you now have two servers:

Server A : in an external location (DMZ)
Server B : a corporate server located in local LAN (Intranet)

All you need to do is redirect a user from Server A to Server B and have Server B then redirect back to Server A. Server B can then send the user's authentication details (username) to the website on Server A.

Now, for added security, you should have a password on Server A that is used to encrypt the information sent to Server B. Server B decrypts it in the authentication page and then encrypts the information it sends back to Server A.

This means that if a hacker tried to trick the website on Server A into believing that they had authenticated correctly, they would have to find the encryption key on A and the encryption key on B to work out what information is being transmitted before they could pass as an authenticated user.
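
As a sketch of the encrypt and decrypt step, the helpers below use PHP's openssl extension with AES-256-GCM. The choice of cipher, the function names and the hex token format are all my own assumptions; the original setup only needs some keyed encryption that both servers agree on.

```php
<?php
// Sketch only. $key is a 32 byte secret known to both servers (assumption).
// Requires the openssl extension.

// Encrypts $plain and returns a hex token that is safe to put in a URL.
function sealMessage(string $plain, string $key): string
{
    $iv = random_bytes(12); // a fresh nonce for every message
    $tag = '';              // filled in by openssl_encrypt (16 bytes)
    $cipher = openssl_encrypt(
        $plain, 'aes-256-gcm', $key, OPENSSL_RAW_DATA, $iv, $tag
    );
    return bin2hex($iv . $tag . $cipher);
}

// Returns the original text, or null if the token is malformed,
// was tampered with, or was sealed with a different key.
function openMessage(string $token, string $key): ?string
{
    if (!ctype_xdigit($token) || strlen($token) % 2 !== 0) {
        return null;
    }
    $raw = hex2bin($token);
    if (strlen($raw) <= 28) { // 12 byte IV + 16 byte tag + at least 1 byte
        return null;
    }
    $plain = openssl_decrypt(
        substr($raw, 28), 'aes-256-gcm', $key, OPENSSL_RAW_DATA,
        substr($raw, 0, 12), substr($raw, 12, 16)
    );
    return $plain === false ? null : $plain;
}
```

Server B would call sealMessage() with the authenticated username before redirecting back, and Server A would call openMessage() on what it receives and refuse the login if the result is null.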