Scatter plots – An Application of the GD Library

Published: December 16th, 2008 by:

Easily the most useful application of the GD library is the ability to dynamically output clean, sharp graphs and charts.  A scatter plot, the plotting of points based on an independent variable and the dependent variable (usually X and Y), is a great application that uses several common GD functions and is useful for data analysis as well.  In this post, I'll highlight those functions while explaining the math behind using PHP to create a great scatter plot.


Because this series has been written to focus mainly on the capabilities of the GD library, I wish to stray from focusing too heavily on the mathematical aspects of creating a scatter plot, as interesting as they are in this case.  Perhaps I will comment on it in the future, but for now, let us focus on the main PHP functions available to work with graphing.

Whether you use a form to input your x and y-values or obtain them from the URL parameters, the data should be handled like two lists, one containing all of the x-values and the other with the corresponding y-values in the order in which they pair up with the x-values.  Believe it or not, there are only two new functions since the introductory GD library post that need to be explained, and they will be detailed now.
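As a rough sketch of the "two lists" idea, the snippet below reads comma-separated values from the URL. The helper name parseList() and the parameter names x and y are assumptions made for illustration, not part of GD or of the original script.

```php
<?php
// Hypothetical helper: turn a comma-separated string such as "1,2.5,4"
// into a list of numbers, skipping anything that is not numeric.
function parseList($raw)
{
    $values = array();
    foreach (explode(',', $raw) as $piece) {
        $piece = trim($piece);
        if (is_numeric($piece)) {
            $values[] = $piece + 0; // numeric string becomes int or float
        }
    }
    return $values;
}

// Assumed URL shape: plot.php?x=1,2,3&y=2,4,5 (parameter names are illustrative)
$x = parseList(isset($_GET['x']) ? $_GET['x'] : '');
$y = parseList(isset($_GET['y']) ? $_GET['y'] : '');

// The two lists only pair up if they have the same length.
$listsPairUp = (count($x) === count($y));
```

Because skipped entries would shift the pairing, it is worth checking $listsPairUp before plotting anything.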

The first is the function to draw a line, appropriately called in the following manner:


<?php

imageline($image,$x1,$y1,$x2,$y2,$color);

?>

This function draws a straight line from the point ($x1,$y1) to the point ($x2,$y2); in our scatter plot it will draw the two axes.  The image must be created beforehand using any of the available image-creation functions, and the color must be allocated with “imagecolorallocate.”  The coordinates can be literal integers, variables, or the results of calculations, which makes dynamic line placement possible.
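Here is a minimal sketch of imageline() on its own, assuming the GD extension is enabled. The function name drawDiagonal(), the canvas size, and the colors are arbitrary choices for illustration.

```php
<?php
// Minimal sketch: draw one diagonal line on a small square canvas.
function drawDiagonal($size)
{
    $image = imagecreatetruecolor($size, $size);
    $white = imagecolorallocate($image, 255, 255, 255);
    $black = imagecolorallocate($image, 0, 0, 0);
    imagefill($image, 0, 0, $white);
    // The end point is computed from $size rather than hard-coded.
    imageline($image, 0, 0, $size - 1, $size - 1, $black);
    return $image;
}

$image = drawDiagonal(100);
```

To view the result, send the image to the browser with header() and imagepng() as in the full listing later in this post.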

Let’s apply this function to our main example, the scatter plot.  The main use of lines in a scatter plot is in the x and y-axes.  The two axes should be placed relative to the minimum x and y values obtained from the two lists.  To create the two axes, let’s do the following:


<?php

...

// Starting each bound at 0 keeps the origin, and so both axes, inside the graph
$Xmin = 0;
$Xmax = 0;
$Ymin = 0;
$Ymax = 0;
foreach ($x as $val) {
    if ($val<$Xmin) { $Xmin = $val; }
    if ($val>$Xmax) { $Xmax = $val; }
}
foreach ($y as $val) {
    if ($val<$Ymin) { $Ymin = $val; }
    if ($val>$Ymax) { $Ymax = $val; }
}
// Pad the range by one unit on every side
$Xmin -= 1;
$Xmax += 1;
$Ymin -= 1;
$Ymax += 1;
// Pixels per data unit on the 500x500 image
$Yscl = (500/($Ymax-$Ymin));
$Xscl = (500/($Xmax-$Xmin));
// Horizontal axis (y = 0), then vertical axis (x = 0)
imageline($graph,0,round(499-($Yscl*($Ymin*-1))),500,round(499-($Yscl*($Ymin*-1))),$axes);
imageline($graph,round($Xscl*($Xmin*-1)),0,round($Xscl*($Xmin*-1)),500,$axes);

...

?>

One thing to realize when creating graphs with the GD library is that images do not work like Cartesian coordinate planes.  In an image, the top-left corner is (0,0) and the bottom-right corner is (width-1,height-1), so y-values increase downward.  Every data point therefore has to be shifted, scaled, and (for y-values) flipped to get the proper pixel coordinates.
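The conversion can be sketched as a pair of small helpers. The names toPixelX() and toPixelY() are hypothetical (they are not GD functions), and the sample range is made up; the arithmetic mirrors the shift, scale, and flip used in this post's code.

```php
<?php
// Hypothetical helpers: map a data value to a pixel position on a canvas
// whose origin is the top-left corner.
function toPixelX($value, $min, $scale)
{
    // Shift so the minimum lands at pixel 0, then scale to pixels.
    return (int) round($scale * ($value - $min));
}

function toPixelY($value, $min, $scale, $height)
{
    // Same shift and scale, then subtract from the height so that
    // larger y-values end up nearer the top of the image.
    return (int) round($height - $scale * ($value - $min));
}

// Sample range (illustrative only): y runs from -1 to 9 on a 500-pixel canvas.
$Ymin = -1;
$Ymax = 9;
$Yscl = 500 / ($Ymax - $Ymin); // 50 pixels per unit

$bottom = toPixelY(-1, $Ymin, $Yscl, 500); // smallest value, bottom edge
$top    = toPixelY(9, $Ymin, $Yscl, 500);  // largest value, top edge
```

Wrapping the arithmetic this way keeps the flip in one place instead of repeating it in every drawing call.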

Now for the meat of the graph: the points.  The second new function is used to draw the points themselves, used in the following manner:


<?php

imagefilledellipse($graph,$cx,$cy,$width,$height,$color);

?>

As the name suggests, the function draws a filled ellipse centered at ($cx,$cy); if the width and height are equal, the result is a circle.  In a scatter plot, the center coordinates represent the x and y-values, whereas the width and height can be based either on the thickness desired or, to extend the functionality of the graph, on the frequency of that data.  Make it only as complicated as you need; points of the same size work just fine.
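If you do want point size to reflect frequency, one possible approach is to count how often each (x, y) pair occurs. The helper pointSizes(), its base size of 7, and its growth step of 4 are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: give each distinct (x, y) pair a diameter that
// grows with the number of times the pair appears in the data.
function pointSizes(array $x, array $y, $base = 7, $step = 4)
{
    $counts = array();
    for ($c = 0, $z = count($x); $c < $z; $c++) {
        $key = $x[$c] . ',' . $y[$c];
        $counts[$key] = isset($counts[$key]) ? $counts[$key] + 1 : 1;
    }
    $sizes = array();
    foreach ($counts as $key => $count) {
        // One occurrence gets the base size; each repeat adds $step pixels.
        $sizes[$key] = $base + $step * ($count - 1);
    }
    return $sizes;
}

$sizes = pointSizes(array(1, 2, 2), array(3, 5, 5));
```

The resulting diameter would then be passed as both the width and the height to imagefilledellipse() for that point.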

To add the points to the graph, we add the following code, utilizing the aforementioned function:


<?php

...

for ($c=0,$z=count($x); $c<$z; $c++) {
    $xPt = round($Xscl*($x[$c]+($Xmin*-1)));
    $yPt = round((500-($Yscl*($y[$c]+($Ymin*-1)))));
    imagefilledellipse($graph,$xPt,$yPt,7,7,$point);
}

...

?>

The for() loop is used here so we can run through the x and y arrays at the same time to get the corresponding coordinates, and then the imagefilledellipse() function draws the points on.  The width and height do not have to be 7; choose an appropriate size based on your need.

Lastly, we use the imageline() function again to draw the line of best fit for the data.  This part is optional, but for statisticians it is typically the most important.  The code below assumes the slope $a and the y-intercept $b of the line have already been calculated; it computes the y-values at the left and right edges of the graph and then draws the line:


<?php

...

$Yone = round((500-($Yscl*((($Xmin*$a)+$b)+($Ymin*-1)))));
$Ytwo = round((500-($Yscl*((($Xmax*$a)+$b)+($Ymin*-1)))));
imageline($graph,0,$Yone,500,$Ytwo,$bestFit);

...

?>
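The slope $a and intercept $b come from the standard least-squares formulas. A minimal sketch follows, with the function name bestFit() and the sample values being assumptions made for illustration; it returns null when no unique line exists.

```php
<?php
// Sketch: least-squares slope ($a) and intercept ($b) for y = a*x + b.
function bestFit(array $x, array $y)
{
    $n = count($x);
    $sumX = array_sum($x);
    $sumY = array_sum($y);
    $sumXY = 0;
    $sumXSq = 0;
    for ($c = 0; $c < $n; $c++) {
        $sumXY += $x[$c] * $y[$c];
        $sumXSq += $x[$c] * $x[$c];
    }
    $denominator = ($n * $sumXSq) - pow($sumX, 2);
    if ($denominator == 0) {
        return null; // no data, or all x-values equal: no unique line
    }
    $a = (($n * $sumXY) - ($sumX * $sumY)) / $denominator;
    $b = (($sumY * $sumXSq) - ($sumX * $sumXY)) / $denominator;
    return array($a, $b);
}

// Sample points lying exactly on y = 2x + 1 (illustrative only)
list($a, $b) = bestFit(array(1, 2, 3), array(3, 5, 7));
```

Checking the denominator first avoids a division by zero, which would otherwise corrupt the image output.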

To create a great looking scatter plot, put the code all together to make something like this:


<?php

// Sample values only: replace with your own arrays of x and y values
$x = array(1, 2, 3, 4, 5);
$y = array(2, 4, 5, 4, 6);

// Sums needed for the least-squares line y = a*x + b
// (assumes at least two distinct x values)
$n = count($x);
$sumX = array_sum($x);
$sumY = array_sum($y);
$sumXY = 0;
$sumXSq = 0;
for ($c=0; $c<$n; $c++) {
    $sumXY += $x[$c]*$y[$c];
    $sumXSq += pow($x[$c],2);
}
$a = round(((($n*$sumXY) - ($sumX*$sumY))/(($n*$sumXSq)-pow($sumX,2))),3);
$b = round(((($sumY*$sumXSq)-($sumX*$sumXY))/(($n*$sumXSq)-pow($sumX,2))),3);

// Canvas and colors
$graph = imagecreatetruecolor(500,500);
$gray = imagecolorallocate($graph,250,250,250);
$axes = imagecolorallocate($graph,200,200,200);
$bestFit = imagecolorallocate($graph,0,0,255);
$point = imagecolorallocate($graph,255,0,0);
imagefill($graph,0,0,$gray);

// Data range, padded by one unit on every side
$Xmin = 0;
$Xmax = 0;
$Ymin = 0;
$Ymax = 0;
foreach ($x as $val) {
    if ($val<$Xmin) { $Xmin = $val; }
    if ($val>$Xmax) { $Xmax = $val; }
}
foreach ($y as $val) {
    if ($val<$Ymin) { $Ymin = $val; }
    if ($val>$Ymax) { $Ymax = $val; }
}
$Xmin -= 1;
$Xmax += 1;
$Ymin -= 1;
$Ymax += 1;
$Yscl = (500/($Ymax-$Ymin));
$Xscl = (500/($Xmax-$Xmin));

// Axes
imageline($graph,0,round(499-($Yscl*($Ymin*-1))),500,round(499-($Yscl*($Ymin*-1))),$axes);
imageline($graph,round($Xscl*($Xmin*-1)),0,round($Xscl*($Xmin*-1)),500,$axes);

// Points
for ($c=0,$z=count($x); $c<$z; $c++) {
    $xPt = round($Xscl*($x[$c]+($Xmin*-1)));
    $yPt = round((500-($Yscl*($y[$c]+($Ymin*-1)))));
    imagefilledellipse($graph,$xPt,$yPt,7,7,$point);
}

// Line of best fit
$Yone = round((500-($Yscl*((($Xmin*$a)+$b)+($Ymin*-1)))));
$Ytwo = round((500-($Yscl*((($Xmax*$a)+$b)+($Ymin*-1)))));
imageline($graph,0,$Yone,500,$Ytwo,$bestFit);

// Output the image
header("Content-type: image/png");
imagepng($graph);
imagedestroy($graph);
?>

As you can see, I added a few lines to allow spacing around the plot and changed a few colors, but the main idea is the same.  To check out a working example of the scatter plot, head over to the fully-functioning linear regression model and input two lists.  Focus on the picture at the bottom of the middle column; that’s the great looking scatter plot we just created!

The two functions discussed in this article have a lot of potential extending to other applications as well, including pie graphs and even graphing equations of lines.  The GD library, through mathematical analysis and application, can become quite a tool for mathematicians/statisticians for dynamically outputting sharp-looking representations.


One Response to “Scatter plots – An Application of the GD Library”

  • Chris

    Kurtis,
    I noticed a few minor errors in the above code, on lines 20 and 23 (missing > and < signs) and one missing ;.... though I still can't get a test page to work.  I'm a mid level php programmer, and still get a broken image ... could you repost the updated working source code so that I can retry? tx Chris

     

About the Author

Kurtis has been working with PHP for nearly four years, and he has moderate experience with MySQL as well as other programming languages, like Java and C++.