Although PDF files are the current standard for the dissemination of scientific knowledge, the format comes with several well-known drawbacks. An important limitation is the difficulty of reusing the data embedded in graphs and plots. Even with the advent of “enhanced” HTML versions of articles, data are still most often presented as images, which makes it difficult to extract the raw numbers. A few publisher initiatives now ask researchers to submit their data along with their manuscript. But for the millions of papers already published, a number of software solutions can help you digitize the data from plots and graphs.
Digitize your graphs and plots
All the tools presented below follow a similar process to convert bar graphs, scatter plots, and line plots into a series of numbers.
1. Open a graph
Depending on the software, the graph can be imported directly from a PDF file, or will first have to be converted to an image format (JPG, BMP, PNG, GIF…). The image can be obtained from the HTML version of the paper, or by taking a screenshot of the PDF file (on Mac, use Command-Shift-4; on Windows, use the Print Screen key or the Snipping Tool; on Linux, use the Take Screenshot application). When saving your screenshot, check which file formats your software accepts.
2. Set the scale
The software will ask you to define the axes and set the scale. This is how it determines the coordinates of each point, so the more precise you are at this step, the better your results will be. Most programs can handle distorted axes (not perfectly perpendicular). Remember to indicate whether the graph uses a log scale. (The image to the left is taken from WebPlotDigitizer.)
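To make the idea of setting the scale concrete, here is a minimal sketch of how two reference ticks per axis can be turned into a pixel-to-value conversion, including the log-scale case. It assumes the axes are aligned with the image (no distortion), and the function name `make_axis_mapper` and all pixel positions are hypothetical. It is not the method used by any particular tool listed here.

```python
import math

def make_axis_mapper(pixel_a, value_a, pixel_b, value_b, log_scale=False):
    """Return a function converting a pixel coordinate to a data value.

    pixel_a and pixel_b are the pixel positions of two reference ticks on
    one axis; value_a and value_b are the data values printed at those ticks.
    """
    if pixel_a == pixel_b:
        raise ValueError("reference ticks must be at different pixel positions")
    if log_scale:
        if value_a <= 0 or value_b <= 0:
            raise ValueError("log-scale reference values must be positive")
        # On a log axis, pixels are linear in log10(value), not in value.
        value_a, value_b = math.log10(value_a), math.log10(value_b)
    slope = (value_b - value_a) / (pixel_b - pixel_a)

    def to_value(pixel):
        value = value_a + slope * (pixel - pixel_a)
        return 10 ** value if log_scale else value

    return to_value

# Hypothetical calibration: the x ticks 0 and 10 sit at pixel columns 50 and
# 450; the y ticks 1 and 1000 (log scale) sit at pixel rows 400 and 100.
x_of = make_axis_mapper(50, 0.0, 450, 10.0)
y_of = make_axis_mapper(400, 1.0, 100, 1000.0, log_scale=True)

# Convert one clicked pixel (column 250, row 300) to data coordinates.
print(x_of(250), y_of(300))
```

Image rows usually increase downward while y values increase upward. The sketch handles this without special treatment, because the slope simply comes out negative. Small errors in the reference pixel positions shift every converted point, which is why careful calibration matters.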
3. Digitize the data points
You then need to digitize the points or lines. Depending on the software, this step is more or less automated. Most often, you are asked to indicate, at least approximately, where the points or lines are located. Some fully manual tools will ask you to trace over the points or lines in order to digitize the data.
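As an illustration of what automatic detection can involve, the sketch below finds markers of a given color in an image and returns the center of each one in pixel coordinates. It is only one plausible approach (color matching followed by a flood fill), not the algorithm of any tool listed here. The image is represented as a plain nested list of RGB tuples so that the example needs no image library, and the name `find_markers` and the tolerance value are assumptions made for the example.

```python
from collections import deque

def find_markers(image, target_rgb, tolerance=30):
    """Return the (column, row) centroid of each blob of pixels close to target_rgb.

    image is a list of rows; each row is a list of (r, g, b) tuples.
    """
    height = len(image)
    width = len(image[0]) if height else 0

    def matches(pixel):
        return all(abs(c - t) <= tolerance for c, t in zip(pixel, target_rgb))

    seen = [[False] * width for _ in range(height)]
    centroids = []
    for row in range(height):
        for col in range(width):
            if seen[row][col] or not matches(image[row][col]):
                continue
            # Breadth-first flood fill over 4-connected matching pixels.
            queue = deque([(row, col)])
            seen[row][col] = True
            blob = []
            while queue:
                r, c = queue.popleft()
                blob.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < height and 0 <= nc < width
                            and not seen[nr][nc] and matches(image[nr][nc])):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            mean_col = sum(c for _, c in blob) / len(blob)
            mean_row = sum(r for r, _ in blob) / len(blob)
            centroids.append((mean_col, mean_row))
    return centroids

# Tiny synthetic "plot": white background with two red markers.
WHITE, RED = (255, 255, 255), (255, 0, 0)
image = [[WHITE] * 8 for _ in range(6)]
for r, c in [(1, 1), (1, 2), (2, 1), (2, 2)]:  # a 2x2 marker
    image[r][c] = RED
image[4][6] = RED  # a single-pixel marker

markers = find_markers(image, RED)
print(markers)
```

The centroids are still in pixel units and would need to be converted with the axis calibration before use. Overlapping markers, anti-aliased edges, and grid lines of a similar color are the typical cases where a simple approach like this breaks down and manual correction is needed.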
4. Export the data
Finally, copy or export your data in the format that is most convenient for you. Some programs include additional post-acquisition data analysis functions, but most often you will simply paste a table of coordinates into your favorite data processing software.
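If you end up with a list of coordinates in a script rather than in a spreadsheet, a comma-separated file is a format that nearly every data processing program can read. The following is a minimal sketch using Python's standard `csv` module. The function name `write_points`, the column headers, and the sample coordinates are made up for the example.

```python
import csv
import io

def write_points(stream, points, header=("x", "y")):
    """Write (x, y) pairs as comma-separated rows to an open text stream."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(points)

# Hypothetical digitized coordinates, already converted to data units.
points = [(0.0, 1.5), (2.0, 3.25), (4.0, 7.0)]

# Write to an in-memory buffer so the result can be inspected or pasted.
buffer = io.StringIO()
write_points(buffer, points)
csv_text = buffer.getvalue()
print(csv_text)

# To save to disk instead, pass a file opened with newline="":
# with open("digitized_points.csv", "w", newline="") as handle:
#     write_points(handle, points)
```

Keeping the header row and the original units in the file makes it easier to trace the numbers back to the figure they came from.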
Comparative study of graph digitizer software
We have put together a comparison table of 17 graph digitizer programs. There might be others out there worth mentioning. Please do not hesitate to comment and we will add them to the list.
|software||platform||cost||automatic detection||files supported||post-acquisition analysis||year|
|Dagra: Digitize graphical data||Windows||$49.95||yes||~ all image formats||no||2012|
|DataThief||Windows, MacOS, Unix||$25||no||JPG, PNG, GIF||no||2006|
|dcsDigitiser||Windows||$423||yes||~ all image formats||yes||2015|
|DigitizeIt||Windows, MacOS, Unix||$49||yes||~ all image formats||no||2014|
|Engauge||Windows, MacOS, Unix||Free||yes||~ all image formats||no||2015|
|g3data||Windows||Free||no||~ all image formats||no||2011|
|Get Data||Windows||Free||yes||~ all image formats||no||2013|
|Graph Click||MacOS||Free||yes||~ all image formats||no||2014|
|im2graph||Windows, Linux||Free||yes||~ all image formats||no||2015|
|Graph Data Extractor||Windows||Free||no||BMP, JPG, TIF, GIF, and PNG||no||2011|
|Image J plugin||Windows, MacOS, Unix||Free||no||~ all image formats||no||2014|
|MATLAB tool (Grabit)||Windows, MacOS, Unix||Free||no||BMP, JPG, TIF, GIF, PNG||yes||2007|
|Plot Digitizer||Windows, MacOS, Unix||Free||no||JPG, PNG, GIF||no||2014|
|Un-Scan it||Windows, MacOS||$345||yes||~ all image formats||yes||2014|
|WebPlotDigitizer||Web based||Free||yes||~ all image formats||no||2014|
|WinDig Data digitizer||Windows||Free||no||BMP||no||1994|
|xyExtract Graph Digitizer||Windows||$45||no||BMP||no||2011|
So which solution is best for you? As is often the case, it depends. In most cases, the browser-based WebPlotDigitizer will be the most convenient. It handles many types of graphs and plots, is free, requires no installation, and works on all platforms. Keep in mind, however, that because WebPlotDigitizer is a web-based tool, the version number of the software you are running is not known, which makes it hard to reference your analysis precisely and can get in the way of reproducibility.
For more demanding situations, Un-Scan it might help, since it comes with the longest list of features. It is also one of the most expensive solutions listed here.
Also, if you are an R user, you will find tutorials online on how R can help you extract data from graphs, as well as a paper describing a dedicated R package developed by Timothée Poisot.
Update (30th of July 2015): I have added im2graph to the list.