Many people who need a duplex ADF scanner come across the Fujitsu ScanSnap S1500, and so did I. After some thought it became my preferred choice, although its successor, the ScanSnap iX500, was already available. The decision came down mostly to reports indicating SANE support for the S1500, whereas support for the iX500 was unclear.
My requirements for an ADF scanner setup were basically:
- It should work together with CentOS 6.
- There should be a one-button setup that saves the scanned A4 pages to a predefined Samba share.
- Optimally it should autodetect whether it needs to be run in duplex mode.
- Automatic OCR embedded in the PDF would be a nice optional feature.
As for CentOS 6, the scanner works out of the box. The only drawback is that, as has been indicated here, one has to pass the “-B” (--buffer-size) option to scanimage in order to avoid some I/O errors in color duplex mode.
For the one-button setup I used scanbuttond. There seems to be a “successor”, scanbd, but I didn’t get it working. Since scanbuttond is not provided in any CentOS repository, you have to compile it yourself. I downloaded the latest version, 0.2.3, which apparently lacks support for the ScanSnap S1500 because the project is orphaned. Fortunately the Debian project provides a set of patches, one of which adds support for the scanner.
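To illustrate how a button press could be wired to a scan script, here is a minimal sketch of a dispatcher in the style of buttonpressed.sh. It assumes that scanbuttond passes the button number as the first argument and the SANE device name as the second; the function name handle_button and the SCAN_SCRIPT path are made up for this example and are not part of scanbuttond.

```shell
#!/bin/bash
# Hypothetical dispatcher in the style of scanbuttond's buttonpressed.sh.
# Assumption: $1 is the button number, $2 is the SANE device name.
# SCAN_SCRIPT is a made-up path; point it at your own scan script.
SCAN_SCRIPT="${SCAN_SCRIPT:-/usr/local/bin/scan-to-pdf.sh}"

handle_button() {
    local button="$1"
    local device="$2"
    case "$button" in
        1)
            # Button 1 starts the scan script and hands over the device name.
            "$SCAN_SCRIPT" "$device"
            ;;
        *)
            # Any other button is reported and ignored.
            echo "button $button on $device is not assigned" >&2
            return 1
            ;;
    esac
}

# When used as buttonpressed.sh, the last line would be:
#   handle_button "$@"
```

Keeping the dispatch in a function makes it easy to try out by hand, for example with SCAN_SCRIPT=echo, before letting scanbuttond call it.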
So in principle one could now write a working one-button-to-PDF script. Automatically deciding whether to scan in duplex mode is trickier. I ended up always scanning in duplex and using a small script I found here, which does a kind of automatic blank-page removal. At the moment it is a viable solution, though not a very fast one.
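The blank-page check boils down to comparing the number of black pixels with the number of white pixels after thresholding. A minimal sketch of that decision follows. The function names is_blank and count_pixels, the use of awk for the comparison, and the handling of empty counts are assumptions of this sketch; the 0.005 threshold and the ImageMagick histogram pipeline are the ones used in the script further down.

```shell
#!/bin/bash
# Sketch: decide whether a page is blank from its black/white pixel counts.
# Prints 1 if black/white is below the threshold, otherwise 0.
is_blank() {
    local black="${1:-0}"
    local white="${2:-0}"
    local threshold="${3:-0.005}"
    # awk does the floating-point comparison. A page without any white
    # pixels is treated as not blank, which also avoids dividing by zero.
    awk -v b="$black" -v w="$white" -v t="$threshold" \
        'BEGIN { if (w > 0 && b / w < t) print 1; else print 0 }'
}

# Count the pixels of one colour ("black" or "white") in a thresholded page.
# Needs ImageMagick's convert; assumes the histogram names the colour as text.
count_pixels() {
    convert "$1" -threshold 50% -format %c histogram:info:- \
        | grep "$2" | sed -n 's/^ *\([0-9]*\):.*$/\1/p'
}

# Example use on one page (not executed here):
#   if [ "$(is_blank "$(count_pixels page.tiff black)" \
#                    "$(count_pixels page.tiff white)")" -eq 1 ]; then
#       rm page.tiff
#   fi
```

An empty black count (a page with no black pixels at all) falls back to 0 and therefore counts as blank, which seems to be the sensible reading for an unprinted back side.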
The last point, automatic OCR, is still missing since CentOS does not come with any decent OCR software in its repositories. It will probably end up being some combination of tesseract / Ocropus / cuneiform with hocr2pdf, but this is still under investigation.
In the end the script that gets called within buttonpressed.sh is the following:
#!/bin/bash

CURDIR=`pwd`
TMPDIR=`mktemp -d`
OUT_DIR=/sambashares/

cd $TMPDIR

echo "Starting Scan:"
echo "=============="
echo ""

scanimage -b -B --resolution 150 --batch=scan_%03d.tiff --format=tiff \
    --mode Color --device-name "fujitsu:ScanSnap S1500:111111" \
    -x 210 -y 297 --brightness +10 \
    --page-width 210 --page-height 297 \
    --sleeptimer 1 --source "ADF Duplex"

echo ""
echo "Checking for blank pages:"
echo "========================="
echo ""

if [ -f "scan_001.tiff" ]; then
    for i in scan_*.tiff; do
        histogram=`convert "${i}" -threshold 50% -format %c histogram:info:-`
        white=`echo "${histogram}" | grep "white" | sed -n 's/^ *\(.*\):.*$/\1/p'`
        black=`echo "${histogram}" | grep "black" | sed -n 's/^ *\(.*\):.*$/\1/p'`
        blank=`echo "scale=4; ${black}/${white} < 0.005" | bc`
        echo `ls -lisah $i`
        if [ ${blank} -eq "1" ]; then
            echo "${i} seems to be blank - removing it..."
            rm "${i}"
        fi
    done

    OUTPUTNAME=scan_`date +%Y%m%d-%H%M%S`.pdf
    tiffcp -c lzw scan_*.tiff allscans.tiff
    tiff2pdf -z -p A4 allscans.tiff > out.pdf
    gs -q -dNOPAUSE -dBATCH -dSAFER \
        -sDEVICE=pdfwrite \
        -dCompatibilityLevel=1.3 \
        -dPDFSETTINGS=/screen \
        -dEmbedAllFonts=true \
        -dSubsetFonts=true \
        -dColorImageDownsampleType=/Bicubic \
        -dColorImageResolution=300 \
        -dGrayImageDownsampleType=/Bicubic \
        -dGrayImageResolution=300 \
        -dMonoImageDownsampleType=/Bicubic \
        -dMonoImageResolution=300 \
        -sOutputFile=$OUTPUTNAME \
        out.pdf

    cp $OUTPUTNAME $OUT_DIR/$OUTPUTNAME
    chown smbuser:smbuser $OUT_DIR/$OUTPUTNAME
fi

cd $CURDIR
rm -rf ${TMPDIR}
Just a few last remarks on the script:
- You have to get the scanner ID from “scanimage -L” and replace it accordingly in the scanimage call.
- Using the JPEG compression feature of tiff2pdf gives the picture a red color cast, which I cannot explain so far.
- To achieve a better compression ratio I also added the Ghostscript call.
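Regarding the scanner ID, the lookup can be scripted instead of copied by hand. The following is a minimal sketch that assumes “scanimage -L” prints lines of the form device `fujitsu:ScanSnap S1500:111111' is a FUJITSU ScanSnap S1500 scanner. The function names list_device_ids and first_device_for_backend are made up for this example.

```shell
#!/bin/bash
# Sketch: pull device names out of "scanimage -L" style output on stdin.
# Assumption: each device line looks like
#   device `fujitsu:ScanSnap S1500:111111' is a FUJITSU ScanSnap S1500 scanner
list_device_ids() {
    # Keep only the text between the backtick and the closing single quote.
    sed -n "s/^device \`\([^']*\)' is a .*\$/\1/p"
}

# Print the first device whose name starts with the given backend prefix.
first_device_for_backend() {
    list_device_ids | grep "^$1:" | head -n 1
}

# Example use (not executed here):
#   DEVICE=$(scanimage -L | first_device_for_backend fujitsu)
#   scanimage --device-name "$DEVICE" ...
```

With more than one scanner of the same backend attached this simply picks the first one listed, so in that case a fixed ID in the script may still be the safer choice.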