May | 2013 | /dev/urandom thoughts

Many people who need a duplex ADF scanner come across the Fujitsu ScanSnap S1500, and so did I. After giving it some thought it became my preferred choice, although its successor, the ScanSnap iX500, was already available. The deciding factor was that there were reports indicating SANE support for the S1500, whereas the situation for the iX500 was unclear.

My requirements for an ADF scanner setup were basically:

It should work together with CentOS 6.
There should be a one-button setup that saves the scanned A4 pages to a predefined Samba share.
Ideally it should autodetect whether it needs to be run in duplex mode.
Automatic OCR embedded in the PDF would be a nice optional feature.

As for the CentOS 6 setup, the scanner works out of the box. The only drawback is that, as has been indicated here, one has to pass the “-B” option (which enlarges the input buffer) to scanimage in order to avoid I/O errors in color duplex mode.

For the one-button setup I used scanbuttond. There seems to be a “successor” called scanbd, but I didn’t get it working. As scanbuttond is not provided in any CentOS repository, you have to compile it yourself. I downloaded the latest version, 0.2.3, which does not support the ScanSnap S1500, apparently because the project is orphaned. Fortunately the Debian project provides a set of patches, one of which adds support for the scanner.
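To illustrate how the button press reaches the scan script, here is a minimal sketch of a buttonpressed.sh hook. It assumes that scanbuttond passes the button number as the first argument and the device name as the second, as described in the sample script that ships with scanbuttond. The path in `SCAN_SCRIPT`, the lock directory and the function name `handle_button` are made up for this example.

```shell
#!/bin/bash
# Sketch of a buttonpressed.sh hook (assumption: $1 = button number, $2 = device name).
SCAN_SCRIPT=${SCAN_SCRIPT:-/usr/local/bin/scan-to-pdf.sh}  # hypothetical path
LOCKDIR=${LOCKDIR:-/tmp/scan-to-pdf.lock}                   # hypothetical lock location

handle_button() {
  button=$1
  device=$2
  case "$button" in
    1)
      # mkdir is atomic, so it serves as a simple lock against double presses.
      if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "scan already running, ignoring button $button"
        return 0
      fi
      "$SCAN_SCRIPT" "$device"
      status=$?
      rmdir "$LOCKDIR"
      return "$status"
      ;;
    *)
      echo "button $button on $device is not assigned"
      ;;
  esac
}

# Only dispatch when the script is actually called with arguments.
if [ "$#" -ge 1 ]; then
  handle_button "$@"
fi
```

The lock matters because a full duplex batch takes a while, and a second press during that time would otherwise start a competing scanimage process on the same device.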

So in principle one can now write a working one-button-to-PDF script. Deciding automatically whether to scan in duplex mode is trickier. I ended up using a small script I found here, which performs a kind of automatic blank-page removal: the scanner always runs in duplex mode and empty back sides are discarded afterwards. At the moment it is a viable solution, though not a very fast one.
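The blank-page test boils down to thresholding each page to pure black and white and comparing the two pixel counts. A minimal sketch of that decision, assuming the histogram text has the usual ImageMagick form `count: (r,g,b) #hex name`, could look like this. The function names `is_blank_histogram` and `page_is_blank` are invented for the example, and 0.005 is simply the ratio used in the script further down.

```shell
#!/bin/bash
# Reads ImageMagick "histogram:info:-" text on stdin.
# Exit status 0 means "blank": black/white pixel ratio below the limit ($1, default 0.005).
is_blank_histogram() {
  awk -v limit="${1:-0.005}" '
    /black/ { black = $1 + 0 }   # "1234:" converts to 1234
    /white/ { white = $1 + 0 }
    END {
      if (white == 0) exit 1     # no white pixels at all: not a blank page
      exit (black / white < limit) ? 0 : 1
    }'
}

# Thresholds one image file and applies the ratio test to it.
page_is_blank() {
  convert "$1" -threshold 50% -format %c histogram:info:- | is_blank_histogram
}
```

Unlike a plain division in bc, this variant also copes with a page that has no black pixels at all, in which case the histogram contains no "black" line.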

The last point, automatic OCR, is still missing, since CentOS does not ship any decent OCR software in its repositories. In one way or another it will probably be some combination of tesseract / Ocropus / cuneiform with hocr2pdf, but this is still under investigation.
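Purely as a sketch of what the tesseract plus hocr2pdf variant might look like: tesseract writes hOCR output for a page, and hocr2pdf merges that text layer with the image into a searchable PDF. The function name `ocr_page` and the override variables `TESSERACT` and `HOCR2PDF` are made up for this example, and since neither tool comes from the stock CentOS 6 repositories this is untested on that platform.

```shell
#!/bin/bash
# OCR one page image into a searchable PDF next to it (scan_001.tiff -> scan_001.pdf).
# Runs in a subshell so the helper variables do not leak into the caller.
ocr_page() (
  img=$1
  base=${img%.*}
  # "hocr" selects tesseract's hOCR output configuration.
  "${TESSERACT:-tesseract}" "$img" "$base" hocr || exit 1
  # Depending on the tesseract version the result ends in .hocr or .html.
  hocr=$base.hocr
  [ -f "$hocr" ] || hocr=$base.html
  [ -f "$hocr" ] || exit 1
  # hocr2pdf reads the hOCR text from stdin and overlays it on the image.
  "${HOCR2PDF:-hocr2pdf}" -i "$img" -o "$base.pdf" < "$hocr"
)
```

It would be called once per remaining page, e.g. `for i in scan_*.tiff; do ocr_page "$i"; done`, and the per-page PDFs would then still have to be merged into one document.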

In the end the script that gets called within buttonpressed.sh is the following:

#!/bin/bash
 
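CURDIR=$(pwd)
# Abort early if no scratch directory is available; everything below writes into it.
TMPDIR=$(mktemp -d) || exit 1
OUT_DIR=/sambashares

cd "$TMPDIR" || exit 1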
 
echo "Starting Scan:"
echo "=============="
echo ""
 
scanimage -b -B --resolution 150 --batch=scan_%03d.tiff --format=tiff \
	--mode Color --device-name "fujitsu:ScanSnap S1500:111111" \
	-x 210 -y 297 --brightness +10 \
	--page-width 210 --page-height 297 \
	--sleeptimer 1 --source "ADF Duplex"
 
echo ""
echo "Checking for blank pages:"
echo "========================="
echo ""
 
if [ -f "scan_001.tiff" ]; then
 
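for i in scan_*.tiff; do
  histogram=$(convert "${i}" -threshold 50% -format %c histogram:info:-)
  white=$(echo "${histogram}" | grep "white" | sed -n 's/^ *\(.*\):.*$/\1/p')
  black=$(echo "${histogram}" | grep "black" | sed -n 's/^ *\(.*\):.*$/\1/p')
  # A missing histogram line means zero pixels of that color.
  white=${white:-0}
  black=${black:-0}
  # A page counts as blank if black pixels are below 0.5% of the white ones.
  blank=0
  if [ "${white}" -gt 0 ]; then
    blank=$(echo "scale=4; ${black}/${white} < 0.005" | bc)
  fi
  ls -lisah "${i}"
  if [ "${blank}" -eq 1 ]; then
    echo "${i} seems to be blank - removing it..."
    rm "${i}"
  fi
done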
 
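# Stop here if every page was classified as blank and removed.
if ! ls scan_*.tiff > /dev/null 2>&1; then
  echo "No non-blank pages left - nothing to convert."
  cd "$CURDIR"
  rm -rf "${TMPDIR}"
  exit 0
fi

OUTPUTNAME=scan_$(date +%Y%m%d-%H%M%S).pdf

# Merge all remaining pages into one multi-page TIFF and convert it to PDF.
tiffcp -c lzw scan_*.tiff allscans.tiff
tiff2pdf -z -p A4 allscans.tiff > out.pdf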
gs      -q -dNOPAUSE -dBATCH -dSAFER \
        -sDEVICE=pdfwrite \
        -dCompatibilityLevel=1.3 \
        -dPDFSETTINGS=/screen \
        -dEmbedAllFonts=true \
        -dSubsetFonts=true \
        -dColorImageDownsampleType=/Bicubic \
        -dColorImageResolution=300 \
        -dGrayImageDownsampleType=/Bicubic \
        -dGrayImageResolution=300 \
        -dMonoImageDownsampleType=/Bicubic \
        -dMonoImageResolution=300 \
        -sOutputFile=$OUTPUTNAME \
        out.pdf
 
cp "$OUTPUTNAME" "$OUT_DIR/$OUTPUTNAME"

chown smbuser:smbuser "$OUT_DIR/$OUTPUTNAME"

fi

cd "$CURDIR"

rm -rf "${TMPDIR}"

Just a few last remarks on the script:

You have to get the scanner ID from “scanimage -L” and replace it accordingly in the scanimage call.
Using the JPEG compression feature of tiff2pdf gives the picture a red color cast, which I cannot explain so far.
In order to achieve a better compression ratio I also added the Ghostscript call.
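Regarding the first remark, the ID can also be pulled out of the “scanimage -L” output automatically. The following is a small sketch that assumes the usual output format, one line per scanner of the form device `NAME' is a VENDOR MODEL scanner. The function name `first_scanner_id` is invented for the example.

```shell
#!/bin/bash
# Reads "scanimage -L" output on stdin and prints the device name of the
# first scanner listed (the text between the backtick and the closing quote).
first_scanner_id() {
  sed -n "s/^device \`\([^']*\)' is a .*/\1/p" | head -n 1
}
```

With this, the script could set `DEVICE=$(scanimage -L | first_scanner_id)` and pass `--device-name "$DEVICE"` instead of a hard-coded ID, at the cost of one extra device probe per scan.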

Summing up, I’m so far content with the features the scanner provides. On the hardware side the only thing I miss is a second button, e.g. for producing direct copies. Regarding the script there are still things I might want to try, most notably the automatic OCR. In theory SANE also supports color correction, but this is really low priority.

Running a Fujitsu ScanSnap S1500 on a CentOS 6 Machine