Commit e46cbff3 authored by Matt Rickard, committed by Neal Wu

Revert merge #1292

This broke the bazel build of inception/download_and_preprocess_flowers.
The way this script is written doesn't actually allow it to be run
outside bazel; some refactoring would be needed if you want to run it
standalone.

It should be run using

```
bazel build inception/download_and_preprocess_flowers

bazel-bin/inception/download_and_preprocess_flowers "${FLOWERS_DATA_DIR}"
```
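
As a rough illustration of the refactoring mentioned above, the script could look for the bazel runfiles tree next to itself and fall back to its own directory when that tree is absent. This is only a sketch: `resolve_work_dir` is a hypothetical helper that is not part of this commit, and it covers only the `WORK_DIR` lookup. A standalone run would still have to invoke `build_image_data.py` through `python` instead of the bazel-built `build_image_data` binary.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): print the directory that
# should be used as WORK_DIR, given the path of the running script.
resolve_work_dir() {
  local script_path="$1"
  local runfiles_dir="${script_path}.runfiles/inception/inception"
  if [ -d "${runfiles_dir}" ]; then
    # Launched from bazel-bin: use the runfiles tree, as the restored script does.
    echo "${runfiles_dir}"
  else
    # Launched directly: fall back to the directory containing the script.
    (cd "$(dirname -- "${script_path}")" && pwd)
  fi
}

# Intended use inside the script (assumption):
#   WORK_DIR="$(resolve_work_dir "$0")"
```

With a helper like this, the bazel invocation above would keep working unchanged, while a direct invocation would resolve `WORK_DIR` to the script's source directory.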
parent 405bb623
@@ -198,7 +198,7 @@ def _process_image(filename, coder):
     width: integer, image width in pixels.
   """
   # Read the image file.
-  with tf.gfile.FastGFile(filename, 'rb') as f:
+  with tf.gfile.FastGFile(filename, 'r') as f:
     image_data = f.read()
   # Convert any PNG to JPEG's for consistency.
......
@@ -41,32 +41,31 @@ fi
 # Create the output and temporary directories.
 DATA_DIR="${1%/}"
-SCRATCH_DIR="${DATA_DIR}/raw-data"
+SCRATCH_DIR="${DATA_DIR}/raw-data/"
 mkdir -p "${DATA_DIR}"
 mkdir -p "${SCRATCH_DIR}"
-# http://stackoverflow.com/questions/59895/getting-the-source-directory-of-a-bash-script-from-within
-WORK_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
+WORK_DIR="$0.runfiles/inception/inception"
 # Download the flowers data.
 DATA_URL="http://download.tensorflow.org/example_images/flower_photos.tgz"
 CURRENT_DIR=$(pwd)
+cd "${DATA_DIR}"
 TARBALL="flower_photos.tgz"
 if [ ! -f ${TARBALL} ]; then
   echo "Downloading flower data set."
-  curl -o ${DATA_DIR}/${TARBALL} "${DATA_URL}"
+  curl -o ${TARBALL} "${DATA_URL}"
 else
   echo "Skipping download of flower data."
 fi
 # Note the locations of the train and validation data.
-TRAIN_DIRECTORY="${SCRATCH_DIR}/train"
-VALIDATION_DIRECTORY="${SCRATCH_DIR}/validation"
+TRAIN_DIRECTORY="${SCRATCH_DIR}train/"
+VALIDATION_DIRECTORY="${SCRATCH_DIR}validation/"
 # Expands the data into the flower_photos/ directory and rename it as the
 # train directory.
-tar xf ${DATA_DIR}/flower_photos.tgz
+tar xf flower_photos.tgz
 rm -rf "${TRAIN_DIRECTORY}" "${VALIDATION_DIRECTORY}"
-mkdir -p "${TRAIN_DIRECTORY}"
 mv flower_photos "${TRAIN_DIRECTORY}"
 # Generate a list of 5 labels: daisy, dandelion, roses, sunflowers, tulips
@@ -75,22 +74,22 @@ ls -1 "${TRAIN_DIRECTORY}" | grep -v 'LICENSE' | sed 's/\///' | sort > "${LABELS
 # Generate the validation data set.
 while read LABEL; do
-  VALIDATION_DIR_FOR_LABEL="${VALIDATION_DIRECTORY}/${LABEL}"
-  TRAIN_DIR_FOR_LABEL="${TRAIN_DIRECTORY}/${LABEL}"
+  VALIDATION_DIR_FOR_LABEL="${VALIDATION_DIRECTORY}${LABEL}"
+  TRAIN_DIR_FOR_LABEL="${TRAIN_DIRECTORY}${LABEL}"
   # Move the first randomly selected 100 images to the validation set.
   mkdir -p "${VALIDATION_DIR_FOR_LABEL}"
   VALIDATION_IMAGES=$(ls -1 "${TRAIN_DIR_FOR_LABEL}" | shuf | head -100)
   for IMAGE in ${VALIDATION_IMAGES}; do
-    mv -f "${TRAIN_DIRECTORY}/${LABEL}/${IMAGE}" "${VALIDATION_DIR_FOR_LABEL}"
+    mv -f "${TRAIN_DIRECTORY}${LABEL}/${IMAGE}" "${VALIDATION_DIR_FOR_LABEL}"
   done
 done < "${LABELS_FILE}"
 # Build the TFRecords version of the image data.
 cd "${CURRENT_DIR}"
-BUILD_SCRIPT="${WORK_DIR}/build_image_data.py"
+BUILD_SCRIPT="${WORK_DIR}/build_image_data"
 OUTPUT_DIRECTORY="${DATA_DIR}"
-python "${BUILD_SCRIPT}" \
+"${BUILD_SCRIPT}" \
   --train_directory="${TRAIN_DIRECTORY}" \
   --validation_directory="${VALIDATION_DIRECTORY}" \
   --output_directory="${OUTPUT_DIRECTORY}" \
......