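Object Detection from Scratch with PyTorch (Part 1) - Suzhou Machine Vision Training, Suzhou Host-Computer Training, Hexun Host-Computer Machine Vision Training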

Introduction

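Object detection is a very popular task in computer vision. Given an image, you predict the bounding box (usually a rectangle) around each object in it and identify what type of object it is. An image may contain several objects, and there are now various advanced techniques and frameworks for solving this problem, such as Faster R-CNN and YOLOv3.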

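This article covers the case where an image contains only one object of interest. The focus is less on the model itself and more on how to read images and their bounding boxes, resize them, and apply augmentations correctly. The goal is to get a solid grasp of the basic ideas behind object detection, which you can then build on to understand more complex techniques.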

Problem Statement

Given an image containing a road sign, predict the bounding box around the sign and identify the type of sign. There are four types of signs:

· Traffic light

· Stop

· Speed limit

· Crosswalk

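This is a multi-task learning problem, because it involves two tasks: 1) regression, to find the bounding-box coordinates, and 2) classification, to identify the type of road sign.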
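In practice the two tasks are trained together by adding a classification loss and a regression loss. The NumPy sketch below assumes softmax cross-entropy for the sign type and an L1 (mean absolute error) loss for the four box coordinates, combined with a hypothetical weight `bb_weight`; this part of the article does not specify either the losses or the weighting. In PyTorch the same two terms are available as `torch.nn.functional.cross_entropy` and `torch.nn.functional.l1_loss`.

```python
import numpy as np

def multitask_loss(class_logits, class_target, bb_pred, bb_target, bb_weight=1.0):
    """Hypothetical combined loss: cross-entropy (classification) + weighted L1 (box regression).

    class_logits: (N, C) raw scores; class_target: (N,) integer labels;
    bb_pred / bb_target: (N, 4) box coordinates.
    """
    logits = np.asarray(class_logits, dtype=np.float64)
    target = np.asarray(class_target, dtype=np.int64)
    # Numerically stable log-softmax: subtract each row's max before exponentiating
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Cross-entropy: mean negative log-probability of the true class
    ce = -log_probs[np.arange(len(target)), target].mean()
    # L1 loss: mean absolute error over all box coordinates
    diff = np.asarray(bb_pred, dtype=np.float64) - np.asarray(bb_target, dtype=np.float64)
    l1 = np.abs(diff).mean()
    return ce + bb_weight * l1

# Uniform logits over 4 classes give cross-entropy log(4); box L1 = (2 + 0 + 0 + 4) / 4 = 1.5
loss = multitask_loss([[0.0, 0.0, 0.0, 0.0]], [1], [[10, 20, 50, 80]], [[12, 20, 50, 76]])
print(loss)  # log(4) + 1.5 ≈ 2.886
```

The weight matters because box coordinates are measured in pixels while cross-entropy is unitless, so the two terms can differ in scale by orders of magnitude.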

Dataset

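The dataset consists of 877 images. It is quite imbalanced, with most images belonging to the speed-limit class, but since we are more interested in bounding-box prediction, we can ignore the imbalance.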

Loading the Data

The annotations for each image are stored in a separate XML file. I created the training dataset with the following steps:

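· Walk the training directory to get a list of all .xml files.

· Parse each .xml file with xml.etree.ElementTree.

· For each image, create a dictionary holding the file path, width, height, bounding-box coordinates (xmin, xmax, ymin, ymax) and class, and append that dictionary to a list.

· Create a pandas DataFrame from the list of image-statistics dictionaries.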

```python
import os
import xml.etree.ElementTree as ET
from pathlib import Path

import pandas as pd

# Assumed dataset layout; adjust these paths to wherever your data lives
images_path = Path('./road_signs/images')
anno_path = Path('./road_signs/annotations')

def filelist(root, file_type):
    """Returns a fully-qualified list of filenames under root directory"""
    return [os.path.join(directory_path, f)
            for directory_path, directory_name, files in os.walk(root)
            for f in files if f.endswith(file_type)]

def generate_train_df(anno_path):
    """Builds one DataFrame row per annotation file (first object only)"""
    annotations = filelist(anno_path, '.xml')
    anno_list = []
    for path in annotations:  # renamed so it no longer shadows the anno_path argument
        root = ET.parse(path).getroot()
        anno = {}
        anno['filename'] = images_path / root.find("./filename").text
        anno['width'] = int(root.find("./size/width").text)
        anno['height'] = int(root.find("./size/height").text)
        anno['class'] = root.find("./object/name").text
        anno['xmin'] = int(root.find("./object/bndbox/xmin").text)
        anno['ymin'] = int(root.find("./object/bndbox/ymin").text)
        anno['xmax'] = int(root.find("./object/bndbox/xmax").text)
        anno['ymax'] = int(root.find("./object/bndbox/ymax").text)
        anno_list.append(anno)
    return pd.DataFrame(anno_list)

df_train = generate_train_df(anno_path)
```
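The `find()` paths in `generate_train_df` assume a Pascal VOC-style annotation layout. The sketch below parses a made-up annotation string (the file name, size and coordinates are illustrative, not taken from the dataset) with the same paths, so you can check what each field resolves to. `parse_annotation` is a hypothetical helper, and like the code above it reads only the first `<object>`, matching this article's single-object assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the layout the find() paths above expect
SAMPLE_XML = """
<annotation>
    <filename>example.png</filename>
    <size><width>400</width><height>267</height><depth>3</depth></size>
    <object>
        <name>trafficlight</name>
        <bndbox><xmin>98</xmin><ymin>62</ymin><xmax>208</xmax><ymax>232</ymax></bndbox>
    </object>
</annotation>
"""

def parse_annotation(xml_text):
    """Parse one annotation string into the same fields generate_train_df collects."""
    root = ET.fromstring(xml_text.strip())
    record = {
        'filename': root.find('./filename').text,
        'width': int(root.find('./size/width').text),
        'height': int(root.find('./size/height').text),
        'class': root.find('./object/name').text,  # first <object> only
    }
    for key in ('xmin', 'ymin', 'xmax', 'ymax'):
        record[key] = int(root.find(f'./object/bndbox/{key}').text)
    return record

print(parse_annotation(SAMPLE_XML))
```

The key order here matches the column order of the training DataFrame, which matters later because `create_bb_array` picks the box coordinates by position.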

· Label-encode the class column

```python
# Label-encode the target
class_dict = {'speedlimit': 0, 'stop': 1, 'crosswalk': 2, 'trafficlight': 3}
df_train['class'] = df_train['class'].apply(lambda x: class_dict[x])
```
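At prediction time the model outputs class indices, so a reverse lookup is handy for turning them back into sign names. A minimal sketch, assuming the same `class_dict` as above; `idx_to_class` and `predicted` are names introduced here for illustration.

```python
class_dict = {'speedlimit': 0, 'stop': 1, 'crosswalk': 2, 'trafficlight': 3}

# Invert the mapping: class index -> sign name
idx_to_class = {idx: name for name, idx in class_dict.items()}

predicted = [0, 3, 1]  # e.g. the argmax of the model's class scores for three images
print([idx_to_class[i] for i in predicted])  # ['speedlimit', 'trafficlight', 'stop']
```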

Resizing Images and Bounding Boxes

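Training a computer vision model requires all images to be the same size, so we need to resize the images together with their bounding boxes. Resizing an image is simple, but resizing a bounding box is a bit trickier, because each box is expressed in the pixel coordinates of its own image and therefore depends on that image's dimensions.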

Here is how resizing a bounding box works:

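· Convert the bounding box into an image of the same size as the image it belongs to (called a mask). The mask contains only 0 for the background and 1 for the region covered by the bounding box.

· Resize the mask to the required dimensions.

· Extract the bounding-box coordinates from the resized mask.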

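```python
import cv2
import numpy as np

def read_image(path):
    """Reads an image with OpenCV and converts it from BGR to RGB"""
    return cv2.cvtColor(cv2.imread(str(path)), cv2.COLOR_BGR2RGB)

def create_mask(bb, x):
    """Creates a mask for the bounding box of same shape as image"""
    rows, cols, *_ = x.shape
    Y = np.zeros((rows, cols))
    bb = bb.astype(int)  # np.int was removed in NumPy 1.24
    # bb is [ymin, xmin, ymax, xmax]: row range first, then column range
    Y[bb[0]:bb[2], bb[1]:bb[3]] = 1.
    return Y

def mask_to_bb(Y):
    """Convert mask Y to a bounding box, assumes 0 as background nonzero object"""
    rows, cols = np.nonzero(Y)  # np.nonzero returns row indices, then column indices
    if len(rows) == 0:
        return np.zeros(4, dtype=np.float32)
    top_row = np.min(rows)
    left_col = np.min(cols)
    bottom_row = np.max(rows)
    right_col = np.max(cols)
    # Same [ymin, xmin, ymax, xmax] order as create_bb_array
    return np.array([top_row, left_col, bottom_row, right_col], dtype=np.float32)

def create_bb_array(x):
    """Generates bounding box array from a train_df row"""
    # Columns: filename, width, height, class, xmin, ymin, xmax, ymax
    return np.array([x[5], x[4], x[7], x[6]])

def resize_image_bb(read_path, write_path, bb, sz):
    """Resize an image and its bounding box and write image to new path"""
    im = read_image(read_path)
    # cv2.resize takes (width, height): sz rows, int(1.49 * sz) columns (447 x 300 for sz=300)
    im_resized = cv2.resize(im, (int(1.49*sz), sz))
    Y_resized = cv2.resize(create_mask(bb, im), (int(1.49*sz), sz))
    new_path = str(write_path / read_path.name)
    cv2.imwrite(new_path, cv2.cvtColor(im_resized, cv2.COLOR_RGB2BGR))
    return new_path, mask_to_bb(Y_resized)
```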
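For a plain resize (no rotation or cropping), the box recovered from the mask round-trip should roughly agree with simply multiplying each coordinate by the ratio of new to old size; interpolation can widen the resized mask slightly, so expect small differences. The sketch below does that scaling directly as a sanity check. `scale_bb` is a hypothetical helper, and it uses the same `[ymin, xmin, ymax, xmax]` order as `create_bb_array` and `mask_to_bb`.

```python
import numpy as np

def scale_bb(bb, old_size, new_size):
    """Scale bb = [ymin, xmin, ymax, xmax] from an image of old_size to new_size.

    Sizes are (rows, cols), i.e. (height, width), as in a NumPy array's .shape.
    """
    (old_h, old_w), (new_h, new_w) = old_size, new_size
    sy, sx = new_h / old_h, new_w / old_w  # independent vertical and horizontal factors
    ymin, xmin, ymax, xmax = bb
    return np.array([ymin * sy, xmin * sx, ymax * sy, xmax * sx], dtype=np.float32)

# A 200 x 400 image resized to 300 x 447 (what resize_image_bb produces for sz=300)
print(scale_bb([20, 40, 100, 200], old_size=(200, 400), new_size=(300, 447)))
# roughly [30, 44.7, 150, 223.5]
```

Because the vertical and horizontal factors differ whenever the aspect ratio changes, rows and columns must be scaled separately; mixing up the coordinate order would silently distort the box.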

```python
from pathlib import Path

# Populating Training DF with new paths and bounding boxes
new_paths = []
new_bbs = []
train_path_resized = Path('./road_signs/images_resized')
train_path_resized.mkdir(parents=True, exist_ok=True)  # cv2.imwrite does not create folders
for index, row in df_train.iterrows():
    new_path, new_bb = resize_image_bb(row['filename'], train_path_resized,
                                       create_bb_array(row.values), 300)
    new_paths.append(new_path)
    new_bbs.append(new_bb)
df_train['new_path'] = new_paths
df_train['new_bb'] = new_bbs
```
