PKU SketchRe-ID Dataset

INTRODUCTION

The PKU Sketch Re-ID dataset is constructed by National Engineering Laboratory for Video Technology (NELVT), Peking University.

The dataset contains 200 persons, each with one sketch and two photos. The photos of each person were captured during daytime by two cross-view cameras. We cropped the raw images (or video frames) manually to make sure that every photo contains only that specific person. Five artists drew the sketches, each with his own painting style.
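
To give a feel for how the data is typically used, below is a minimal, hypothetical sketch of building a sketch-query / photo-gallery split in Python. The directory layout (sketch/<id>.jpg, photo/<id>_<cam>.jpg) is an assumption for illustration only, not the dataset's documented format.

# Hypothetical layout, assumed for illustration only:
#   sketch/001.jpg              one sketch per person
#   photo/001_cam1.jpg, ...     two cross-view photos per person
import glob
import os

def build_splits(root):
    """Pair each sketch (query) with that person's photos (gallery)."""
    queries, gallery = [], {}
    for path in sorted(glob.glob(os.path.join(root, "sketch", "*.jpg"))):
        pid = os.path.splitext(os.path.basename(path))[0]
        queries.append((pid, path))
    for path in sorted(glob.glob(os.path.join(root, "photo", "*.jpg"))):
        pid = os.path.basename(path).split("_")[0]
        gallery.setdefault(pid, []).append(path)
    return queries, gallery

queries, gallery = build_splits("PKU-SketchReID")
print(len(queries), "query sketches,", sum(map(len, gallery.values())), "gallery photos")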

LICENSE 

  • The images and the corresponding annotation results can only be used for ACADEMIC PURPOSES. NO COMMERCIAL USE is allowed.
  • Copyright © National Engineering Laboratory for Video Technology (NELVT) and Institute of Digital Media, Peking University (PKU-IDM). All rights reserved.

All publications using PKU SketchRe-ID dataset should cite the paper below:

Lu Pang, Yaowei Wang, Yi-Zhe Song, Tiejun Huang, Yonghong Tian; Cross-Domain Adversarial Feature Learning for Sketch Re-identification; ACM Multimedia 2018

DOWNLOAD

You can download the agreement (PDF) from here. After filling it out, please send the electronic version to our email: pkuml at pku.edu.cn (Subject: PKU-SketchReID-Agreement)

 

Please send it from an academic or institutional email address such as xxx at xxx.edu.xx. Requests from free email addresses (Outlook, Gmail, QQ, etc.) will be kindly refused.

 

After we confirm your information, we will send you the download link and password via email. You must abide by the agreement.

We usually reply within a week. Occasionally, however, the email does not arrive or display correctly for unknown reasons. If this happens, please change the content or subject line and try sending it again.

PKU-VD Dataset

INTRODUCTION

The PKU-VD datasets, including VD1 and VD2, are constructed by the National Engineering Laboratory for Video Technology (NELVT), Peking University, sponsored by the National Basic Research Program of China and the Chinese National Natural Science Foundation. We constructed two large-scale vehicle datasets (i.e., VD1 and VD2) based on real-world unconstrained scenes from two different cities. The images in VD1 are obtained from high-resolution traffic cameras, and the images in VD2 are captured from surveillance videos. We performed vehicle detection on the raw data to make sure that each image contains only one vehicle. The license-plate region has been masked in black for privacy protection.

We provide diverse attribute annotations for each image in both datasets, including identity number, precise vehicle model, and vehicle color. Specifically, the identity number (ID) is unique, and all images belonging to the same vehicle have the same ID (we make sure that there are at least two images in the dataset for each vehicle ID). We provide the most precise model type, with the detailed vehicle type and production years. For example, Audi-A6L-2012&2015, Audi-A6-2004, Audi-A4-2006&2008, and Audi-A4-2004&2005 are four different vehicle models in our datasets. As for color information, 11 common colors are annotated. We carefully checked all annotations to ensure label consistency, so that all images belonging to the same vehicle ID are annotated with the same vehicle model and color.
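
As a rough illustration of how these per-image attributes might be organized in code, the sketch below parses a hypothetical whitespace-separated annotation file (image_name vehicle_id model color); the actual files distributed with the dataset may use a different format, so treat this only as a template.

from collections import defaultdict

def load_annotations(path):
    """Parse a hypothetical annotation file: image_name vehicle_id model color."""
    by_id = defaultdict(list)
    with open(path) as f:
        for line in f:
            image_name, vehicle_id, model, color = line.split()
            by_id[vehicle_id].append({"image": image_name, "model": model, "color": color})
    # Consistency checks mirroring the dataset description: every image of a
    # vehicle ID carries the same model and color, and each ID has >= 2 images.
    for vid, records in by_id.items():
        assert len({(r["model"], r["color"]) for r in records}) == 1, vid
        assert len(records) >= 2, vid
    return by_id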

VD1: There are 1,097,649 images in total. We label 1,232 vehicle models and 11 colors.

VD2: There are 807,260 images in total. We label 1,112 vehicle models and 11 colors.

LICENSE

The PKU-VD datasets are now partly made available, for academic purposes only, on a case-by-case basis. NELVT at Peking University serves as the technical agent for distribution of the datasets and reserves the copyright of all images in them. Any researcher who requests the PKU-VD datasets must sign this agreement and thereby agrees to observe the restrictions listed in this document.

  • The images and the corresponding annotation results for download are part of PKU-VD datasets.
  • The images and the corresponding annotation results can only be used for ACADEMIC PURPOSES. NO COMMERCIAL USE is allowed.
  • Copyright © National Engineering Laboratory for Video Technology (NELVT) and Institute of Digital Media, Peking University (PKU-IDM). All rights reserved.

All publications using PKU-VD dataset should cite the paper below:

@inproceedings{yan2017exploiting,
 title={Exploiting Multi-Grain Ranking Constraints for Precisely Searching Visually-Similar Vehicles},
 author={Yan, Ke and Tian, Yonghong and Wang, Yaowei and Zeng, Wei and Huang, Tiejun},
 booktitle={Proceedings of the IEEE International Conference on Computer Vision},
 pages={562--570},
 year={2017}
}

DOWNLOAD

You can download the agreement (PDF) from here. Please make sure that a permanent/long-term responsible person (e.g., a professor or PI) fills in the agreement with a handwritten signature. After filling it out, please send the electronic version to our email: pkuml at pku.edu.cn (Subject: PKU-VD-Agreement)

Please send it from an academic or institutional email address such as xxx at xxx.edu.xx. Requests from free email addresses (Outlook, Gmail, QQ, etc.) will be kindly refused.

After we confirm your information, we will send you the download link and password via email. You must abide by the agreement.

We usually reply within a week. Occasionally, however, the email does not arrive or display correctly for unknown reasons. If this happens, please change the content or subject line and try sending it again.

DRDL

This page includes some resources for our paper “Deep Relative Distance Learning: Tell the Difference Between Similar Vehicles”.

Because we are not allowed to release the complete code due to confidentiality agreements, we provide only the core parts and the model’s prototxt files (for both training and testing) here.

  • You can find the source code of the “coupled clusters loss”, “triplet loss”, “l2 normalization”, and all other related Caffe code here. Note that you cannot feed multiple labels to a normal data layer in the original Caffe, so we modified “MemoryDataLayer” to support it. For instance, if you want to feed 3 different labels (label1, label2, label3) into the MemoryDataLayer, edit the data layer in your prototxt as follows.

layer {
	name: "data"
	type: "MemoryData"
	top: "data"
	top: "label1"
	top: "label2"
	top: "label3"
	include {
		phase: TRAIN
	}
	memory_data_param {
		num_tasks: 3
		batch_size: 128
		channels: 3
		height: 224
		width: 224
	}
}

You can then feed the input data in Python like this (a fuller end-to-end sketch follows the list below):

import numpy as np

# x: image batch (N, C, H, W); y: one row of labels per task (num_tasks, N)
x = np.zeros((128, 3, 224, 224), dtype=np.float32)
y = np.zeros((3, 128), dtype=np.float32)
solver.net.set_input_arrays(x, y)
  • We also release our model’s prototxt files for both training and testing. You can download them here.
  • The VehicleID dataset can be found here.
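
Putting the pieces above together, a minimal end-to-end training sketch might look like the following. It assumes the modified Caffe build from this repository (with the multi-label MemoryDataLayer) and uses hypothetical names (solver.prototxt, a load_batch helper); adapt both to your own setup.

import numpy as np
import caffe  # the modified Caffe build with the multi-label MemoryDataLayer

caffe.set_mode_gpu()
solver = caffe.SGDSolver("solver.prototxt")  # hypothetical solver file name

def load_batch(batch_size=128):
    """Placeholder: return images (N, 3, 224, 224) and 3 label rows (3, N)."""
    images = np.zeros((batch_size, 3, 224, 224), dtype=np.float32)
    labels = np.zeros((3, batch_size), dtype=np.float32)
    return images, labels

for it in range(10000):
    images, labels = load_batch()
    # With num_tasks: 3 in the prototxt, the modified layer splits the label
    # array into the label1, label2, and label3 tops.
    solver.net.set_input_arrays(images, labels)
    solver.step(1)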

PKU VehicleID

INTRODUCTION

The PKU VehicleID dataset is constructed by National Engineering Laboratory for Video Technology (NELVT), Peking University, sponsored by the National Basic Research Program of China and Chinese National Natural Science Foundation.

The “VehicleID” dataset contains data captured during daytime by multiple real-world surveillance cameras distributed in a small city in China. There are 26,267 vehicles (221,763 images in total) in the entire dataset. Each image carries an ID label corresponding to its real-world identity. In addition, we manually labeled the vehicle model information (e.g., “MINI Cooper”, “Audi A6L”, and “BMW 1 Series”) for 10,319 vehicles (90,196 images in total).
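
As a quick illustration (not the official toolkit), the sketch below groups images by vehicle ID from a hypothetical one-pair-per-line label file (image_name vehicle_id) and builds a simple probe/gallery split, which is how re-identification evaluation is usually set up; adjust the parsing to the actual files you receive.

from collections import defaultdict
import random

def group_by_id(label_file):
    """Hypothetical label file: one 'image_name vehicle_id' pair per line."""
    groups = defaultdict(list)
    with open(label_file) as f:
        for line in f:
            image_name, vehicle_id = line.split()
            groups[vehicle_id].append(image_name)
    return groups

def probe_gallery_split(groups, seed=0):
    """Pick one random image per ID as the gallery, the rest as probes."""
    rng = random.Random(seed)
    gallery, probes = {}, []
    for vid, images in groups.items():
        images = images[:]
        rng.shuffle(images)
        gallery[vid] = images[0]
        probes.extend((vid, img) for img in images[1:])
    return gallery, probes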


The PKU VehicleID dataset is now partly made available, for academic purposes only, on a case-by-case basis. NELVT at Peking University serves as the technical agent for distribution of the dataset and reserves the copyright of all images in it. Any researcher who requests the PKU VehicleID dataset must sign this agreement and thereby agrees to observe the restrictions listed in this document.

LICENSE

  • The images and the corresponding annotation results for download are part of PKU VehicleID dataset.
  • The images and the corresponding annotation results can only be used for ACADEMIC PURPOSES. NO COMMERCIAL USE is allowed.
  • Copyright © National Engineering Laboratory for Video Technology (NELVT) and Institute of Digital Media, Peking University (PKU-IDM). All rights reserved.

All publications using PKU VehicleID dataset should cite the paper below:

@inproceedings{liu2016deep,
  title={Deep Relative Distance Learning: Tell the Difference Between Similar Vehicles},
  author={Liu, Hongye and Tian, Yonghong and Wang, Yaowei and Pang, Lu and Huang, Tiejun},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={2167--2175},
  year={2016}
}

DOWNLOAD

You can download the agreement (PDF) from here. After filling it out, please send the electronic version to our email: pkuml at pku.edu.cn (Subject: PKU-VehicleID-Agreement)

Please send it from an academic or institutional email address such as xxx at xxx.edu.xx. Requests from free email addresses (Outlook, Gmail, QQ, etc.) will be kindly refused.

After we confirm your information, we will send you the download link and password via email. You must abide by the agreement.

We usually reply within a week. Occasionally, however, the email does not arrive or display correctly for unknown reasons. If this happens, please change the content or subject line and try sending it again.

Code

Papers

Incorporating Learnable Membrane Time Constant to Enhance Learning of Spiking Neural Networks

  • Wei Fang, Zhaofei Yu, Yanqi Chen, Timothée Masquelier, Tiejun Huang and Yonghong Tian*
  • ICCV 2021
  • For an introduction to the paper, see the README.md. [ENGLISH/CHINESE]

[PDF] [Code]

 

Deep Relative Distance Learning: Tell the Difference Between Similar Vehicles

  • Hongye Liu, Yonghong Tian*, Yaowei Wang*, Lu Pang, Tiejun Huang
  • CVPR 2016

[PDF] [Code]

 

Unsupervised Cross-Dataset Transfer Learning for Person Re-identification

  • Peixi Peng, Tao Xiang, Yaowei Wang, Massimiliano Pontil, Shaogang Gong, Tiejun Huang, Yonghong Tian*
  • CVPR 2016

[PDF] [Code]

 

Learning Complementary Saliency Priors for Foreground Object Segmentation in Complex Scenes

  • Yonghong Tian, Jia Li, Shui Yu, Tiejun Huang
  • Int’l Journal of Computer Vision, 111(2), Jan 2015, 153-170. doi: 10.1007/s11263-014-0737-1

[PDF] [Code]

 

Image Saliency Estimation via Random Walk Guided by Informativeness and Latent Signal Correlations

  • Jia Li, Shu Fang (first co-author), Yonghong Tian*, Tiejun Huang, and Xiaowu Chen
  • Signal Processing: Image Communication, (2015)

[PDF] [Code]

 

Revisiting Mid-Level Patterns for Cross-Domain Few-Shot Recognition

  • Yixiong Zou, Shanghang Zhang, Jianpeng Yu, Yonghong Tian*, José M. F. Moura
  • 29th ACM Int’l Conf. Multimedia. (2021)

[PDF] [Code] (Password: ko4q)

 

Projects

SpikingJelly

  • SpikingJelly is an open-source deep learning framework for Spiking Neural Networks (SNNs) based on PyTorch.
  • The documentation of SpikingJelly is written in both English and Chinese: https://spikingjelly.readthedocs.io.
  • For an introduction to the project, see the README.md.

[OpenI] [Github]

Surveillance Video: The Biggest Big Data

This position article is cited from T. Huang, “Surveillance Video: The Biggest Big Data,” Computing Now, vol. 7, no. 2, Feb. 2014, IEEE Computer Society [online].

————————————————————————————————-

Big data continues to grow exponentially, and surveillance video has become the largest source. Against that backdrop, this issue of Computing Now presents five articles from the IEEE Computer Society Digital Library focused on research activities related to surveillance video. It also includes some related references on how to compress and analyze the huge amount of video data that’s being generated.

Surveillance Video in the Digital Universe

In recent years, more and more video cameras have been appearing throughout our surroundings, including surveillance cameras in elevators, ATMs, and the walls of office buildings, as well as those along roadsides for traffic-violation detection, cameras for caring for kids or seniors, and those embedded in laptops and on the front and back sides of mobile phones. All of these cameras are capturing huge amounts of video and feeding it into cyberspace daily. For example, a city such as Beijing or London has about one million cameras deployed. Now consider that these cameras capture more in one hour than all the TV programs in the archives of the British Broadcasting Corporation (BBC) or China Central Television (CCTV). According to the International Data Corporation’s recent report, “The Digital Universe in 2020,” half of global big data — the valuable matter for analysis in the digital universe — was surveillance video in 2012, and the percentage is set to increase to 65 percent by 2015.

To get a sense of R&D activities related to video surveillance, I searched for the keywords video and surveillance in IEEE Xplore (within metadata only) and the IEEE CSDL (by exact phrase). The searches returned 6,832 (Xplore) and 3,111 (CS Digital Library) related papers published in IEEE conferences, journals, or magazines. Figure 1 shows the annual histogram of these publications. The sharp increase over the past ten years indicates that research on surveillance video is very active.

Figure 1. Histogram of publications in IEEE Computer Society Digital Library and IEEE Xplore for which metadata contains the keywords video and surveillance. Note: “~1989” shows all articles up to 1989. The numbers for 2013 might also increase as some are still waiting to be archived into the database.

Theme Articles

Surveillance-video big data introduces many technological challenges, including compression, storage, transmission, analysis, and recognition. Among these, the two most critical challenges are how to efficiently transmit and store the huge amount of data, and how to intelligently analyze and understand the visual information inside.

Higher-efficiency video compression technology is urgently needed to reduce the storage and transmission cost of big surveillance data. The state-of-the-art High Efficiency Video Coding (HEVC) standard, featured in the October 2013 CN theme, can compress a video to about 3 percent of its original data size. In other words, HEVC doubles the data compression ratio of the H.264/MPEG-4 AVC approved in 2003. In fact, the latter doubled the ratio of the previous-generation standards MPEG-2/H.262, which were approved in 1993. Despite these advances, this doubling of video-compression performance every ten years is too slow to keep pace with the growth of surveillance video in our physical world, which is now doubling every two years, on average!
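
The mismatch is easy to quantify. The back-of-the-envelope sketch below (illustrative numbers only, taken from the rates in the paragraph above) compares a compression ratio that doubles every ten years against raw surveillance data that doubles every two years:

# Rough illustration of the gap described above.
for t in range(0, 21, 2):  # years from now
    compression_gain = 2 ** (t / 10)         # coding efficiency doubles every 10 years
    data_growth = 2 ** (t / 2)               # surveillance data doubles every 2 years
    net_load = data_growth / compression_gain  # relative storage/transmission load
    print(f"year {t:2d}: data x{data_growth:7.1f}, coding x{compression_gain:4.2f}, "
          f"net load x{net_load:7.1f}")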

To achieve a higher compression ratio, the unique characteristics of surveillance video must be factored into the design of new video-encoding standards. Unlike standard video, for instance, surveillance footage is usually captured in a specific place day after day, or even month after month. Yet, previous standards fail to account for the specific residuals that exist in surveillance video (for example, unchanging backgrounds or foreground objects that appear many times). The new IEEE std 1857, entitled Standard for Advanced Audio and Video Coding, contains a surveillance profile that can further remove background residuals. The profile doubles the AVC/H.264 compression ratio with even lower complexity. In “IEEE 1857 Standard Empowering Smart Video Surveillance Systems,” Wen Gao, our colleagues, and I present an overview of the standard, highlighting its background-model-based coding technology and recognition-friendly functionalities. The new approach is also employed to enhance HEVC/H.265 and nearly double its performance as well. (Additional technical details can be found in “Background-Modeling Based Adaptive Prediction for Surveillance Video Coding,” which is available to subscribers via IEEE Xplore.)
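
To give a flavor of the underlying idea (a toy illustration only, not the IEEE 1857 algorithm), a background model can be as simple as a running average of past frames for a fixed camera view; prediction against that model leaves mostly small foreground residuals for the encoder to handle:

import numpy as np

def update_background(background, frame, alpha=0.02):
    """Running-average background model; alpha controls adaptation speed."""
    return (1 - alpha) * background + alpha * frame

def residual(frame, background):
    """What a background-model-based predictor would still need to encode."""
    return frame.astype(np.float32) - background

# Toy usage with random frames standing in for a fixed surveillance view.
frames = np.random.rand(100, 240, 320).astype(np.float32)
bg = frames[0].copy()
for f in frames[1:]:
    r = residual(f, bg)          # near zero for static background pixels
    bg = update_background(bg, f)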

Much like the physical universe, the vast majority of the digital universe is so-called digital dark matter — it’s there, but what we know about it is very limited. According to the IDC report I mentioned earlier, 23 percent of the information in the digital universe would be useful for big data if it were tagged and analyzed. Yet, technology is far from where it needs to be, and in practice, only 3 percent of potentially useful data is tagged — and even less is currently being analyzed. In fact, people, vehicles, and other moving objects appearing in millions of cameras will be a rich source for machine analysis to understand the complicated society and world. As guest editor Dorée Duncan Seligmann discussed in CN’s April 2012 theme, video is even more challenging than other data types for automatic analysis and understanding. This month we add three articles on the topic that have been published since then.

Human beings are generally the major objects of interest in surveillance video analysis. In the best paper from the 2013 IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS), “Reference-Based Person Re-identification” (available to IEEE Xplore subscribers), Le An and his colleagues propose a reference-based method for learning a subspace in which the correlations among reference data from different cameras are maximized. From there, the system can identify people who are present in different camera views with significant illumination changes.

Human behavior analysis is the next step for deeper understanding. Shuiwang Ji and colleagues’ “3D Convolutional Neural Networks for Human Action Recognition” introduces the deep learning underlying human-action recognition. The proposed 3D convolutional neural networks model extracts features from both the spatial and the temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent video frames. Experiments conducted using airport videos achieved superior performance compared to baseline methods.
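
For readers unfamiliar with the operation, the snippet below is a minimal PyTorch illustration (not the paper's architecture) of how a 3D convolution slides over the temporal axis as well as the spatial axes of a clip:

import torch
import torch.nn as nn

# A clip of 16 grayscale frames at 112x112: (batch, channels, time, height, width)
clip = torch.randn(1, 1, 16, 112, 112)

# 3x3x3 kernels convolve jointly over time and space, so each output feature
# aggregates motion across adjacent frames as well as local appearance.
conv3d = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
features = conv3d(clip)
print(features.shape)  # torch.Size([1, 8, 16, 112, 112])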

In “Monocular Visual Scene Understanding: Understanding Multi-Object Traffic Scenes,” Christian Wojek and his colleagues present a novel probabilistic 3D scene model that integrates geometric 3D reasoning with state-of-the-art multiclass object detection, object tracking, and scene labeling. This model uses inference to jointly recover the 3D scene context and perform 3D multi-object tracking, using only monocular video as input. The article includes an evaluation of several challenging sequences captured by onboard cameras, which illustrate that the approach shows substantial improvement over the current state of the art in 3D multiperson tracking and multiclass 3D tracking of cars and trucks on a challenging data set.

Toward a Scene Video Age

This month’s theme also includes a video from John Roese, the CTO of EMC Corp., with his technical insight on this topic.

Much like surveillance, the number of videos captured in classrooms, courts, and other site-specific cases is increasing quickly as well. This is the prelude to a “scene video” age in which most videos will be captured from specific scenes. In the near future, these pervasive cameras will cover all the spaces the human race is able to reach.

In this new age, the ‘scene’ will become the bridge connecting video coding and computer vision research. Modeling these scenes could facilitate further video compression, as demonstrated by the IEEE 1857 standard. Then, with the assistance of such scene models encoded in the video stream, foreground-object detection, tracking, and recognition become less difficult. In this sense, the massive growth in surveillance and other kinds of scene video presents big challenges, as well as big opportunities, for the video- and vision-related research communities.

In 2015, the IEEE Computer Society’s Technical Committee on Multimedia Computing (TCMC) and Technical Committee on Semantic Computing (TCSEM) will jointly sponsor the first IEEE International Conference on Multimedia Big Data, a premier world forum for leading scholars in the highly active area of multimedia big data research, development, and applications. Interested readers are welcome to join us at this new conference next spring in Beijing for further discussion of rapidly growing multimedia big data.

Video Big Data: Challenges and Opportunities

Among all types of big data, images and video are "the biggest big data." According to Cisco statistics, video content accounts for about 90 percent of total Internet traffic; in the rapidly growing mobile networks, video traffic already reaches 64 percent, is growing at a compound annual rate of more than 130 percent, and was expected to reach 2,184 PB in 2013. Image and video data thus dominate big data, so image and video processing is the key to big data applications. Moreover, compared with text, speech, and other data, images and video are larger in volume and higher in dimensionality, making their representation, processing, transmission, and exploitation technically more challenging. Big data research centered on images and video will therefore become a new development direction and research hotspot.

In recent years, China has built city-wide video surveillance systems through its Safe City initiative. Statistics show that by 2013 more than 30 million surveillance cameras had been deployed nationwide. Taking surveillance video as an example (see the figure below), we can analyze the overall development trends of video big data along three dimensions: applications, data, and technology:

  • From the application dimension: video sensing technologies and systems have seen initial success in public security and traffic management, but applications in city operations (e.g., city-wide situational analysis) and social services (e.g., live streaming of major scenic spots, home care) are just getting started. Moving from "management and control" to "service" is therefore the focus of application development for video big data.
  • From the data dimension: video sensing and analysis technologies and systems based on a single camera are already usable in practice; a typical example is electronic traffic enforcement for monitoring and detecting violations. However, as urban video surveillance systems keep expanding and application demands grow explosively, handling cross-camera video data, large-scale camera-network data, and even video big data that fuses various kinds of video, images, and associated data has become an urgent priority.
  • From the technology dimension: over the past 20 years the digitization of surveillance camera systems has largely been solved, and in the past five years the shift to high-definition cameras at large scale has begun. At present, however, most deployed camera nodes have little on-board intelligence and cannot process information in real time, so the prevailing mode of manual monitoring is inefficient and slow to respond to emergencies. Making video surveillance intelligent will therefore remain the focus of research and application for a long time to come. Furthermore, because technologies for cooperative processing and computation over multi-camera video data across wide areas are still lacking, it is difficult to deeply exploit and mine the rich information about people, objects, behaviors, and events in such wide-area video. Making big data intelligent (intelligent big data, for short) is thus the inevitable research and development trend of the coming years.

1 Surveillance video is the main body of video big data, but in a broader sense video big data also includes meeting videos, home videos, classroom videos, courtroom videos, and so on. These are usually captured by fixed cameras that record a specific scene (such as a home, classroom, meeting room, courtroom, or traffic intersection) over a period of time. Because such videos have relatively fixed scene characteristics and are shot without a predefined script or intent, we call them "scene video" (Scenic Video). Compared with other video types (such as news, film and TV, or sports video), scene video is the most likely breakthrough point for machine vision research and the best experimental data for tackling video big data. First, scene video is collected by cameras that watch a single scene over a long period; the "changing" and "unchanging" elements of the scene can be effectively modeled, analyzed, and reasoned about, making efficient compression, scene understanding, and even accurate recognition possible. Second, scene video is huge in volume and is the main research object within video big data, so advances in scene-video processing and analysis can directly raise the technical level of video big data.

From a broader perspective, information technology is incubating major breakthroughs and transformations, and video big data is one of the main driving forces and one of the best application problems. On one hand, related fields such as cognitive and brain science and machine vision are moving from "quantitative change" to "qualitative change" and are expected to achieve theoretical breakthroughs in the coming years. On the other hand, in the past year or two artificial intelligence technologies represented by deep learning and electronic brains have developed rapidly, producing a series of landmark results, including Google Brain, Baidu Brain, and IBM's brain-like chip TrueNorth. These results will provide powerful computation and processing capabilities for scene-video processing and analysis.