Thursday, May 16, 2013

Video processing algorithm benchmark using Hadoop

Developing a video processing algorithm involves a repeatable process of running a benchmark to test the algorithm after every minor change.
Because the video sample database is often huge, the benchmark phase can be very time consuming.
To minimize the time a developer waits for feedback on algorithm changes, we run the algorithm benchmark in parallel using Apache Hadoop.

Hadoop is suitable for this mission for the following reasons:
1. The test for each video sample is independent of the others; only after all the video samples have been tested do we calculate the mark for the current algorithm version.
2. The video sample DB is huge, currently larger than 20 TB.
3. There is no seeking or searching over the DB; all the data is processed sequentially.
4. The benchmark process is a heavy consumer of CPU/GPU resources.
5. The video sample DB is very static: new samples are only inserted, never updated, and rarely deleted.

Notes:
1. Although Hadoop is Java oriented, there is no problem using other programming languages to implement the mapper/reducer (for example via Hadoop Streaming).
2. A single video sample fed to the mapper should fit the optimal split size (no 3-hour movies).
3. We started by storing uncompressed video samples (AVI) in HDFS in order to avoid the decompression step, but the samples were too large, so we compressed the video into MPEG chunks with a low compression factor.

The components that take part in the benchmark are:
Input splitter:
The duty of the input splitter is to read the raw data from HDFS and generate a logical structure that contains the uncompressed video chunk to be tested, together with reference data for calculating the algorithm mark (for example, in a face recognition process, results produced by a human).
To accomplish this we developed a custom InputFormat, input splitter, and record reader.
The input splitter is a static component that rarely changes.
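The post does not include the actual code, so here is a minimal sketch of such a custom input format and record reader, following the common whole-file-per-record pattern; the class names VideoChunkInputFormat and VideoChunkRecordReader are illustrative, and each HDFS file is assumed to hold one compressed video chunk. The key is the chunk's file name and the value is its raw bytes, which the mapper decodes:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class VideoChunkInputFormat extends FileInputFormat<Text, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // a video chunk must be decoded as a whole, never split it
    }

    @Override
    public RecordReader<Text, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new VideoChunkRecordReader();
    }
}

class VideoChunkRecordReader extends RecordReader<Text, BytesWritable> {
    private FileSplit split;
    private Configuration conf;
    private boolean processed = false;
    private final Text key = new Text();
    private final BytesWritable value = new BytesWritable();

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context) {
        this.split = (FileSplit) split;
        this.conf = context.getConfiguration();
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        if (processed) {
            return false;
        }
        Path file = split.getPath();
        key.set(file.getName()); // chunk id = file name
        byte[] contents = new byte[(int) split.getLength()];
        FileSystem fs = file.getFileSystem(conf);
        try (FSDataInputStream in = fs.open(file)) {
            IOUtils.readFully(in, contents, 0, contents.length);
        }
        value.set(contents, 0, contents.length); // the whole compressed chunk
        processed = true;
        return true;
    }

    @Override public Text getCurrentKey() { return key; }
    @Override public BytesWritable getCurrentValue() { return value; }
    @Override public float getProgress() { return processed ? 1.0f : 0.0f; }
    @Override public void close() { }
}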

Mapper:
The mapper's duty is to execute the algorithm against the video sample, producing the results and debug data. For example, if the benchmark is testing face recognition, the result of the mapper phase is the collection of faces that were detected and the detection time, to be compared against the reference.
For each movie chunk there is a unique mark indicating the algorithm's performance against it.
The mapper is the dynamic component of the process; it changes for every execution.
The mapper holds a reference to a DB containing the reference data used to perform the comparison.
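A minimal sketch of such a mapper, assuming the Text/BytesWritable records produced by the input format above; FaceBenchmarkMapper and the scoring helper are illustrative names, and the helper's body is a hypothetical stand-in for the real detect-and-compare logic:

import java.io.IOException;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FaceBenchmarkMapper
        extends Mapper<Text, BytesWritable, Text, FloatWritable> {

    // A single reduce key so the reducer sees every per-chunk mark.
    private static final Text MARK_KEY = new Text("mark");

    @Override
    protected void map(Text chunkId, BytesWritable chunk, Context context)
            throws IOException, InterruptedException {
        // Run the algorithm under test and score it against the reference data.
        float mark = runAlgorithmAndScore(
                chunkId.toString(), chunk.getBytes(), chunk.getLength());
        context.write(MARK_KEY, new FloatWritable(mark));
    }

    // Hypothetical stand-in: in the real benchmark this decodes the MPEG chunk,
    // runs face detection, and compares the detected faces and detection time
    // against the human-labelled reference data fetched from the reference DB.
    private float runAlgorithmAndScore(String chunkId, byte[] videoBytes, int length) {
        return 0.0f;
    }
}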

Reducer:
The reducer phase's duty is to accumulate the marks for the algorithm based on the mapper results.
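A minimal sketch of the reducer; averaging the per-chunk marks into one overall mark is an assumption, since the post only says the marks are accumulated:

import java.io.IOException;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MarkReducer
        extends Reducer<Text, FloatWritable, Text, FloatWritable> {

    @Override
    protected void reduce(Text key, Iterable<FloatWritable> marks, Context context)
            throws IOException, InterruptedException {
        float sum = 0.0f;
        long count = 0;
        for (FloatWritable mark : marks) { // one mark per video chunk
            sum += mark.get();
            count++;
        }
        // The overall mark for this algorithm version (here: the average).
        context.write(key, new FloatWritable(count == 0 ? 0.0f : sum / count));
    }
}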

Notes:
The process contains partition & shuffle units used to group map results with specific characteristics.
At the end of the process a report is created, sent to the developer, and stored in a DB referencing the algorithm version.
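For completeness, a minimal driver wiring the sketched pieces into a job; the job name and paths are illustrative, and the comment marks where a custom Partitioner for the grouping described above would be plugged in:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BenchmarkDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "video-benchmark");
        job.setJarByClass(BenchmarkDriver.class);

        job.setInputFormatClass(VideoChunkInputFormat.class);
        job.setMapperClass(FaceBenchmarkMapper.class);
        job.setReducerClass(MarkReducer.class);
        // A custom Partitioner could be set here (job.setPartitionerClass(...))
        // to group map results with specific characteristics before the reduce.

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FloatWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));   // video samples in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // per-version report data
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}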
