I (Heart) Picsearch And Python | Breaking Eggs And Making Omelettes

I don’t know much about Picsearch. I don’t know what differentiates them from Google’s image search. And I certainly don’t know what they’re doing scouring the internet for video. But I know what I like, and I like the fact that Picsearch has submitted back to the FFmpeg development team 3 gargantuan lists of URLs:

A list of 5100+ URLs linking to videos that crash FFmpeg
A list of 3200 URLs linking to videos that have relatively uncommon video codecs
A list of 1600+ URLs linking to videos that have relatively uncommon audio codecs

That first list is a quality engineer’s dream come true. I was able to download a little more than 4400 of the crasher URLs. The list was collected sometime last year and the good news is that FFmpeg has fixed enough problems that over half of the alleged crashers do not crash. There are still a lot of problems but I think most of them will cluster around a small set of bugs, particularly concerning the RealMedia demuxer.

I am currently downloading the uncommon video and audio format files. Given my interests, if processing the crashers is akin to having to eat my vegetables, processing a few thousand files with heretofore unknown codecs is like dessert!

So far, the challenge here has been to both download and process the huge number of samples efficiently. The “download and manually test” protocol usually followed when a problem sample is reported does not really scale in this situation. Invariably, I first try some half-hearted shell-based solutions. But… who really likes shell programming?

So I moved swiftly on to custom Python scripts for downloading and testing these files. Once I tighten up the scripts a little more and successfully process as many samples as I can, I will share them here, if only so I have a place where I can easily refer to the scripts again should I need them in the future (scripts are easily misplaced on my systems).
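For illustration only, a download-and-test loop of the kind described above might be sketched as follows. These are not the actual scripts mentioned here; every function name (local_name, download, classify, check_sample, process_list) is hypothetical, and the sketch assumes Python 3.8+, an ffmpeg binary on the PATH, and a POSIX system where a process killed by a signal such as SIGSEGV reports a negative exit status.

```python
import hashlib
import http.client
import os
import subprocess
import urllib.request
from urllib.parse import urlparse


def local_name(url):
    """Build a local filename that stays unique even when basenames repeat."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:12]
    base = os.path.basename(urlparse(url).path) or "sample"
    return f"{digest}-{base}"


def download(url, dest_dir, timeout=30):
    """Fetch one URL into dest_dir; return the local path, or None on failure."""
    path = os.path.join(dest_dir, local_name(url))
    if os.path.exists(path):  # already fetched on an earlier run
        return path
    part = path + ".part"  # write to a temporary name so partial files are never tested
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp, open(part, "wb") as out:
            while chunk := resp.read(1 << 16):
                out.write(chunk)
        os.replace(part, path)
    except (OSError, ValueError, http.client.HTTPException):
        # Dead link, timeout, or malformed URL: expected for a list this old.
        if os.path.exists(part):
            os.remove(part)
        return None
    return path


def classify(returncode):
    """Map an exit status to a coarse result (assumption: POSIX signal semantics)."""
    if returncode == 0:
        return "ok"
    if returncode < 0:  # killed by a signal, e.g. -11 for SIGSEGV
        return "crash"
    return "error"  # the program exited on its own with a failure code


def check_sample(path, ffmpeg="ffmpeg", timeout=120):
    """Decode a file to the null muxer and report how the process ended."""
    cmd = [ffmpeg, "-v", "error", "-i", path, "-f", "null", "-"]
    try:
        proc = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timeout"  # possible infinite loop; worth a separate look
    return classify(proc.returncode)


def process_list(urls, dest_dir):
    """Download and test every URL, returning a {url: result} mapping."""
    os.makedirs(dest_dir, exist_ok=True)
    results = {}
    for url in urls:
        path = download(url, dest_dir)
        results[url] = check_sample(path) if path else "unavailable"
    return results
```

Calling process_list() with the lines of a URL list and a scratch directory would yield one of "ok", "error", "crash", "timeout", or "unavailable" per URL. Keeping "crash" separate from "error" matters here, since a clean refusal to decode an unknown codec is a very different outcome from a segfault.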

10 thoughts on “I (Heart) Picsearch And Python”

Reimar December 6, 2008 at 1:32 am

That’s why I always put my scripts in ~/bin. But I guess I have a few fewer than you ;-)

Anonymous December 6, 2008 at 7:29 am

I took a look at the uncommon audio codecs list. The most frequent codecs are:

0x1100736d – What is this?
0x7a21 – AMR (What happened to this SOC project?)
wmav1 – should work
truespeech – should work
QDMC – undiscovered, no documentation
real_288 – should work
0x0006 – a-law, should work
pcm_s24le – should work
sawb – AMR
mace3 – should work
imc – Intel Music Coder, should work
pcm_s32be – should work
adpcm_swf – should work
0x0163 – WMA lossless, undiscovered
drms – No way
fl64 – floating point PCM, should work
0x0402 – Ligos Indeo Audio, undiscovered
mp1 – MPEG layer 1, should work
fl32 – floating point PCM, should work
Qclq – QCELP, stuff to test
adpcm_ct – Creative ADPCM decoder, should work
0x00ff – AAC, should work
pcm_s32le – should work
0x7a22 – AMR
0x0003 – floating point PCM, should work

So most of the above look like good things to test.

Multimedia Mike (Post author) December 6, 2008 at 8:08 am

True, uncommon != unsupported in FFmpeg. And there has already been a lot of movement on the development mailing list to map some of the video FourCCs to known codecs (apparently, there are even more aliases for MPEG-4 part 2 video).

Vitor December 6, 2008 at 9:07 am

> I took a look at the uncommon audio codecs list. The most frequent codecs are:

> 0x1100736d – What is this?

This is adpcm_ima_wav, works fine with recent SVN.

compn December 8, 2008 at 6:24 pm

added a few of those codecs to mplayer and ffmpeg
still a few left to figure out.

i wonder if that vorbis/theora in .mov file plays on anything :)

Multimedia Mike (Post author) December 8, 2008 at 6:25 pm

Thanks for your work on this, compn. I have been watching your commits to files like riff.c. I will be taking those into account as I methodically add samples to the MultimediaWiki (slow process, just take it a few at a time, each evening).

astrange December 8, 2008 at 10:01 pm

> QDMC – undiscovered, no documentation

http://wiki.multimedia.cx/index.php?title=QDesign_Music_Codec

mkhodor December 9, 2008 at 12:44 am

> > QDMC – undiscovered, no documentation
>
> http://wiki.multimedia.cx/index.php?title=QDesign_Music_Codec

FFmpeg has a buggy decoder for QDM2, and nothing for QDMC. So there is no complete and accurate documentation.

compn December 9, 2008 at 4:07 pm

binary codecs were uploaded to:
http://samples.mplayerhq.hu/drivers32/new/
please test and let me know if any other dlls are required.

Multimedia Mike (Post author) December 9, 2008 at 4:45 pm

QDMC is well known as an earlier incarnation of the widely-used QDM2 codec. It has been shown through reverse engineering to be similar to, but not strictly compatible with, its successor. It is great that we have some more samples; the lack of them has traditionally impeded RE efforts.
