MikuAuahDark: audio

Thursday, May 14, 2020

Container vs. Codec

People often mistake a container for a video format. Here's one simple scenario.

Say there's A, who has a very old phone released in 2007, and B, who has a smartphone released in 2019.
A: Can you send me that MP4 video on your smartphone?
B: Sure.
A: Why doesn't it play on my phone? The phone says it supports the MP4 video format, but it can't play your video. Your video is bad and so are you.

So let's make this clear. First of all, there is no such thing as an "MP4 video format". Keep your questions for later. In the world of multimedia files such as video and audio, you have to know 3 things:

Container file
Audio codec
Video codec

A container is a file format that describes how the video and the audio are stored inside the file, sometimes along with subtitles and additional attachments. The container also records which decoder should be used for the video and the audio, and when each piece of video and audio should be decoded and presented to the user. Here are some popular containers:

MP4. Yes, MP4 is a container, not a video format. Now you know!
WebM.
Matroska. Also known as MKV. The mother of containers, as it supports almost every codec in existence.

So, if MP4 is not a video format, what are the actual video formats? Those are called video codecs. A video codec defines how the video is encoded, and it is what matters for compatibility. Say I use MP4, which A's phone above supports, but with a more recent codec: A's phone won't be able to play the video, despite it being MP4.

So what kinds of video codecs are there?

MPEG-4 Part 2. The XviD/DivX family goes here. A very old codec.
H264, the most popular codec since smartphones appeared.
H265, a more recent codec which provides smaller files at similar quality.
VP9, a royalty-free codec by Google which competes with H265 (its predecessor, VP8, competed with H264).
AV1, a royalty-free codec by various vendors which competes with H265.

PS: the term "royalty-free" doesn't mean anything from the user's perspective. It only matters from the developer's perspective.

In the scenario above, if B's video uses the H265 codec, then A's phone won't be able to play it, even though the video itself is inside an MP4 container file. Now everything makes sense, right?

Then there's also the audio codec. Watching silent video is not very fun, right? The definition is the same as for the video codec above, but for audio instead. There's one practical difference: most audio codecs can be extracted out of their container and stored as a standalone file. That's rarely done for video codecs.

So, a list of audio codecs, please? Okay.

AAC. Its standalone file extension is .aac (actually MPEG ADTS). Can be placed inside MP4 container.
Opus. Has no standalone file format; usually placed in an Ogg or WebM container.
Vorbis. Has no standalone file format; usually placed in an Ogg container (WebM works too).
FLAC. Its standalone file extension is .flac. Can be placed inside Ogg container.
MP3. Its standalone file extension is .mp3. Can be placed inside MP4 container.

Whoa, hang on, so Ogg is not an audio format? Yes. Ogg is also a container. It can even contain video, such as the Theora video codec.

So the conclusion is: a file extension that you recognize doesn't guarantee your device can play the file. You may feel your device is superior because it can decode AV1 in MP4, until FL Studio's ZGameEditor Visualizer lossless video export function writes PNG images inside MP4.
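To make the "extension is not the format" point concrete, here is a minimal sketch that guesses the container from the first bytes of a file instead of trusting its extension. The function names guessContainer and hasMagic are made up for this illustration, and the check only covers the few well-known signatures listed in the comments. Note that it says nothing about the codecs inside, which is exactly why A's phone can still fail.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: true if `magic` appears in `head` at `offset`.
static bool hasMagic(const std::vector<std::uint8_t> &head, std::size_t offset,
                     const std::vector<std::uint8_t> &magic)
{
    return head.size() >= offset + magic.size() &&
           std::equal(magic.begin(), magic.end(),
                      head.begin() + static_cast<std::ptrdiff_t>(offset));
}

// Guess the container from the first bytes of a file.
// This identifies the container only, never the codecs stored inside it.
std::string guessContainer(const std::vector<std::uint8_t> &head)
{
    // MP4 family: a 4-byte box size followed by the "ftyp" box type.
    if (hasMagic(head, 4, {'f', 't', 'y', 'p'}))
        return "MP4 family (ISO base media)";
    // Matroska and WebM both start with the EBML header ID.
    if (hasMagic(head, 0, {0x1A, 0x45, 0xDF, 0xA3}))
        return "Matroska or WebM (EBML)";
    // Every Ogg page starts with "OggS".
    if (hasMagic(head, 0, {'O', 'g', 'g', 'S'}))
        return "Ogg";
    // Standalone FLAC stream marker.
    if (hasMagic(head, 0, {'f', 'L', 'a', 'C'}))
        return "FLAC (standalone)";
    return "unknown";
}
```

To find out whether a device can actually play the file, you would still need to parse the container and check the codec of each stream.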

Sunday, June 30, 2019

MediaFoundation decoder for LOVE & decoding in-memory.

Windows 7 ships with MediaFoundation, a COM-based API that can decode a variety of audio formats, and if you’re concerned about patents (AAC, why), don’t worry, as it’s a licensed decoder. This blog post is about how I wrote a LOVE decoder that uses MediaFoundation.

Basics

The first thing to know is what the LOVE decoder class looks like.

class Decoder : public Object
{
public:
    static love::Type type;
    Decoder(Data *data, int bufferSize);
    virtual ~Decoder();
    static const int DEFAULT_BUFFER_SIZE = 16384;
    static const int DEFAULT_SAMPLE_RATE = 44100;
    static const int DEFAULT_CHANNELS = 2;
    static const int DEFAULT_BIT_DEPTH = 16;
    virtual Decoder *clone() = 0;
    virtual int decode() = 0;
    virtual int getSize() const;
    virtual void *getBuffer() const;
    virtual bool seek(double s) = 0;
    virtual bool rewind() = 0;
    virtual bool isSeekable() = 0;
    virtual bool isFinished();
    virtual int getChannelCount() const = 0;
    virtual int getBitDepth() const = 0;
    virtual int getSampleRate() const;
    virtual double getDuration() = 0;

protected:
    StrongRef<Data> data;
    int bufferSize;
    int sampleRate;
    void *buffer;
    bool eof;
};

Anything that ends with = 0; is a pure virtual method, which we must implement in our derived class. Now let’s derive the Decoder class.

MFDecoder Class

class MFDecoder: public Decoder
{
public:
    MFDecoder(Data *data, int bufferSize);
    virtual ~MFDecoder();

    static bool accepts(const std::string &ext);
    static void quit();
    Decoder *clone();
    int decode();
    bool seek(double s);
    bool rewind();
    bool isSeekable();
    int getChannelCount() const;
    int getBitDepth() const;
    double getDuration();

private:
    static bool initialize();
    static void *initData;
    // non-exposed datatype to prevent cluttering LOVE
    // includes with Windows.h
    void *mfData;
    // channel count
    int channels;
    // temporary buffer
    std::vector<char> tempBuffer;
    // amount of temporary PCM buffer
    int tempBufferCount;
    // byte depth
    int byteDepth;
    // duration
    double duration;
};

There are a few points that must be noted here.

The MFDecoder constructor receives a LOVE Data object, which is data located in a block of memory, and the desired buffer size.
The MediaFoundation API can return any number of samples, so we use a temporary buffer to hold the leftovers after decoding.
You may notice that mfData is void*. The reason is that there’s a problem compiling LOVE if Windows.h is included BEFORE the keyboard module. We also don’t want to clutter the includes with Windows-specific headers and drag down the compilation time.
The tempBuffer is a std::vector. Yes, this is intentional, so we don’t have to manage the allocated memory ourselves and can take advantage of RAII. It also helps in case MediaFoundation returns more data than the provided buffer can hold: the temporary buffer is simply reallocated to a bigger size.
Then there’s the initData member. It is set in the initialize function declared above it, which is called when a new MediaFoundation decoder is created or when the compatible extensions are being checked.
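The leftover handling described above can be sketched in portable C++. This is a simplified illustration and not LOVE's actual code: the class name LeftoverBuffer and its push/pop methods are hypothetical. It only shows how a std::vector can absorb decoder output of any size while the caller drains it in fixed-size pieces.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of LOVE): holds decoded bytes that did not
// fit into the caller's fixed-size output buffer.
class LeftoverBuffer
{
public:
    // Store a chunk of decoded PCM of arbitrary size.
    void push(const char *pcm, std::size_t size)
    {
        pending.insert(pending.end(), pcm, pcm + size);
    }

    // Copy up to `capacity` bytes into `out` and drop them from the buffer.
    // Returns the number of bytes copied.
    std::size_t pop(char *out, std::size_t capacity)
    {
        std::size_t n = std::min(capacity, pending.size());
        if (n > 0)
        {
            std::memcpy(out, pending.data(), n);
            pending.erase(pending.begin(),
                          pending.begin() + static_cast<std::ptrdiff_t>(n));
        }
        return n;
    }

    // Number of bytes still waiting to be consumed.
    std::size_t size() const { return pending.size(); }

private:
    // Grows on demand and is freed automatically (RAII).
    std::vector<char> pending;
};
```

A decode() implementation along these lines would first pop any leftovers into the output buffer, then read more samples from the decoder and push whatever does not fit.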

Problems

Now there are some problems.

MediaFoundation doesn’t officially support loading media from memory.
MediaFoundation assumes the media file resides somewhere on the local filesystem or on the network, probably due to its DRM nature. There’s this blog post, but it only works on Windows 8 and uses C++/CLI, which brings 2 problems of its own:
1. We can’t use C++/CLI when compiling LOVE; that would increase our compilation times and add bloat, which we want to minimize as much as possible.
2. My target is to make the decoder available for Windows 7 and later. IRandomAccessStream and the functions it uses are Windows 8 and later.
Then I found out there’s MFCreateMFByteStreamOnStream, which accepts an IStream interface. The IStream interface is available since Windows 2000, and SHCreateMemStream can be used to create one from a block of memory.
The functions that I use require linking to Mfplat.dll and Mfreadwrite.dll.
The latter is only available in Windows 7. Since I want to make sure LOVE runs on Windows Vista too (without the MediaFoundation decoding capabilities, of course), I have to load it dynamically, hence the MFDecoder::initialize() static function.
Once the IMFByteStream is created, you have to set the MIME type.
It’s done by obtaining the IMFAttributes interface from the IMFByteStream (via QueryInterface) and setting the MIME type on it. Unfortunately, as of LOVE commit ccf9e63, the decoder constructor no longer receives the audio extension, so we have to test every possible supported MIME type. Fortunately, Microsoft gives us a list of supported media formats in their documentation, so I just took the MIME strings from the IIS MIME types list.
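The "test every MIME type" step can be sketched as a simple loop. This is only an illustration under assumptions: findWorkingMime is a hypothetical helper, and the tryOpen callback stands in for the real work of setting the MIME type on the byte stream and attempting to create the source reader. No MediaFoundation calls are made here.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: try each candidate MIME type until `tryOpen` succeeds.
// Returns the accepted MIME type, or an empty string if none worked.
std::string findWorkingMime(
    const std::vector<std::string> &candidates,
    const std::function<bool(const std::string &)> &tryOpen)
{
    for (const std::string &mime : candidates)
    {
        // In a real decoder, tryOpen would set the MIME type on the byte
        // stream and try to create the reader from it.
        if (tryOpen(mime))
            return mime;
    }
    return std::string();
}
```

The candidate list would come from the formats MediaFoundation supports; strings such as "audio/mp4" or "audio/x-ms-wma" are examples and not necessarily the exact set used in the decoder.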

After those problems are resolved, it’s only a matter of setting the properties of the IMFSourceReader, such as creating a decoder which outputs PCM.

Seeking

Yes, the IMFSourceReader supports seeking, but there’s no guarantee that it will be accurate. The function you’re looking for is IMFSourceReader::SetCurrentPosition, which accepts time in 100-nanosecond units (to convert seconds to 100-nanosecond units, multiply by 1e+7).
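The unit conversion is small enough to show in full. The helper names secondsToHns and hnsToSeconds are made up for this sketch; the resulting 64-bit value is what you would then hand to SetCurrentPosition (wrapped in a PROPVARIANT, which is not shown here).

```cpp
#include <cmath>
#include <cstdint>

// Convert seconds to 100-nanosecond units (1 second = 10,000,000 units).
std::int64_t secondsToHns(double seconds)
{
    return static_cast<std::int64_t>(std::llround(seconds * 1e7));
}

// Convert 100-nanosecond units back to seconds.
double hnsToSeconds(std::int64_t hns)
{
    return static_cast<double>(hns) / 1e7;
}
```

Rounding to the nearest unit avoids landing one tick early because of floating-point error, which is a design choice of this sketch and not something the API requires.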

Another Problem: GUID

I was getting linker errors when compiling LOVE with the MediaFoundation decoder, as the linker complained about unresolved GUIDs. I also don’t want to link to any of the MediaFoundation DLLs, so the binary stays compatible with Windows Vista (XP support is dropped as of LOVE 11.0). A temporary workaround is to copy the GUID declarations from the header files into const GUID variables.

Aftermath

After getting everything running, I now have a LOVE build which can load AAC and WMA using MediaFoundation. How good is that, huh? You can check the full source code in here. The respective header lies in the same directory as the C++ file.

Now you may ask, what about MinGW? Well, unfortunately LOVE doesn’t support being compiled with MinGW in the first place, so compiling LOVE on Windows is only supported with the MSVC compiler.

And if anyone wants to decode audio from memory using MediaFoundation, then this blog post is what they’re looking for.

Post is written in Markdown first then converted to HTML lol.

Wednesday, June 1, 2016

Simple Screen Capture in Windows using FFmpeg

One day, I wanted to record my FL Studio activity. I can't use FRAPS because I use Windows 8, where the option to capture the Aero desktop in FRAPS is useless. I also tried CamStudio, but it lags a lot. Then I remembered that I have FFmpeg installed, so I used that for screen recording instead.

FFmpeg is a command-line application that focuses mostly on audio and video processing. Although it is mostly used to re-encode video to another format and can be a bit complex, using it for screen recording is actually simple.

Before proceeding, here are the things you need:

FFmpeg. Can be downloaded here
Stereo Mix audio input enabled in the audio devices (optional, only needed if you also want to capture the audio)

Now, extract FFmpeg somewhere and double-click "ff-prompt.bat". It opens a command prompt in which "ffmpeg" is available for that CMD window. Now we can start recording.

To record a specific window, the input will be title="<the window name>". To record the entire desktop, the input will be desktop.
Now to start recording, just type ffmpeg -f gdigrab -i <the input> capture.mkv, and to stop recording, just press Ctrl+C in the CMD window. The captured screen will be stored in the "capture.mkv" file.

If you have a stereo mix audio input, you can also record the audio. First we need to get the exact name of the stereo mix input. Type ffmpeg -list_devices true -f dshow -i dummy 2>&1 | findstr /I "stereo mix". If there's no output, that means you don't have stereo mix. On my laptop, it outputs something like this

D:\pvid>ffmpeg -list_devices true -f dshow -i dummy 2>&1 | findstr /I "stereo mix"
[dshow @ 0000000001cbd040]  "Stereo Mix (Realtek High Definition Audio)"

"Stereo Mix (Realtek High Definition Audio)" is an example of the actual stereo mix input name, at least on my laptop.

Now to record the audio+video, type ffmpeg -f dshow -i audio="<stereo mix name>" -f gdigrab -i <the input> capture.mkv

If you feel that your PC lags a lot when capturing the screen, or you want to capture without losing any quality, you may want to capture the video (and the audio, if you capture it too) in a lossless format with a fast encoder preset first, then re-encode it later. FFmpeg can also be used to re-encode the video if you know how.

To record lossless video only, type ffmpeg -f gdigrab -i <the input> -c:v libx264 -qp 0 -preset ultrafast capture.mkv. The file might be large, but not as large as with FRAPS.
For the audio+video, type

ffmpeg -f dshow -i audio="<stereo mix name>" -f gdigrab -i <the input> -c:a pcm_s16le -c:v libx264 -qp 0 -preset ultrafast capture.mkv

If possible, try to capture a specific window only instead of the whole desktop. In my experience, capturing the whole desktop only gives 30 FPS, while capturing a specific window gives 60 FPS, no matter whether the window is small or big.