Friday, March 16, 2018

Fixing Image Quality of SIF Cards: RGBA4444 vs. RGB565

Some people have been discussing how bad the image quality of SIF cards has been since the New Year Kanan UR. They claim KLab somehow changed the dithering method used on cards and now always uses RGBA4444. I argue that RGB565 is better for clean UR cards and actually makes sense: a clean UR card doesn't need transparency, while a "navi"[1] does.

Here's the original card image, to be used for comparison (the latest Ruby UR card, paired with Yoshiko, as I'm writing this):

1489u-orig.png

You can notice the grainy picture, or noise, caused by dithering and RGBA4444 color loss. I started to think something is wrong with the dithering algorithm. Here's the version with selective Gaussian blur applied (someone in the discussion did the blur):

1489u-sgb.png

It looks like selective Gaussian blur does the job, but I said that waifu2x can achieve something better. He disagreed, saying that waifu2x destroys the edges and some parts of the image. Well, I'm just gonna do it anyway. I tried many different methods: noise level 1, noise level 2, and noise level 2 followed by 2x scale then 50% downscale. I found that noise level 1 already does the job pretty well, and noise level 2 destroys some details as he said before. Here's the resulting image:

1489u-w2x-noise1.png

OK, it looks good now, so let's apply RGBA4444 conversion and dithering to the noise-removed image. Unfortunately, ImageMagick can't do RGBA4444 conversion (or any 16-bit RGB color conversion) and specify dithering at the same time, so I have to generate the necessary color tables (RGBA4444 and RGB565 for our purpose) and use ImageMagick's -remap option later. Here are the commands I use to generate the color tables:

C:\Users\MikuAuahDark\Pictures\1489>magick convert -size 256x256 xc:none -channel R -fx "((i*w+j)&15)/15.0" -channel G -fx "(((i*w+j)>>4)&15)/15.0" -channel B -fx "(((i*w+j)>>8)&15)/15.0" -channel A -fx "(((i*w+j)>>12)&15)/15.0" rgba4444.png
C:\Users\MikuAuahDark\Pictures\1489>magick convert -size 256x256 xc:white -channel R -fx "((i*w+j)&31)/31.0" -channel G -fx "(((i*w+j)>>5)&63)/63.0" -channel B -fx "(((i*w+j)>>11)&31)/31.0" rgb565.png
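To double-check what those tables encode, here's a quick Python sketch (Python just because it's convenient; the numbers mirror the /15.0, /31.0, and /63.0 scaling in the -fx expressions above):

```python
# Reduce an 8-bit channel to `bits` bits, then expand it back to 8 bits,
# the same round trip the generated color tables force on every pixel.

def quantize(value, bits):
    levels = (1 << bits) - 1          # 15, 31 or 63
    q = round(value / 255 * levels)   # nearest representable level
    return round(q / levels * 255)    # expand back for display

# RGBA4444: every channel goes through 4 bits.
rgba4444 = [quantize(c, 4) for c in (200, 120, 67, 255)]

# RGB565: 5 bits red, 6 bits green, 5 bits blue, no alpha.
rgb565 = [quantize(200, 5), quantize(120, 6), quantize(67, 5)]

print(rgba4444, rgb565)
```

Every color in the 256x256 table image is one of these quantized values, which is why -remap against it simulates the 16-bit formats.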

Now we can tell ImageMagick to use our color table:

C:\Users\MikuAuahDark\Pictures\1489>magick convert 1489u-w2x-noise1.png -dither none -remap rgba4444.png 1489u-w2x-noise1-rgba444none.png
C:\Users\MikuAuahDark\Pictures\1489>magick convert 1489u-w2x-noise1.png -dither riemersma -remap rgba4444.png 1489u-w2x-noise1-rgba444rie.png
C:\Users\MikuAuahDark\Pictures\1489>magick convert 1489u-w2x-noise1.png -dither floydsteinberg -remap rgba4444.png 1489u-w2x-noise1-rgba444fstein.png

And here's the result:

1489u-w2x-noise1-rgba444none.png (-dither none)
1489u-w2x-noise1-rgba444rie.png (-dither riemersma)
1489u-w2x-noise1-rgba444fstein.png (-dither floydsteinberg)

With -dither none, the color difference is very noticeable, while -dither riemersma and -dither floydsteinberg produce results similar to the dithering algorithm KLab uses (if you look closely, KLab's dithering looks a bit better). So it's RGBA4444 that is causing the image quality problems. OK, now let's switch to RGB565.

magick convert 1489u-w2x-noise1.png -dither none -remap rgb565.png 1489u-w2x-noise1-rgb565none.png
magick convert 1489u-w2x-noise1.png -dither riemersma -remap rgb565.png 1489u-w2x-noise1-rgb565rie.png
magick convert 1489u-w2x-noise1.png -dither floydsteinberg -remap rgb565.png 1489u-w2x-noise1-rgb565fstein.png

1489u-w2x-noise1-rgb565none.png (-dither none)
1489u-w2x-noise1-rgb565rie.png (-dither riemersma)
1489u-w2x-noise1-rgb565fstein.png (-dither floydsteinberg)

Oh! The RGB565 variant looks way better than the ones encoded with RGBA4444. My argument is correct: RGB565 is certainly better for clean UR cards than RGBA4444. The problem is that the RGB565 PNG is roughly twice the size of the RGBA4444 one, and that's just the PNG representation, so let's try to simulate how it's stored in the SIF game engine texture bank format and compare the sizes there.
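The quality gap also makes sense numerically. A small Python sketch (not part of my toolchain, just arithmetic) comparing the worst-case rounding error of an 8-bit channel stored at 4, 5, and 6 bits:

```python
# Worst-case error when an 8-bit channel value is rounded to the
# nearest n-bit level and then expanded back to 8 bits.

def max_error(bits):
    levels = (1 << bits) - 1
    return max(
        abs(v - round(round(v / 255 * levels) / levels * 255))
        for v in range(256)
    )

print("4-bit (RGBA4444):", max_error(4))
print("5-bit (RGB565 R/B):", max_error(5))
print("6-bit (RGB565 G):", max_error(6))
```

Each RGBA4444 channel can be off by twice as much as RGB565's red/blue channels, which is why the banding (and the dithering needed to hide it) is so much more visible.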

The SIF texture bank can store raw pixel data in a variety of pixel formats[2]. For RGB, there are RGBA5551, RGBA4444, RGB565, and RGBA8888 (RGBA8888 is the usual pixel format used in images and in Photoshop RGB/8). The raw pixel data can be stored uncompressed or compressed with zlib[3], PVR, ETC, or other compression formats supported by the GPU, but most of the time KLab just uses zlib compression, so that's what we'll use. In this example, we pick the -dither floydsteinberg RGB565 and RGBA4444 variants. For the sake of tooling, I used a Linux WSL environment, since I don't know how to specify the compression level with openssl zlib.
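The zlib step itself can be sketched in Python. This is not the real texture bank code, just a stand-in: a synthetic 64x64 RGB565 gradient instead of an actual card, and stdlib zlib level 9 instead of pigz's zopfli mode:

```python
import zlib

# Synthetic 64x64 RGB565 image: 16 bits per pixel, little-endian,
# standing in for real card pixel data.
w, h = 64, 64
pixels = bytearray()
for y in range(h):
    for x in range(w):
        r, g, b = x >> 1, (x + y) >> 1, y >> 1  # 5-, 6- and 5-bit values
        pixels += ((r << 11) | (g << 5) | b).to_bytes(2, "little")

compressed = zlib.compress(bytes(pixels), 9)  # closest stdlib analogue to pigz -z -11
print(len(pixels), "->", len(compressed), "bytes")
```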

Unfortunately, neither FFmpeg nor ImageMagick can output raw RGBA4444, so I wrote my own script to do it later in WSL. Here's the Lua script:

#!/usr/bin/env luajit
-- Expected RGBA8888 stdin

local ffi = require("ffi")
local bit = require("bit") -- bit ops are not global in stock LuaJIT
local w = assert(tonumber(arg[1]))
local h = assert(tonumber(arg[2]))

local rgba4444 = ffi.new("uint16_t[?]", w*h)
local rgbstruct = ffi.typeof("union {struct {uint8_t r,g,b,a;} rgba; uint8_t raw[4];}")

for i = 1, w*h do
    local d = rgbstruct {raw = io.read(4)}
    -- Remember FFI arrays are 0-based index
    local r = bit.lshift(bit.rshift(d.rgba.r, 4), 12)
    local g = bit.lshift(bit.rshift(d.rgba.g, 4), 8)
    local b = bit.lshift(bit.rshift(d.rgba.b, 4), 4)
    local a = bit.rshift(d.rgba.a, 4)
    rgba4444[i-1] = bit.bor(bit.bor(r, g), bit.bor(b, a))
end

io.write(ffi.string(rgba4444, w*h*ffi.sizeof("uint16_t")))
os.exit(0)

And here are the commands I'm using (I had to rename WSL's bash.exe so it doesn't clash with MSYS/Cygwin bash):

C:\Users\MikuAuahDark\Pictures\1489>env winbash
mikuauahdark@Xyrillia-20166:/mnt/c/Users/MikuAuahDark/Pictures/1489$ ffmpeg -i 1489u-w2x-noise1-rgb565fstein.png -pix_fmt rgb565 -f rawvideo 1489u-w2x-noise1-fstein.rgb565
mikuauahdark@Xyrillia-20166:/mnt/c/Users/MikuAuahDark/Pictures/1489$ ffmpeg -i 1489u-w2x-noise1-rgba444fstein.png -pix_fmt rgba -f rawvideo - | ./rgba4444.lua 512 720 > 1489u-w2x-noise1-fstein.rgba4444
mikuauahdark@Xyrillia-20166:/mnt/c/Users/MikuAuahDark/Pictures/1489$ cat 1489u-w2x-noise1-fstein.rgba4444 | pigz -z -11 > test-1489u-rgba4444.zz
mikuauahdark@Xyrillia-20166:/mnt/c/Users/MikuAuahDark/Pictures/1489$ cat 1489u-w2x-noise1-fstein.rgb565 | pigz -z -11 > test-1489u-rgb565.zz

With just this information, we can estimate the texture bank sizes. The compressed RGBA4444 variant is 245965 bytes and the compressed RGB565 variant is 399097 bytes, so RGB565 is about 162% of the RGBA4444 size. Now let's calculate how much additional size SIF data would need if RGB565 were used. We'll assume an average of 165% of the RGBA4444 size, an average of 500KB per card file (in RGBA4444), and that all clean UR cards are currently encoded in RGBA4444 (actually, some cards were already encoded as RGB565 before, but let's assume all of them are RGBA4444 to simplify things).

As of writing, School Idol Tomodachi says there are 86 SSR and 234 UR cards. Not all cards have an unidolized form; some only have the idolized variant. We can also estimate those from School Idol Tomodachi: there are no SSR and 79 UR cards with only an idolized form[4], so in total there are 561 card images[5]. The total size of the cards when encoded is:

RGBA4444 Cards = 561 Cards * 500 KB = 274 MB
RGB565 Cards = RGBA4444 Cards (274 MB) * 165% = 452 MB
Size Increase = |274 MB - 452 MB| = 178MB
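The same estimate in Python, for anyone who wants to poke at the assumptions:

```python
# Back-of-envelope size estimate from the assumptions above:
# 2x idolized multiplier, 500 KB per RGBA4444 card, 165% for RGB565.
cards = 2 * (86 + 234) - (0 + 79)  # total card images, minus idolized-only ones

rgba4444_mb = cards * 500 / 1024
rgb565_mb = rgba4444_mb * 1.65
increase_mb = rgb565_mb - rgba4444_mb

print(cards, "cards")
print(round(rgba4444_mb), "MB ->", round(rgb565_mb), "MB (+%d MB)" % round(increase_mb))
```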

The size increase is ~178MB. How big is that? BanG Dream! Girls Band Party!'s voice data is ~150MB if I remember correctly. The Live Simulator: 2 APK is 23MB, so it's equivalent to 8 copies of Live Simulator: 2 on your Android phone. 178MB is around twice the SIF install data size (the data preloaded onto your phone when you start SIF for the first time). A size increase of 178MB should be acceptable, so I think it would be worth it to re-encode to RGB565. The catch is that you'd have to download ~452MB of cards if this happens.


The conclusion: I don't see any reason why KLab uses RGBA4444 for clean UR cards except to reduce size, but the quality-size tradeoff is simply unbalanced (as in UNBALANCED LOVE), so KLab should change the pixel format of their clean UR cards from RGBA4444 to RGB565 to improve the image quality, and then use the Genki Zenkai dither algorithm with the DAY! DAY! DAY! recipe to decrease the size a bit.


[1]: "navi" means the transparent variant of SIF cards, the one used in your SIF home menu. Dunno if KLab is making a joke about Avatar, but of course KLab doesn't know anything about the Legend of Aang.

[2]: https://github.com/MikuAuahDark/Itsudemo/blob/master/TEXB_stream_format.txt#L29

[3]: https://github.com/MikuAuahDark/Itsudemo/blob/master/src/TEXBLoad.cpp#L195

[4]: https://schoolido.lu/cards/?search=&rarity=UR&attribute=&is_promo=on and https://schoolido.lu/cards/?search=&rarity=SSR&attribute=&is_promo=on

[5]: IdolizedMultiplier*(SSR TotalCard+UR TotalCard)-(SSR PromoCard+UR PromoCard), with IdolizedMultiplier = 2

I wrote this post in Markdown and converted it to HTML, so I'm sorry if there are any problems.

Tuesday, March 13, 2018

Some Q&A Part 1

Someone sent me an e-mail.



how many hours a day you code?
what was the hardes part/feature you have programmed?
who are your best girls muse/aqours and worst?
how long till delete beatmap function and restart option in pause
menu? ༼ ◕_◕ ༽
what are the ideas for the game that you trashed into the bin - were
not implemented but could be cool?
name 3 reasons why should i come to indonesia?

The questions mix a lot of things: personal questions, Live Simulator: 2 questions, and Love Live! related questions. You may wonder why, but I recently opened a kind of AMA about technical things in SIF. Here's my tweet mentioning it (it's also pinned on my Twitter profile). Well, let's answer the questions one by one.

how many hours a day you code?
Depends on my mood, the time I have, school, and IRL things. Most of the time I code 3 hours a day, 7 PM to 10 PM (UTC+8, no DST), but sometimes I code as much as 8 hours a day. It all depends on many factors you may not think of. In short, my coding time is inconsistent.

what was the hardes part/feature you have programmed?
Hardest part? There are some. The first one is this, which turned out to become the base of all SIF WW cheats out there, sadly. Second is the FFmpeg video loader (FFX) for my Live Simulator: 2. Basically, FFX loads any audio/video format supported by the currently loaded FFmpeg (libav) libraries, and you can use it as a video storyboard for your beatmap in Live Simulator: 2. The last is GUI. I still can't think of a way to structure GUI code correctly. I always make too many assumptions when coding it, around things like optimization, multitouch support, text input, and more.

who are your best girls muse/aqours and worst?
Myus (that's how I pronounce it; too bad I can't type the Greek mu on my laptop) best girl: Honoka Kousaka. For worst girl, probably none, but I find Maki Nishikino a bit annoying (sorry, Maki fans).

I prefer loving things over hating them, but it's totally different for Aqours. I don't really like Aqours for some reason (and I even get angry at myself if I accidentally forget to switch from Aqours mode to Myus mode before exiting SIF). But since you asked: best girl for Aqours is Riko Sakurauchi, and worst girl is Ruby Kurosawa (sorry, Ruby fans), because her voice annoys me a bit.

how long till delete beatmap function and restart option in pause
menu?

Not sure what you mean, but if you mean Live Simulator: 2, then it should arrive in v2.2 :P

Update: It arrived in v3.0

what are the ideas for the game that you trashed into the bin - were
not implemented but could be cool?

Not sure which game, but if you mean Live Simulator: 2, then probably full team simulation. This will probably arrive in my next project, FiLive! (Live Simulator: 2 rewritten in C++), but I'll probably start FiLive! development once Live Simulator: 2 v3.0 ships, which is a few years in the future :P

name 3 reasons why should i come to indonesia?

There are plenty of reasons, but of course not to meet me specifically, since I'm an introvert. A simple Google search shows me an article from telegraph.co.uk listing 15 reasons to visit Indonesia, so I'll just pick the best ones: Komodo Island, Borobudur Temple, and diving to see the corals; quoting the same article, "One of the best ways to explore it is on a liveaboard boat around the Raja Ampat Islands in Indonesia's West Papua province.".

I suggest you visit Bali first. I also have plans to go to Bali in June, so if we somehow meet each other, we'll start Time Lapse by Poppin' Party (the song I've been listening to on loop while writing this).

EDIT: Updated FFmpeg video loader link to point to LS2 v2.x

Wednesday, June 1, 2016

Simple Screen Capture in Windows using FFmpeg

One day, I wanted to record my FL Studio activity. I couldn't use FRAPS because I use Windows 8, where FRAPS' Aero Desktop capture option is useless. I also tried CamStudio, but it lags a lot. Then I remembered I have FFmpeg installed and used that for screen recording instead.

FFmpeg is a command-line application focused mostly on audio and video. Although it's mostly used to re-encode video to other formats, which can be a bit complex, using it for screen recording is actually simple.

Before proceeding, here are the things you need:
  • FFmpeg. Can be downloaded here
  • Stereo Mix audio input enabled in the audio devices (optional, only needed if you also want to capture audio)
Now, extract FFmpeg somewhere and double-click "ff-prompt.bat". It will add "ffmpeg" to the command prompt for the current CMD window. Now we can start recording.

To record a specific window, the input is title="<the window name>". To record the entire desktop, the input is desktop.
To start recording, just type ffmpeg -f gdigrab -i <the input> capture.mkv, and to stop, press Ctrl+C in the CMD window. The captured screen will be stored in the "capture.mkv" file.
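If you script your recordings, the same command line can be assembled programmatically. A tiny Python sketch (the helper name and defaults are my own invention, and ffmpeg must be on your PATH to actually run the result):

```python
# Build the ffmpeg gdigrab command line described above.
def gdigrab_command(target="desktop", output="capture.mkv"):
    # "desktop" captures the whole screen; anything else is treated
    # as a window title, which gdigrab expects as title=<name>.
    source = target if target == "desktop" else "title=" + target
    return ["ffmpeg", "-f", "gdigrab", "-i", source, output]

print(" ".join(gdigrab_command()))
print(" ".join(gdigrab_command("FL Studio", "fl.mkv")))
```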

If you have the stereo mix audio input, you can also record the audio. First we need the exact name of the stereo mix input. Type ffmpeg -list_devices true -f dshow -i dummy 2>&1 | findstr /I "stereo mix". If there's no output, you don't have stereo mix. On my laptop, it outputs something like this:

D:\pvid>ffmpeg -list_devices true -f dshow -i dummy 2>&1 | findstr /I "stereo mix"
[dshow @ 0000000001cbd040]  "Stereo Mix (Realtek High Definition Audio)"


"Stereo Mix (Realtek High Definition Audio)" is the actual stereo mix input name, at least on my laptop.

Now to record the audio+video, type ffmpeg -f dshow -i audio="<stereo mix name>" -f gdigrab -i <the input> capture.mkv

If your PC lags a lot when capturing the screen, or you want to capture without losing any quality, you may want to capture the video (and the audio, if you capture that too) losslessly first, then re-encode it later. FFmpeg can also be used to re-encode the video if you know how.

To record lossless video only, type ffmpeg -f gdigrab -i <the input> -c:v libx264 -qp 0 -preset ultrafast capture.mkv. The resulting file might be large, but not as large as FRAPS output.
For audio+video, type ffmpeg -f dshow -i audio="<stereo mix name>" -c:a pcm_s16le -f gdigrab -i <the input> -c:v libx264 -qp 0 -preset ultrafast capture.mkv

If possible, capture a specific window instead of the whole desktop. Capturing the whole desktop only gives 30FPS, while capturing a specific window gives 60FPS, regardless of whether the window is small or big.

Sunday, October 4, 2015

Executing x86 machine code from char array in C/C++

The idea is:
  1. create a char array in a file containing the machine code (not as const)
  2. mark the memory as executable with mprotect or VirtualProtect
  3. declare a typedef'd function pointer type with the correct parameters and point it at the address of our char array
  4. call it
It's pretty straightforward for a simple function, like one that does x+1 or a*b, but a function that calls another function (e.g. one calling strlen, malloc, etc.) needs its call targets recalculated first, or it will call the wrong code (undefined behaviour).

OK, so let's start. I made a simple function that XORs 7303014 by x. This is the code:


#include <stdio.h>
 
int foo_bar(int baz) {
 int foo=7303014;
 return foo^baz;
}
 
int main() {
 int val=1904132;
 int out=foo_bar(val);
 printf("%s",&out);
 return 0;
}


Looks simple, right? It only prints the string "bar" to the console. Now what we need is the foo_bar function as machine code. Open a hex editor, open the executable, and find the function.
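As a hint about what's going on, here's a quick Python check (Python only because it's handy for inspecting bytes): printf("%s", &out) treats the int's storage as a C string, and on little-endian x86 those four bytes happen to be printable:

```python
import struct

FOO = 7303014   # the constant inside foo_bar
BAZ = 1904132   # the value passed in from main

# Pack the XOR result as a little-endian 32-bit int: these are the
# exact bytes printf("%s", &out) walks over until it hits the NUL.
out_bytes = struct.pack("<i", FOO ^ BAZ)
print(out_bytes)
```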

Note: We are using x86 machine code here. Take care if you're trying to compile this as 64-bit code.
The highlighted hex is the foo_bar function; we need its machine code representation to put into our new code. This is our char array declaration:

char foo_bar[]={0x55, 0x89, 0xE5, 0x83, 0xEC, 0x10, 0xC7, 0x45, 0xFC, 0x66, 0x6F, 0x6F, 0x00, 0x8B, 0x45, 0x08, 0x8B, 0x55, 0xFC, 0x31, 0xD0, 0xC9, 0xC3};

Now we've completed point 1 above, so let's write some code to finish points 2, 3, and 4:

typedef int (*foo_bar_t)(int );
 
int main() {
 foo_bar_t foo=(foo_bar_t)(void*)foo_bar;
 DWORD old_protect;
 VirtualProtect(foo,sizeof(foo_bar),PAGE_EXECUTE_READWRITE,&old_protect);
 int out=foo(0);
 printf("%s",&out);
 return 0;
}

Q: Why cast to void* first and then to foo_bar_t?
A: Visual Studio doesn't like casting char* directly to foo_bar_t, so cast it to void* first. GCC works fine without the void* cast.

OK, that's our complete main function. Now let's explain it.

Point 2 is the VirtualProtect call. VirtualProtect marks the memory pointed to by the foo variable as executable, readable, and writable (see PAGE_EXECUTE_READWRITE). Without this call, point 4 will very likely fail with a Segmentation Fault/Access Violation (and be sure to include Windows.h).
Point 3 is above main: the typedef.
Then point 4 is below point 2. Again, calling the function without setting the memory protection will likely make your program stop working.

Alright, this is the complete code


#include <stdio.h>
#include <windows.h>
 
typedef int (*foo_bar_t)(int );
 
char foo_bar[]={0x55, 0x89, 0xE5, 0x83, 0xEC, 0x10, 0xC7, 0x45, 0xFC, 0x66, 0x6F, 0x6F, 0x00, 0x8B, 0x45, 0x08, 0x8B, 0x55, 0xFC, 0x31, 0xD0, 0xC9, 0xC3};
 
int main() {
 foo_bar_t foo=(foo_bar_t)(void*)foo_bar;
 DWORD old_protect;
 VirtualProtect(foo,sizeof(foo_bar),PAGE_EXECUTE_READWRITE,&old_protect);
 int out=foo(0);
 printf("%s",&out);
 return 0;
}


Now let's compile and run it.

Our program runs without error. That means we've successfully executed our machine code. That's it for today.

Challenge: Compile both complete programs above and find out why one prints "bar" and the other prints "foo".