I wrote a library to read images in different file formats for MS-DOS a quarter century ago, throw in a couple of years for decoration. The library got upgraded a few times along the way and gained support for the PNG format using libpng as an external dependency. This is pretty standard stuff and a safe bet, as libpng is the de-facto library for reading and writing files in that format.
One day I looked at the format a bit more closely and noticed that it is actually fairly trivial at its core: the RGB(A) values of each scanline are preprocessed to make them compress better, then the scanlines are compressed with the zlib deflate algorithm. That's it, if you don't want to go into the minute details, of which there are many. Writing a simple decoder took no more than a couple of hours, at which point replacing libpng with something simpler looked like a fairly great idea. Yeah... the devil's always in the details, which in this case were endless, but that's another story.
At first I used zlib, which worked fairly well, as it should: libpng is pretty much built on top of it. I also used the miniz library for a while but eventually settled on libdeflate, an exceptionally fast deflate implementation. It comes with a downside: there is no progressive decoding. The whole stream has to be decoded at once, which means that segmented PNG files must first be combined into a contiguous buffer. That is an extra processing pass and, worse, it consumes more memory. It also means the decoding cannot be pipelined: while the next segment is being decompressed, the previous ones can't be post-processed.
Prototyping and profiling revealed, however, that the raw speed of libdeflate beats ALL of these optimizations hands down. The only remaining downside is the increased (temporary) memory usage.
A common question when code like this is presented is "why did you not use X, Y or Z instead of reinventing the wheel?" Back then, X, Y or Z probably did not exist yet. So it all comes down to PR, or the lack thereof, which is now finally being addressed (so maybe 10 people out of 7+ billion will ever read this, but whatever).
Here comes the hands-on part of this exercise. Feel free to skip it, but it is a handy reference in case you want to do some testing yourself.
# first: clone, build and install the library
git clone https://github.com/t0rakka/mango.git
cd mango
mkdir build && cd build
cmake .. -DBUILD_SHARED_LIBS=ON
make -j
sudo make install
# next: clone and build png example
git clone https://github.com/t0rakka/mango-examples.git
This should give output something like this (decode time first, encode time second)...
image: 1920 x 1080 (3986 KB)
            decode      encode
libpng:    52.0 ms    896.5 ms
lodepng:  136.7 ms    474.2 ms
spng:      36.7 ms      0.0 ms
stb:       47.7 ms    449.3 ms
mango:     31.4 ms    225.1 ms
Now you have a fun benchmarking toy to play around with. Try with different PNG files you might have on your computer!
What's this good for? Aren't the other libraries fast enough? This probably does not change anything if it comes down to it. Sure, it's a little bit faster and might use slightly less energy, but who really cares; how many PNG files are you really going to decode for it to matter? The biggest saving is in the encoding; that has come in handy once in a while. The main attraction for myself (I am my own biggest customer) is the API and how it interacts with other components of the library.
The filesystem abstraction is pretty sweet: I have a data/ folder for some app I am developing, and it's live. I can freeze it at any time into a zip or other custom container format and simply change the root data folder to be data.zip/ instead. If I use a custom container I can do other interesting things, but let's talk about those some other time. The last trick in the bag: I can do bin2obj and link the zip or other container file into my executable, then access it through a memory map, so the data is embedded nicely into my app just like that. Since the filesystem abstraction can use a pointer as the root, we can embed the container into a WIN32 app as a resource file and get the root pointer to it that way; as long as you have a pointer, you have a frozen filesystem. That's a pretty sweet setup to work with: it just works (duh, it would be pretty dumb if the expectation was that it didn't).
The coming game consoles (PlayStation 5, the next Xbox) have hardware decoders for zlib and custom compression formats. It should be fairly easy to adapt this arrangement to use the hardware decoding engines, as the existing abstraction layer is quite flexible for that sort of operation.
The upcoming version of DirectX has APIs for hardware-accelerated streaming / containers, so it wouldn't be completely unrealistic to expect such support to show up in the desktop / laptop ecosystem as well in the future.
These are the things that make the image encoding/decoding infrastructure the PNG codec is built on top of exciting to work with, even in the years to come.