techhub.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
A hub primarily for passionate technologists, but everyone is welcome


Trying to further improve #swad, and as I'm still unhappy with the amount of memory needed ....

Well, this little special-purpose custom #allocator (dealing only with equally sized objects on a single thread, and actively preventing #fragmentation) reduces the resident set in my tests by 5 to 10 MiB, compared to "pooling" whatever #FreeBSD's #jemalloc serves me in linked lists. At least something. 🙈

github.com/Zirias/poser/blob/m

The resident set now stabilizes at 79 MiB after many runs of my somewhat heavy jmeter test simulating 1000 distinct clients.
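The idea behind such a fixed-size, single-threaded pool can be sketched roughly like this (names and layout are purely illustrative, not poser's actual code): objects are carved out of large chunks and recycled through a free list, so allocations of different sizes never interleave and fragmentation is avoided.

```c
#include <stddef.h>
#include <stdlib.h>

#define POOL_CHUNK 256  /* objects allocated per chunk */

/* A free slot stores the free-list link; the union forces an
 * alignment suitable for any object type. */
typedef union PoolSlot PoolSlot;
union PoolSlot {
    PoolSlot *next;     /* valid only while the slot is free */
    max_align_t align;
};

typedef struct Pool {
    size_t objsize;     /* rounded up to a multiple of the slot size */
    PoolSlot *freelist; /* recycled objects, LIFO */
} Pool;

static void Pool_init(Pool *p, size_t objsize)
{
    size_t slot = sizeof(PoolSlot);
    p->objsize = (objsize + slot - 1) / slot * slot;
    p->freelist = 0;
}

static void *Pool_alloc(Pool *p)
{
    if (p->freelist)    /* reuse a recycled object if possible */
    {
        void *obj = p->freelist;
        p->freelist = p->freelist->next;
        return obj;
    }
    /* otherwise allocate a whole chunk and put all but the first
     * slot on the free list */
    char *chunk = malloc(POOL_CHUNK * p->objsize);
    if (!chunk) return 0;
    for (size_t i = 1; i < POOL_CHUNK; ++i)
    {
        PoolSlot *s = (PoolSlot *)(chunk + i * p->objsize);
        s->next = p->freelist;
        p->freelist = s;
    }
    return chunk;
}

static void Pool_free(Pool *p, void *obj)
{
    /* never return memory to the system, just recycle the slot */
    PoolSlot *s = obj;
    s->next = p->freelist;
    p->freelist = s;
}
```

Note this sketch deliberately never frees chunks back to the allocator; keeping the memory is exactly what prevents fragmentation here.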

GitHub preview: poser/src/lib/core/objectpool.c at master · Zirias/poser (POsix SERvices framework for C)
Continued thread

Related question, only for people who *have* some need for either authentication or proof-of-work added to their #nginx:

Would you consider #swad if there were a pre-built #package (or port) for your OS? IOW, is building and installing it manually from source an issue?

Please help me spread the link to #swad 😎

github.com/Zirias/swad

I really need some users by now, for these two reasons:

* I'm at a point where I've fully covered my own needs (the reasons I started coding this), and getting some users is the only way to learn what other people might need
* The complexity "exploded" after adding support for so many OS-specific APIs (like #kqueue, #epoll, #eventfd, #signalfd, #timerfd, #eventports) and several #lockfree implementations based on #atomics, while still providing fallbacks for everything that *should* work on any #POSIX system ... at this point, I'm definitely unable to think of every possible edge case and test it. If there are #bugs left (which is somewhat likely), I really need people reporting them to me

Thanks! 🙃

GitHub preview: Zirias/swad (Simple Web Authentication Daemon)

Just released: #swad 0.12 🥂

swad is the "Simple Web Authentication Daemon". It basically adds form + #cookie #authentication to your reverse proxy (designed for and tested with #nginx's "auth_request"). I created it mainly to defend against #malicious_bots, so besides credential checker modules for "real" logins, it offers a proof-of-work mechanism for guest logins, using the same kind of #crypto #challenge known from #Anubis.
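The proof-of-work scheme boils down to: the client must find a nonce so that a hash (e.g. SHA-256 of challenge plus nonce) starts with a required number of zero bits, while the server only hashes once and checks the difficulty. The check itself can be sketched like this (illustrative only; swad's real challenge format and difficulty encoding may differ):

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if the given hash starts with at least 'zerobits'
 * zero bits, 0 otherwise. The hash would typically be computed
 * over challenge || nonce with e.g. OpenSSL's EVP_Digest(). */
static int pow_meets_difficulty(const uint8_t *hash, size_t len,
        unsigned zerobits)
{
    for (size_t i = 0; i < len; ++i)
    {
        if (zerobits >= 8)
        {
            if (hash[i]) return 0;  /* whole byte must be zero */
            zerobits -= 8;
        }
        else if (zerobits)
        {
            /* top 'zerobits' bits of this byte must be zero */
            return !(hash[i] >> (8 - zerobits));
        }
        else return 1;
    }
    return 1;
}
```

The asymmetry is the point: the client must brute-force nonces (expensive for a bot), while the server pays for a single hash and this cheap bit check.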

swad is written in pure #C with minimal dependencies (#zlib, #OpenSSL or compatible, and optionally #PAM), and designed to work on any #POSIX system. It compiles to a small binary (200 to 300 KiB, depending on compiler and target platform).

This release brings (besides a few bugfixes) improvements making swad fit for "heavy load" scenarios: There's a new option to balance the load across multiple service worker threads, so all cores can be fully utilized if necessary. It also keeps lots of transient objects in pools for reuse, which helps to avoid memory fragmentation and ultimately results in lower overall memory consumption.

Read more about it, download the .tar.xz, build and install it .... here:

github.com/Zirias/swad

GitHub preview: Zirias/swad (Simple Web Authentication Daemon)
Continued thread

The release of #swad 0.12 is currently delayed by #github #mfa shenanigans.

As I switched to a new mobile phone, I had to connect a new "authenticator app" for #otp to the github account (which I could do on a machine that still had a valid session). Now, on the machine I need for uploading the release, github just doesn't accept the generated codes. 😡

I'm seriously considering #selfhosting a #git web frontend. #forgejo comes to mind, should be simple enough. But providing #CI pipelines might be a different beast ... 🤔

Continued thread

Decided to look into the thread "fairness issue" before releasing, and came up with this:

github.com/Zirias/poser/commit

I'm actually amazed by the result, didn't expect that:

* The imbalance with one "loser" thread getting a lot more to do is completely gone
* Overall RAM consumption is significantly lower (which makes sense, as the per-thread pools holding currently unused "Connection" objects are now balanced as well). Practical result: 88 MiB resident set after 1000 simulated clients fired a total of 8 million requests ... twice!
* Throughput got even better: I can now achieve almost 40k req/s (of course only with TLS, name resolution and informative logging all disabled)
* Response times are sanely distributed around very low averages, less than 25 ms

Ok, about time to prepare the #swad 0.12 release 😎

When distributing accepted connections to service worker threads,
replace the dumb round-robin by a simple load-balancing algorithm:

For each thread, maintain a counter of currently opened connect...
GitHub preview: Server: Add simple load balancing · Zirias/poser@a82d104
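The scheme described in the commit message can be sketched as follows (a simplified illustration, not poser's actual code): each worker keeps a counter of currently open connections, and the accepting thread hands each new connection to the worker with the fewest.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NWORKERS 8

/* one counter of currently open connections per worker thread */
static atomic_uint nconn[NWORKERS];

/* Pick the least-loaded worker for a freshly accepted connection.
 * The chosen worker decrements its counter when the connection
 * closes. */
static size_t pick_worker(void)
{
    size_t best = 0;
    unsigned bestload = atomic_load(&nconn[0]);
    for (size_t i = 1; i < NWORKERS; ++i)
    {
        unsigned load = atomic_load(&nconn[i]);
        if (load < bestload)
        {
            best = i;
            bestload = load;
        }
    }
    atomic_fetch_add(&nconn[best], 1);
    return best;
}
```

Unlike round-robin, this automatically compensates for long-lived connections piling up on one thread, which is exactly the "loser thread" imbalance described above.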

Today, it's exactly one month since I released #swad 0.11. And I'm slowly closing in on releasing 0.12.

The change to a "multi #reactor" design was massive. It pays off, though. On the hardware that could reach a throughput of roughly 1000 requests per second, I can now serve over 3000 r/s, and with #TLS disabled, 10 times as much. I spent most of the time on "detective work" to find the causes for a variety of crashes, and I'm now quite confident I found them all, at least on #FreeBSD with default options. As 0.11 still has a bug affecting for example the #epoll backend on #Linux, expect to see swad 0.12 released very, very soon.

I'm still not perfectly happy with RAM consumption (although that could be improved by explicitly NOT releasing some objects and reusing them instead), and there are other things that could be improved in the future, e.g. experimenting with how incoming connections are distributed to the worker threads, so there's not one "loser" that's constantly slowed down by all the others. Or designing and implementing alternative #JWT #signature algorithms besides #HS256, which could enable horizontal scaling via load balancing. Etc. But I think the improvements for now are enough for a release. 😉

Continued thread

Analyzed and THIS is the core part of the fix:

github.com/Zirias/poser/commit

I'll have to revisit that later; it's probably wasteful as it is right now, and I should come up with a better idea for implementing async logging. But it DOES fix the bug I finally identified that could make #swad crash.

Now, just running it, observing it, stress-testing it from time to time ... wish me luck that THIS was indeed the only reason left for crashing in "normal operation".

I already know there's more work to do regarding reload (SIGHUP) with the multi-threaded approach; it's currently unsafe, unfortunately, and must be fixed before the next release.

Fix async logging with multiple service workers. Enqueueing a thread job
from within a pool thread is not safe any more, because the job must
know its "owning" service worker thread. So, ...
GitHub preview: Log: Fix async logging with service workers · Zirias/poser@b1aa978
Continued thread

Oh boy, I have a lead! And it's NOT related to #TLS. I finally noticed another pattern: #swad only #crashed when running as a #daemon. The daemonizing itself wasn't the problem, but the default logging configuration attached to it was: "fake async", letting a #threadpool job do the logging.

Forcing THAT even when running in the foreground, I can finally reproduce a crash. And I wouldn't be surprised if that was actually the reason for crashing "pretty quickly" with #LibreSSL (and only rarely with #OpenSSL); I mean, something going rogue in your address space can have the weirdest effects.

Continued thread

For two days straight, I've been unable to reproduce #swad #crashing with *anything* in place (#clang #sanitizer instrumentation, an attached #debugger like #lldb) that could give me the slightest hint about what's going wrong. 😡

But it *does* crash when "unobserved". And it looks like this happens a lot sooner (or more often?) when using #LibreSSL ... but I also suspect this could be a red herring in the end.

The situation reminds me of my physics teacher back at school, who used to say something in German I just can't forget:

"Wer misst, misst Mist."

A feeble attempt in English would be "he who measures measures crap"; it was his humorous way of bringing one consequence of #Heisenberg's uncertainty principle to the point. And indeed, #debugging computer programs always suffers from similar problems...

I need help. First the question: On #FreeBSD, with all ports built with #LibreSSL, can I somehow use the #clang #thread #sanitizer on a binary actually using LibreSSL and get sane output?

What I now observe debugging #swad:

- A version built with #OpenSSL (from base) doesn't crash. At least I tried very hard, really stressing it with #jmeter, to no avail. Built with LibreSSL, it does crash.
- Less relevant: the OpenSSL version also performs slightly better, but needs almost twice the RAM
- The thread sanitizer finds nothing to complain about when built with OpenSSL
- It complains a lot with LibreSSL, but the reports look "fishy", e.g. it seems to intercept some OpenSSL API functions (like SHA384_Final)
- It even complains when running with a single-threaded event loop
- I use a single SSL_CTX per listening socket, creating SSL objects from it per connection ... also with multithreading; according to a few sources, this should be supported and safe
- I can't imagine that doing this on a *single* thread could break with LibreSSL; that would make SSL_CTX pretty much pointless
- I *could* imagine that sharing the SSL_CTX across multiple threads to create their SSL objects *might* not be safe with LibreSSL, but I have no idea how to verify that as long as the thread sanitizer gives me "delusional" output 😳

Continued thread

Fixed *that* issue by making sure each instance of the Process class has an owning thread, but forks the child on the main thread and receives exit events there, delegating that info back to the owning thread. Seems to work.

Now, I can still make #swad crash. But no matter what I tried so far, as soon as I build it with both #debugging symbols and the #thread #sanitizer, I just can't reproduce a crash.

Now what? 😞🤷

Continued thread

Yep, there's a second bug. #clang's #thread #sanitizer had nothing to complain about, and the output from #assert doesn't help much. So, first step: "pimp your assert" 😂 --- #FreeBSD, like some other systems, provides functions to collect and print rudimentary stacktraces, so use these if available:
github.com/Zirias/poser/commit
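On systems providing <execinfo.h> (FreeBSD with libexecinfo, glibc on Linux), such a "pimped" assert can look roughly like this (a sketch, not poser's exact macro):

```c
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

/* Like assert(), but print a rudimentary stacktrace via
 * backtrace()/backtrace_symbols_fd() before aborting, so a failed
 * assertion shows *where* it fired even without a debugger attached. */
#define ASSERT_BT(expr) do { \
    if (!(expr)) { \
        void *frames[32]; \
        int n = backtrace(frames, 32); \
        fprintf(stderr, "Assertion failed: %s (%s:%d)\n", \
                #expr, __FILE__, __LINE__); \
        backtrace_symbols_fd(frames, n, 2); \
        abort(); \
    } \
} while (0)
```

Writing the symbols directly to fd 2 (stderr) with backtrace_symbols_fd() avoids allocating memory in a context that might already be corrupted.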

Now I got closer, see screenshot. That's enough to understand the issue: the global event fired when a #child #process exits was used from multiple threads. It obviously doesn't work that way, so, back to the drawing board regarding my handling of child processes... 🤔

Next #swad release: Soon, so I hope 🙈

Finally, a lead on what could still cause my development version of #swad to crash!

Ok, this looks really weird: the failed assertion at the bottom means a thread ends up fiddling with an event that's owned by a different thread.

But hey, at least now I have stacktraces of what's happening.

Next #swad release will still be a while. 😞

I *thought* I had the version with multiple #reactor #eventloop threads and quite some #lockfree stuff using #atomics finally crash-free. I found that, while #valgrind doesn't help much, #clang's #thread #sanitizer is a very helpful debugging tool.

But I tested without #TLS (to be able to handle "massive load" which seemed necessary to trigger some of the more obscure data races). Also without the credential checkers that use child processes. Now I deployed the current state to my prod environment ... and saw a crash there (only after running a load test).

So, back to debugging. I hope the difference is not #TLS. This just doesn't work (for whatever reason) when enabling the address sanitizer, but I didn't check the thread sanitizer yet...

Slow and steady progress making #swad fit for heavy traffic: Adding thread-specific "object pools" for the connection objects representing the clients of one server seems to have reduced the growth of the resident set! 🥳 It may sound counter-intuitive to *save* memory by *not* returning any ... but that's what I observe 🙈 And it also improved throughput further!

I'll apply that principle to even more objects.

Meanwhile, the silly scheduling behavior got worse. For a while, I was observing one service worker thread being twice as busy as all the others ... now that picture was completed by one being especially lazy. What the ....?

Continued thread

The #lockfree command #queue in #poser (for #swad) is finally fixed!

The original algorithm from [MS96] works fine *only* if the "free" function has some "magic" in place to defer freeing an object until no thread holds a reference to it any more ... and that magic is, well, left as an exercise to the reader. 🙈

Doing more research, I found a few suggestions for how to do that "magic", including for example #hazardpointers ... but they're known to cause quite some runtime overhead, so not really an option. I decided to implement a "shared object manager" based on the ideas from [WICBS18], which is kind of a "manually triggered garbage collector" in the end. And hey, it works! 🥳
github.com/Zirias/poser/blob/m

[MS96] dl.acm.org/doi/10.1145/248052.
[WICBS18] cs.rochester.edu/u/scott/paper

GitHub preview: poser/src/lib/core/sharedobj.c at master · Zirias/poser (POsix SERvices framework for C)
#coding #c #c11

This redesign of #poser (for #swad) to offer a "multi-reactor" (with multiple #threads running each their own event loop) starts to give me severe headaches.

There is *still* a very rare data #race in the #lockfree #queue. I *think* I can spot it in the pseudocode from the paper I used[1], see screenshot. Have a look at lines E7 and E8. Suppose the thread executing this is suspended after E7 for a "very long time". Now, some dequeue operation in some other thread will eventually dequeue whatever "Q->Tail" was pointing to, and then free it after consumption. Our poor thread resumes, successfully checks the pointer already read in E6 for NULL, and then tries a CAS on tail->next in E9 ... which unfortunately lives inside an object that doesn't exist any more. If the CAS succeeds because that memory location happens to hold "zero" bytes, we corrupt some random other object that might now reside there. 🤯

Please tell me whether I have an error in my thinking here. Can it be ....? 🤔
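To make the suspected window concrete, here are the relevant enqueue steps, annotated (pseudocode following the paper's line numbering, not poser's actual code):

```
enqueue(Q, value):
  node = new_node(value)
  loop:
    tail = Q->Tail              // E5: read the tail pointer
    next = tail->next           // E6: read its next pointer
    if tail == Q->Tail:         // E7: consistency check passes
      // <-- if the thread is suspended HERE, another thread can
      //     dequeue *tail and free() it ...
      if next == NULL:          // E8: tail looked like the last node
        // ... and this CAS then writes into freed memory:
        if CAS(&tail->next, NULL, node):   // E9
          break
```

This is exactly why [MS96] implicitly assumes a safe memory reclamation scheme: the consistency check in E7 only helps if the node read in E5 cannot be freed while any thread still holds a reference to it.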

Meanwhile, after fixing and improving lots of things, I checked the alternative implementation using #mutexes again, and surprise: although it's still a bit slower, the difference is now very, very small. And it has the clear advantage that it never crashes. 🙈 I'm seriously considering dropping all the lock-free #atomics stuff again and just going with mutexes.

[1] dl.acm.org/doi/10.1145/248052.
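For comparison, the mutex-based variant is trivially safe, precisely because it has no reclamation problem: a node is only freed after it was unlinked under the lock. A minimal sketch (illustrative, not poser's actual code):

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct QNode {
    void *value;
    struct QNode *next;
} QNode;

typedef struct Queue {
    pthread_mutex_t lock;
    QNode *head;    /* dequeue end */
    QNode *tail;    /* enqueue end */
} Queue;

static void Queue_init(Queue *q)
{
    pthread_mutex_init(&q->lock, 0);
    q->head = q->tail = 0;
}

static int Queue_enqueue(Queue *q, void *value)
{
    QNode *n = malloc(sizeof *n);
    if (!n) return -1;
    n->value = value;
    n->next = 0;
    pthread_mutex_lock(&q->lock);
    if (q->tail) q->tail->next = n;
    else q->head = n;
    q->tail = n;
    pthread_mutex_unlock(&q->lock);
    return 0;
}

static void *Queue_dequeue(Queue *q)
{
    pthread_mutex_lock(&q->lock);
    QNode *n = q->head;
    void *value = 0;
    if (n)
    {
        q->head = n->next;
        if (!q->head) q->tail = 0;
        value = n->value;
    }
    pthread_mutex_unlock(&q->lock);
    free(n);    /* safe: node was already unlinked under the lock */
    return value;
}
```

With uncontended mutexes being cheap on modern systems, the performance gap to the lock-free version can indeed be small, which makes "never crashes" a strong argument.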