Hacker News

cozis

Show HN: Hosting my website using my C web server github.com

xmodem9 months ago

> No reverse proxies required!

This is one that has always baffled me. If there's no specific reason that a reverse proxy is helpful, I will often hang an app with an embedded Jetty out on the internet without one. This has never lead to any problems.

Infra or security people will see this and ask why I don't have an nginx instance in front of it. When I ask why I need one, the answers are all hand-wavy security or performance, lacking any specifics. The most specific answer I received once was slow loris, which hasn't been an issue for years.

Is reverse proxying something we've collectively decided to cargo cult, or is there some reason why it's a good idea that applies in the general case that I'm missing?

codegeek9 months ago

For me, Reverse proxy helps me keep my origin server only for 1 purpose: Serve the Application. Everything else, I can handle with Reverse Proxy including TLS Termination, load balancing, URL rewrites, Security (WAF etc) if needed. Separation of duties for me.

Overall, the benefit is that you can keep your origin server protected and only serve relevant traffic. Also, lets say you offer custom domain to your own customers and in that case, you could always swap out the origin server (if needed) without worrying about DNS changes for your customers as they are pointing to the reverse proxy and not your origin server directly.

TZubiri9 months ago

TLS should be done with proxies, yes. The Stunnel approach is Gospel.

Similarly if you start load balancing, you can put some server in the middle yes. But the ideal solution is at the DNS level I think, unless there's some serious compute going on (which a website loading a page from disk is not).

URL rewrites should not be a thing unless you have a clusterfuck, and Security is best accomplished in my experience by removing, rather than by adding.

OptionOfT9 months ago

I've worked at a place where even internal traffic that crosses machines needs to be encrypted.

So Ingress -TLS-> Container (pod).

We implemented LinkerD for this, which runs as a sidecar in the pod. Since the sidecar and the main container communicate on the same machine, this is OK.

dartos9 months ago

I run many server programs on my homelab.

Each is running on a different port, but I want them all accessible publicly from different URLs and I only want to expose port 443 to the internet.

I also want to have TLS autorefresh for each domain.

I need a reverse proxy for the former and caddy does both.

If you’re running a single server and that server does TLS termination then you don’t really need a reverse proxy.

com2kid9 months ago

Every page off of my (static HTML file!) home page[1] is actually a distinct microservice sitting behind a reverse proxy. I can throw some new experiment together, built it with whatever tooling I want, give it a port number, and let nginx route to it.

It removes a lot of friction from "I wonder if making this service is a good idea?" and because I am self hosting I am not tying myself down to any of the "all in one" hosting platforms.

[1] https://www.generativestorytelling.ai/

dartos9 months ago

Microservice maximalism.

tnolet9 months ago

e.g. Virtual hosting as we called it in the Apache days

deadlocked9 months ago

Virtual hosting is only similar in that it allows you to serve content based on the requested FQDN (or, indeed, destination port of the request).

MayeulC9 months ago

You forgot the original need: share a single IPv4 among different services.

If going IPv6-only, the need for a reverse proxy is seriously lowered. You could spin multiple servers up (even on different machines), listening to 443. Have each service handle its certificate renewal, etc.

anamexis9 months ago

> You forgot the original need: share a single IPv4 among different services.

That "original need" is exactly what GP is talking about.

MayeulC9 months ago

Right, indirectly (single port). I was spelling it out.

cybrox9 months ago

For most of my deployments, the performance impact of a reverse proxy is negligible, I have the configs pre-prepared and it allows me to add TLS termination, URL rewrites or other shenanigans without much effort in the future. So for me, it's mostly a habit that has paid out so far.

cbm-vic-209 months ago

IME, using an Nginx or WAF layer lets the "ops people" make changes to the things you mention (TLS config, URL rewrites, etc.) without getting the "app people" involved. There's a bit of "Conway's Law" going on here, depending on the reporting structure and political makeup of the organization.

nickpsecurity9 months ago

My answer applies to a number of types of servers that sit in front of web applications. You asked about security and performance. I’ll give you a few ways that an extra box can help in those areas.

For security, you want a strong OS with this little code as possible in your overall system. Proxy-style apps can be very simple compared to web, application servers. They can filter incoming traffic, validate the input, or even change it to something safer (or faster) to parse. They can also run on OS’s that are harder to attack: OpenBSD; GenodeOS; INTEGRITY-178B. On availability, putting load-balancing, monitoring, and recovery in these systems is often safer since app servers are more likely to crash.

On performance, the first benefit is that the simple, focused app can have a highly-optimized implementation. From there, one can use hardware accelerators (CPU or PCI) to speed up compression or encryption. Also called offloading. The most, cost-effective setup has many commodity servers benefiting from a few, high-cost servers capable of offloading. Some have load-balancing to route incoming traffic to servers able to handle it best to minimize use of costly resources.

So, there’s a few ways that proxy-type servers can help in security and performance.

dartos9 months ago

I don’t really care think there is a general case for all servers.

For the minimal case you don’t need it, but in production (with a single host) it allows for rolling releases, compression, TLS, fast static file serving, potentially A/B testing capabilities.

The layer of indirection between the request and your server can be very useful.

lnenad9 months ago

> but in production (with a single host) it allows for rolling releases

I mean for me this is pretty much already enough of a reason to always put an rp ahead of my apps. It's requires minimal setup, most of the tools are fire and forget so I see no real downsides. But having the ability to just point it somewhere else, or to split traffic across app replicas, is more than enough.

mistrial99 months ago

caching -- google changed the expectations of millions

arielcostas9 months ago

I think people do it out of habit at this time. In many cases it makes sense to handle TLS termination and compression, but in other instances it really is there for no reason.

Proxying is always less-performing than serving directly since you add another layer in between, right? Or am I missing something?

xmodem9 months ago

Jetty implements both TLS and compression, though in environments where I don't already have automated certificate issuance infrastructure in place I have occasionally deployed caddy as a reverse proxy just for the TLS termination.

fny9 months ago

Most web applications are not written in Java. NGINX also allows static assets to be served directly while side-stepping the application server. This is a boon for interpreted languages.

xmodem9 months ago

And that is a perfectly valid performance reason for adding an nginx layer in front. It does not IMO justify it in the general case however.

rollcat9 months ago

I agree with fny's comment, and add that most "application servers" don't bother with things like supporting sendfile(2); e.g. when hosting a Python application, you need to add something like Whitenoise, and integrate it with your application somehow; that's extra development work that is sometimes easier to throw over the fence at the sysadmin (especially since the sysadmin will usually already have that part of their job automated).

I'd also say that there is no such thing as a "general case"; I've launched and/or supported countless (must be hundreds?) of web projects and even the "simple" ones were each a bit of a snowflake.

https://man7.org/linux/man-pages/man2/sendfile.2.html

https://whitenoise.readthedocs.io/

fny9 months ago

But that is the general case. Most web apps are written in interpreted languages like JavaScript which benefit from a reverse proxy. If I remember correctly, NGINX became popular because of Rails.

Maybe in Java-land it’s overused, but everywhere else it makes sense.

zeroCalories9 months ago

Something like nginx will likely perform far better at serving static content and other cacheable requests. Also allows you to run two binaries at once for a rolling update.

xmodem9 months ago

> likely perform far better at serving static content and other cacheable requests.

But at the cost of having a separate build step that deploys your static assets somewhere. Jetty is actually pretty fast - I've built some fairly high-volume internal apps this way.

> Also allows you to run two binaries at once for a rolling update.

You don't necessarily need an extra reverse proxy layer for this, though I will concede in some environments it's probably the easiest way to achieve it.

zeroCalories9 months ago

You don't necessarily need to deploy your static content anywhere, you can just set nginx to cache your content.

Also, most other rolling update solutions will end up being more complex than having a reverse proxy. What do you have in mind that would be simpler? NixOS?

okasaki9 months ago

You're missing vhosts, TLS, caching, logging, and log analysis, access control, rate limiting, custom error messages, metrics, etc.

01HNNWZ0MV43FF9 months ago

At one job, Nginx facilitated blue-green deployments. I would spin up a 2nd app server and have Nginx cut-over to it with <1 second of downtime. If anything went wrong, the rollback plan was to only roll back the Nginx config.

I automated all that with a few scripts that included sanity checks with `nginx -t`. After the update looked good I would shut down the old app server without any time crunch. Only the Nginx config was time-sensitive.

I'm not sure if you can do that without some kind of reverse proxy as an abstraction layer. At least a TCP-level proxy.

And as everyone said, virtual hosting.

MayeulC9 months ago

In theory, you can do even better with no reverse proxy: hand down the open sockets to the new version of your application, zero downtime at all. (Nothing prevents you from having a reverse proxy in front while doing that).

sophacles9 months ago

> Is reverse proxying something we've collectively decided to cargo cult, or is there some reason why it's a good idea that applies in the general case that I'm missing?

It's a matter of risk management. On the one hand is your service that speaks http. Maybe it uses a good library for it, maybe not - but even if the library is good are we sure you used it correctly? Even if you used it correctly, has it been as thoroughly tested and proven as nginx?

On the other hand you have nginx - a deeply understood technology that has served trillions and trillions of web requests, has proven itself resillient against attacks again and again, and has been reviewed with a fine-toothed comb by security engineers deeply for years.

So just from the starting point, your software is riskier. Even if you're the best software engineer who's ever lived, it's a higher risk profile to deploy new unproven software than the one that's been battle tested for decades.

It's also a matter of mitigation - if your software does have a vuln, are you going to notice it? Even if you do notice it, how long til you understand the problem and fix it? What to do in the time between discovery and deploying the fix? On the other hand if there's an nginx vuln, there are almost certainly juicier targets than your software to exploit first, and the bug and the fix are far more likely to be found and deployed long before someone even tries it for your site.

pengaru9 months ago

It's a lot easier to isolate and de-privilege your reverse proxy that needs to do nothing more than speak http/https with the outside world and some local listeners.

The url-specific web servers you're proxying tend to need a whole lot more, at least filesystem access to serve html content, at most program execution like CGIs and interpreters.

Separating these concerns makes a lot of sense, and brings little to no overhead by modern standards.

jasonjayr9 months ago

Reverse proxy allows some operational flexibility:

1) you can share multiple apps or sites with one server listening on port 443/80. 2) You can redirect to another backend on your infrastrcture 3) You can enforce certain login/sso/restrictions 4) You can configure all these things in one place.

Of course, if you don't need all that, then it's somewhat moot.

Klonoar9 months ago

Amusingly, slowloris is still an issue for some Rust (hyper) based servers. There’s been some movement on it lately - and I’m typing this in a free moment, so maybe it’s finally fixed and someone can correct me - but it’s kind of lurking there and throwing Nginx in front of an e.g Axum deploy is still somewhat necessary.

paxys9 months ago

> I will often hang an app with an embedded Jetty out on the internet

So you are using a proxy server, just an embedded one. Most prefer simply prefer not to bundle their application with one.

didip9 months ago

Reverse proxy is the OG sidecar. You get N number of useful functionalities that doesn't need to live in your primary app, for example: TLS cert handling.

mp059 months ago

> Is reverse proxying something we've collectively decided to cargo cult

Yeah, that’s ridiculous. “Cargo culting” is when people imitate processes without understanding the underlying purpose, but reverse proxying is widely used for valid reasons—like security, load balancing, caching, SSL termination, etc. It’s not just mindless mimicry. Dismissing a best practice as “cargo culting” because they don’t understand it is lazy. Just because it’s common doesn’t mean it’s done without purpose. Worst case? You get people following a pretty good practice.

worik9 months ago

> slow loris,

Really? I am curious.

You are not talking of monkeys?

sophacles9 months ago

It's a pretty clever attack: https://en.wikipedia.org/wiki/Slowloris_(computer_security)

rwmj9 months ago

Cool! I also wrote my own C web server (sources linked below) which ran a commercial website for a while. It's amazing how small and light you can make an HTTP/1.1 webserver. The commercial site ran on a machine with 128MB of RAM and 1 CPU (sic) and routinely served a large proportion of schools in the UK with a closed source interactive, web-based chat system. However that was 20 years ago when the internet was a slightly less hostile place.

He mentions bots make great fuzzers, but I think he should also do a bit of actual fuzzing.

http://git.annexia.org/?p=rws.git;a=tree Requires: http://git.annexia.org/?p=c2lib.git;a=tree http://git.annexia.org/?p=pthrlib.git;a=tree

nicoburns9 months ago

Rust is a good choice for webserver that will run in this footprint without having to worry so much about the hostile internet. My website https://blessed.rs runs on a VM with 256mb of RAM because that was the smallest I can find, but it typically uses ~60mb.

kragen9 months ago

this looks much more practical than my own small and lightweight http/1.0 webserver, but i'm guessing that rws is not nearly as small and lightweight: http://canonical.org/~kragen/sw/dev3/server.s http://canonical.org/~kragen/sw/dev3/httpdito-readme

the really surprising thing about that was that when your memory map only has five 4k pages in it, linux gets really fast at forking

rwmj9 months ago

It operated in the real world (of 20 years ago), and supported in-process dlopened modules which is how the web-chat was implemented, so it was somewhat non-trivial.

kragen9 months ago

also, i'm assuming, comet, and thus long-lived connections that were in communication with each other, whereas httpdito spawns off a separate child process for each request and thus can fob off all the memory allocation and i/o multiplexing work onto the kernel

comet was a pretty compelling reason to write your own web server 20 years ago

rwmj9 months ago

Not sure what comet is in this context?

The chat code [I really should upload the code as the company has been dead for at least 10-15 years] worked by browsers holding an infinitely loading frame, so each client held open a connection for several hours. IIRC there was some Javascript that reloaded the connection after a few hours.

To handle 1000s of HTTP connections we had to implement our own fairly lightweight threads. It also had a cool inversion of control where you could write straight through code and it was turned into event-driven callbacks automatically. The webserver couldn't make use of multiple cores, which was lucky because the server had only 1 CPU!

Also used a pool allocator, which is very well suited to server applications.

kragen9 months ago

https://en.wikipedia.org/wiki/Comet_(programming) is browsers holding an infinitely loading frame, so each client held open a connection for several hours. usually we included <script> tags in that infinitely loading frame so the events could do whatever instead of just adding more text somewhere off the screen below the current scroll position. an alternative way to do comet is to close the connection when there's an event and have the client reload the frame

nowadays people use websockets for comet

yeah, protothreads type stuff and pool allocators are great fits for that kind of work

cozisop9 months ago

httpdito looks incredible

kragen9 months ago

glad you like it!

cozisop9 months ago

Hey, the code looks really good! Thanks for sharing. I'll probably go through it a bit later :)

P.S. Love the indentation

cozisop9 months ago

Hello everyone! This is a fun little project I started in my spare time and thought you'd appreciate :)

sim7c009 months ago

I find it an interesting excersize to read through really old bugs and CvE for http servers to see what might affect my code too. and see how to fix it. nic3 going though =) fun to roll this kind of stuff yourself!

yazzku9 months ago

Appreciated indeed. I happened to want to mess around with the C11 concurrency API and write a server of sorts, mostly as a curiosity of how those constructs work out in C coming from C++.

theideaofcoffee9 months ago

Awesome! I used to think (well, I still do) that getting a barebones service up and running using the system APIs at the lowest level like this is so satisfying. It's sort of magical, really. And to see it serve real traffic! I'm kind of surprised that the vanilla poll() can put up numbers like you were seeing, but I guess it's been a while since I've had to do anything event related/benchmark at that level.

I love the connection-specific functions and related structs and arrays for your connection bookkeeping, as well as the poll fd arrays. It's very reminiscent of how it's done in lots of other open source packages known for high throughput numbers, like nginx, redis, memcached.

Great work!

yard20109 months ago

Working with c/cpp in uni exploded my mind. It's such a specific humbling experience that has a bit of anything I love - engineering, history, culture, linguistics, etc.

It made me think that anyone should know and try every possible language (programming or otherwise) - "thinking" in a language is such a unique experience. The different contexts make everything feel different, even though it's more of the same. The perspective change, and changes the subjective experience.

For example - to really understand the nature of linux or git, you have to speak its language and understand the nuances that are usually lost in translation. Tangibly, to understand the true subjective meaning of the word "forest" in russian one has to speak and understand russian.

The context changes the perspective, so sometimes it changes everything.

ryandrake9 months ago

It’s kind of sad how C has gotten the reputation as this dangerous and scary dark art that only wizards can successfully wield. C was my first love, it’s what we used throughout university, it’s what our operating systems and basic tools are all written in... If you go to your favorite language and step down into the actual implementation of, for example, your network calls, you’re eventually going to get to poll() and write() written in C. It’s useful to know and be fluent in regardless of whether you intend to work on large projects in C.

01HNNWZ0MV43FF9 months ago

But if the dy/dx gradient is that experts can develop faster in safe languages, and novices make fewer mistakes in safe languages, then C isn't useful day-to-day.

It occupies an ever-shrinking ecological niche on the Pareto frontier.

pdp11ty9 months ago

Some of the worst software I've ever used, and also some of the worst software I've ever seen developed, was done by novices in safe languages. You can't escape how the computer works, you can only plug your ears and yell "LALALALALA!" really loud. But that doesn't change reality. If you aren't a good developer, you won't make good software, in any language. That's not the language's fault. If you don't understand pointers, that's on you. Computers use indirection; it's a fact of the craft. It doesn't matter if your fancy runtime hides them from you, they're still in there, and you should know how they work; not only because they're simply important, but because they'll make it easier for you to reason about things when something goes wrong. Otherwise, you'll sit there helpless and come running to someone like me with screenshots of stack traces that tell you exactly what's wrong. (Yes, this happens to me all the time.)

zppln9 months ago

What are you on about? C is more useful day-to-day than the vast majority of languages. Learning it is hardly a waste of time.

the_gorilla9 months ago

C is one of the worst designed programming languages still in use. It's a ridiculous, cruel joke on anyone looking to learn unless your actual goal is to learn what a programming language designed 70s computers looks like.

tuveson9 months ago

I think C is a simple well-designed systems language. It has some warts, but many of the things people complain about are matters of preference – or due to a lack of understanding of the problems that C is good at solving.

The only major challengers to C in the last 50 years are C++ and Rust. I think that’s a testament to the quality of the language.

zppln9 months ago

[flagged]

theideaofcoffee9 months ago

Same, it was my first language that I got real fluent in. And I feel the same when the prevailing sentiment now is that you're 100% guaranteed to shoot your foot off and make your dog sick if you even look at some C code. I think it's harmful, because wielded responsibly it's super powerful. We shouldn't be discouraging something because it's hard to master, we should be encouraging discretion. And that discretion may take you to a memory-safe language, you may stick with C or something similarly low-level, it all depends.

ggliv9 months ago

This is a neat perspective. I’ve heard conversation on how working with different programming languages affects how you code (“learn Haskell, it’ll make you think more functionally!”) but for some reason I never connected it to the linguistic side of things.

I remember learning about the effects of language on cognition in a psychology course I took a while ago, it’s interesting to think about how that could apply more broadly.

cozisop9 months ago

> I used to think (well, I still do) that getting a barebones service up and running using the system APIs at the lowest level like this is so satisfying. It's sort of magical, really

Totally agree. And actually using them is even more satisfying. I'm starting to get curious about email protocols..

> I'm kind of surprised that the vanilla poll() can put up numbers like you were seeing

Me too. I assumed I was going to go with epoll at some point, but poll() is working great.

pdp11ty9 months ago

People seem to forget that all of their amazing, wonderful abstractions are, at their core, doing exactly this: opening sockets, reading from them, writing to them, etc. There is nothing new under the sun.

litbear20229 months ago

You may be interested in this https://news.ycombinator.com/item?id=27431910

> As of 2024, the althttpd instance for sqlite.org answers more than 500,000 HTTP requests per day (about 5 or 6 per second) delivering about 200GB of content per day (about 18 megabits/second) on a $40/month Linode. The load average on this machine normally stays around 0.5. About 19% of the HTTP requests are CGI to various Fossil source-code repositories.

cozisop9 months ago

This post was of great inspiration! It made me realize something like this was doable

petee9 months ago

Aside, if you want to write C apps but aren't comfortable writing the public facing parts, 'Kore' is a great framework with some handy builtins like ACME cert management, Pgsql, curl, websockets, etc.

Essentially build and run modules, and they can be combined (including mixing Lua/Python + C.)

https://kore.io/

greenavocado9 months ago

Finally a website that doesn't crash when it shows up on the front page

afavour9 months ago

Any site with a CDN in front of it can do that.

Don’t get me wrong this is an awesome project but if you really care about this kind of thing in a production scenario and you’re serving mostly static content… just use a CDN. It’ll pretty much always outperform just about anything you write. It’s just boring.

chrismorgan9 months ago

Even caching is normally unnecessary.

Honestly, HN front page traffic isn’t much. For most, it probably peaks at about one page load¹ per second², and if your web server software can’t cope with that, it’s bad.

Even if your site uses PHP and MySQL and queries the database to handle every request, hopefully static resources bypass all that and are served straight from disk. CPU and memory usage will be negligible, and a 100Mbps uplink will handle it all easily. So then, hopefully you’re only left with one request that’s actually doing database work, and if it can’t answer in one whole, entire second, it’s bad.

(I’m talking about general web pages here, not web apps, which have a somewhat different balance; but still for most things HN traffic shouldn’t cause a sweat, even if you’ve completely ignored caching.)

Seriously, a not-too-awful WordPress installation on a Raspberry Pi could probably cope with HN traffic.

—⁂—

¹ Note this metric: page loads, not requests. Requests per second will scale with first-party requests per page.

² From a quick search, two sources from this year: https://marcotm.com/articles/stats-of-being-on-the-hacker-ne..., https://harrisonbroadbent.com/blog/hacker-news-traffic-spike.... Both use JS tracking, but even doubling the number to generously account for we sensible people who use content blockers has the hourly average under one load per second.

re-thc9 months ago

> and if your web server software can’t cope with that, it’s bad.

Well then sites on average are sadly "bad" by your standards. Lots of sites that get on the front page of HN go down.

chrismorgan9 months ago

There are a lot of bad sites, but it’s nowhere near average—it’s a small fraction that are bad in these ways. I visit many sites from HN, and encounter pages that are down or even struggling due to overtraffic significantly less than once a week. Admittedly most of the pages loaded are on well-established sites or static hosts, but there are plenty that are WordPress or similar.

tazjin9 months ago

> Any site with a CDN in front of it can do that.

You are vastly overestimating HN front page traffic. Any reasonable system on any reasonable machine with any reasonable link can do this. And I really do mean reasonable: I've served front-page traffic from a dedicated server in a DC, and from a small NUC in a closet at home, and both handled it completely fine.

theideaofcoffee9 months ago

This sort of trivializes the effort and the fun of a project like this, doesn't it? Yes, you'll want to put all of your ducks in a row when you go to full production and you've reached full virality and your project is taking 5 million RPS globally and offloading all of that onto a CDN and making sure your clients requests are well respected in terms of cache control and making it secure and putting requests through a waf and and and and and. Yes we know. Lighten up. The comment you're replying to was meant to be lighthearted.

kqr9 months ago

Any site that consists of static files served by a professional-grade web server like nginx on a small VPS can also trivially do that.

interroboink9 months ago

If you're hosting static data, shouldn't HTTP cache flags be enough in most cases? Read-only cacheable data shouldn't be toppling even a modest server. Even without an explicit CDN, various nodes along the chain will be caching it.

(though I confess it's been some years since I've worked in this area)

christina979 months ago

That’s not the case these days. Due to TLS, there is very little catching in between you and the server you’re hitting.

eqvinox9 months ago

There are no nodes between you and that server.

nicoburns9 months ago

Pretty much anything that isn't Wordpress is ok these days I think.

rubyn00bie9 months ago

Uhh… doesn’t the link go to GitHub? I’m a little confused by this comment. I mean the project is neat and cool. But I imagine most folks go to GitHub and don’t go to the link showing the webpage. Am I missing something?

wilkystyle9 months ago

Link to the actual site is at the top of the GitHub page.

seumars9 months ago

>I enjoy making my own tools and I'm a bit tired of hearing that everything needs to be "battle-tested." So what it will crash? Bugs can be fixed :^)

I love it

knowitnone9 months ago

[flagged]

tptacek9 months ago

Be respectful. Anyone sharing work is making a contribution, however modest.

https://news.ycombinator.com/showhn.html

SPascareli139 months ago

Only 3.4k of C code for a full http and https server? I honestly thought you would need a lot more for it to be fully compliant with the spec.

ironhaven9 months ago

Http/1.1 is dead simple if you ignore most of the spec. If you only take get requests and set content-length on response you will be good for 99% of user agents. It’s not much more code to handle the transfer-encoding and byte-range headers. HTTPS is just http over a tls socket which is the level of abstraction you should have if you don’t roll your own crypto.

It’s fun and not that bad really.

AnotherGoodName9 months ago

Yeah I’ve done this for embedded devices. A website can be presented with nothing more than a raw socket and sending back a text string of http headers and html in a single text string when people connect to it.

Hell if you’re really lazy you can forgo responding with the http headers and just socket.write(“hello world”) as the response and all the major browsers will render “hello world” to the user. Properly formatted http headers are just a text string extra and the html is just text. There’s not much to it.

folmar9 months ago

And TLS can be handle by kernel if you target linux only. https://docs.kernel.org/networking/tls.html

sph9 months ago

Why HTTP/1.1?

Everybody speaks HTTP/1.0 and it is even simpler.

matja9 months ago

Lack of IP(v4) addresses. HTTP/1.0 sends no Host header, so cannot implement name-based virtual hosts. HTTP/1.1 does.

ninjin9 months ago

It feels about right to me. OpenBSD's httpd(8) [1] currently clocks in at just below 15,000 lines when you include its documentation. Take away a few features, make a few assumptions , and I would not be surprised we are in the 5,000 lines territory like this project.

    $ wc -l *
        31 Makefile
       910 config.c
       314 control.c
        34 css.h.in
       257 http.h
       100 httpd.8
      1262 httpd.c
       882 httpd.conf.5
       843 httpd.h
        19 js.h.in
       218 log.c
       319 logger.c
      2563 parse.y
       309 patterns.7
       713 patterns.c
        46 patterns.h
       829 proc.c
      1484 server.c
       849 server_fcgi.c
       826 server_file.c
      1997 server_http.c
        10 toheader.sed
     14815 total

[1]: https://man.openbsd.org/httpd.8

cozzyd9 months ago

I wrote a simple embedded C webserver to provide a liveview of data acquisition for one of my experiments that weighs in at <250LOC. Ok, I wouldn't put it on the public internet, and it only implements a small fraction of HTTP/1.1, but it works and only requires mallocing at initialization...

rwmj9 months ago

If you control the client, you can make webservers that are very small indeed. Here's one we use for local testing, where we know the client will be libcurl and know exactly what requests will be made: https://gitlab.com/nbdkit/nbdkit/-/blob/master/tests/web-ser... Basically 600 LoC. It would be completely insecure if exposed to the internet, but (by design) it can only serve over Unix domain sockets.

johnisgood9 months ago

Neat!

fanf29 months ago

There are a few other HTTP/1.1 servers at that kind of size https://www.acme.com/software/thttpd/benchmarks.html

panzi9 months ago

Reminds me of that Chaos Communication Congress talk about a blog/web server written in C, but with a bunch of security features (immutable storage, dropped privileges, blog has no access to TLS certificate, etc.): https://www.youtube.com/watch?v=TaE28fJVPTk

kopirgan9 months ago

Like this sort of approach.. Go back to basics and use what's strictly required. Remember McNealy (?) once said you can choose dozen different shapes Microsoft word uses to highlight spelling errors or something to that effect.

There's lots of bloat in practically every software not sure how much it affects performance but it's nice to build something from scratch.

Congrats to developer

Ono-Sendai9 months ago

My blog (https://forwardscattering.org/) uses a C++ web server too: https://github.com/Ono-Sendai/blog

marcodiego9 months ago

How about embedding the contents of the HTML files so that no access to the filesystem is required?

That would make it not only faster but also safer.

kevin_thibedeau9 months ago

I recommend linking a romfs image into the program. It's a simple format and provides an easy way to manage a collection of resources.

knowitnone9 months ago

does that mean recompile every time them HTML is changed? No thanks :)

TZubiri9 months ago

A nice intermediate I use is baking the paths into the source code, so that I only recompile when I add files, but I can hot-swap contents without even restarting the server.

Although if you start caching contents in memory (which is faster) you would have to at least kill the server and restart it. Or signal a reload.

remram9 months ago

Seems like the worst of both worlds. You need to recompile for content changes, and you need to distribute multiple files.

adamrezich9 months ago

Very cool! I was working on something similar at one point, but I sort of gave up on it when I wanted to move it from the "toy server that works on localhost" stage to something that I could actually deploy in the wild. I got overwhelmed by decision paralysis for how to proceed: should I just use a reverse proxy? Or should I rewrite my backend code to be some kind of plugin for some existing server software? If so, what kind of plugin, and for which software?

It's very inspirational to see that you've just said screw it, I'm going to host my own HTTPS server, and also hey reddit, do your worst, try to break it. Now I want to work on my similar project again.

For anyone similarly inspired, but who doesn't know where to begin making an HTTP server, check out this excellent tutorial that walks you through everything you need to make an HTTP/1.0 server, and then grow it to handle HTTP/1.1: https://www2.cs.uh.edu/~gnawali/courses/cosc6377-f12/p1/http...

TZubiri9 months ago

Nice. I've done this in the past. But I feel like attempting to make a file serving http server is like adding preservants and high fructose corn syrup to home made baked goods.

You have the opportunity to really make something custom and of high quality, hard code the paths of your files and avoid a whole class of vulnerabilities for example.

Configuration files? That makes sense when programmer and sysadmin are distinct, you can just modify variables and recompile.

iveqy9 months ago

I think you'll like dwm and other suckless tools. They have configuration as code and require a recompile after a configuration change.

jagged-chisel9 months ago

Not sure if serious…

heyoni9 months ago

Not the only time it’s been brought up in this thread: https://news.ycombinator.com/item?id=41643198

I’m waiting for someone to chime in and explain why that would be a bad idea cause I can’t think of it from a security perspective.

its-summertime9 months ago

Once at a certain level of complexity, e.g. having several hundred/thousand resources, then you start automating your hardcoded paths, and then you still can get bitten.

vs just putting things in a subfolder of your repo or whatever and having the default handling not accept `..` path components

TZubiri9 months ago

But OP isn't reaching that certain level of complexity, doesn't have thousands of resources, he is hosting his own website.

sabas1239 months ago

From a security perspective a lot of changes to this world would be an upgrade. However implementing security features is always a trade off, and sometimes good security is just not worth the loss of other things.

TZubiri9 months ago

My favourite phenomenon is when (computer) security gets in the way of (actual) security.

For example, you implement a super secure solution and no one hacks your website, but you end up being very unproductive and can't find a job. You lost food security.

In covid, bank systems in my country were so hard to use, there were like 6 passwords to login. Not only was usability compromised in the sense of security, but people, especially old people, started lining up in banks, compromising health security.

To say nothing of the scenarios were users just bypass obnoxious exaggerated security systems, like leaving a post-it note with a password on their screens.

gonzus9 months ago

Kudos for your project -- it is great fun and a learning experience to implement your own HTTP server in a low(er)-level language.

One question: you say that "Transfer-Encoding: Chunked responds with 411 Length Required, prompting the client to resend with Content-Length". Is there a reason for doing this (security perhaps), or is it just a choice?

gonzus9 months ago

Sorry for answering myself. I paid more attention now, and it seems this is disabling chunked transfer encoding from the client to the server, which makes sense from a security / reliability PoV. Disabling it from server to client does not (IMHO).

xyst9 months ago

looks like it’s survived the HN front page hug. Congrats.

system7rocks9 months ago

This is amazing. Seriously, more things should be custom-coded. Why not?

bosch_mind9 months ago

For fun, sure. Small mistake can be big security nightmare

whiterknight9 months ago

1000 lines are easier to secure than 5 million lines

agentultra9 months ago

“You can write software that has no obvious bugs or you can write software that obviously has no bugs.”

I think that was ewd?

naniwaduni9 months ago

You can, of course, also write programs that have known bugs. Or even programs that have bugs that obviously shouldn't be there, but are anyway.

victorbjorklund9 months ago

Not if 1000 lines are written by you alone and not checked by anyone else vs 5 million lines of code written by thousands of people and checked by countless more. Linux is probably more secure than 1000 lines of C code from a junior developer.

whiterknight9 months ago

I think this is vastly overrated:

- how much code actually gets read outside of top 2-3 projects?

- how many of those readers can detect security problems?

- why are others inherently better at detecting problems than the author?

Wouldn’t 1000 lines read by 2 people be better than a million read by 10?

mplewis9 months ago

Not if you’re the only author!

a21289 months ago

For a blog? If you don't put anything important on the server itself I can't imagine a hacker could do much. Maybe put a nasty image on your front page, or put their Bitcoin address pretending it's the place to send donations, but it would take a lot of time and effort to remain hidden for hardly any gain.

knowitnone9 months ago

or take over your server?

remram9 months ago

Unless your server has very unusual features, or there are VERY serious kernel vulnerabilities, all an attacker can do is read files accessible to the server's user or run code as the server's user.

And possibly serve attacker-controller content to other users.

jpc09 months ago

> No Transfer-Encoding: Chunked (responds with 411 Length Required, prompting the client to resend with Content-Length

I've always wanted to undertake a project similar to this but chunked encoding has always been the thing that put me off the idea... I never even though about just not supporting that :)

I've written many http/1.1 servers in the past but only for internal stuff that I also controlled the clients. Guess perfection was the enemy of good for me.

remram9 months ago

Chunked encoding is pretty easy no? Just write the full size and \r\n, you can send as one chunk.

It does mean you have to read the client's headers to see if it was requested, though.

jpc09 months ago

This is chunked encoding from the client...

chairmansteve9 months ago

I did something similar in LabView once. There were reasons.....

p0w3n3d9 months ago

I like the string handling, especially

  #define LIT(S) ((string) {.data=(S), .size=sizeof(S)-1})
  #define STR(S) ((string) {.data=(S), .size=strlen(S)})

p0w3n3d9 months ago

I wonder how small the hosting machine can get btw. 8 bit atari seems to small (76 kb of compiled code on my arm64, but it wouldn't get much smaller), however some atmega would suffice I guess

brennopost9 months ago

Making a HTTP/1.1 server is so fun and teaches so much about networking. I highly recommend anyone interested in networking or web development give it a try.

danpalmer9 months ago

> Show HN: Hosting my website using my own C web server

"But if you actually do this, WAT" – https://www.destroyallsoftware.com/talks/wat

As with much of HN, this is fun, a good thing to learn while making and reading about... but it likely needs the caveat that doing this is production isn't a good idea (although in this case the author does not appear to encourage production usage).

dailykoder9 months ago

I'd assume most people would know that? But if they still put random code that someone wrote just for fun into a (serious) production system, then WAT.

Edit: And sure, if the author is lucky, then maybe a handful of people will gather around the code and try to make it "production ready". But since the README doesn't say anything about the topic at all, just let people have fun and learn things along the way?

x3haloed9 months ago

It’s a great way to get hacked

[deleted]9 months agocollapsed

Turboblack9 months ago

some time ago (about 20 years)

I used smallhttpserv - a program that weighs a couple of dozen kilobytes and works even in early Windows

surprisingly, it still works

v3ss0n9 months ago

Nginx is C web server.

nineteen9999 months ago

So is Apache and OpenBSD httpd and probably too many others to name. Node.js is written in C/C++ as is Litespeed, probably Cloudflare Server as well. Microsoft IIS is written in C++.

So that accounts for about the top 5 ...

v3ss0n9 months ago

Heh ,Then python and php are included as C web servers too

synergy209 months ago

I use lighttpd which is lighter and simpler than nginx

ezekielmudd9 months ago

I love it!

It’s fast!

I have always wanted to try out something like this.

Good job!

broknbottle9 months ago

Nice, now lets see Paul Allen's web server.

ifail_for_fun9 months ago

cool project, but the readme has a disingenuous comparison bench against nginx. why even put it there?

cynicalsecurity9 months ago

Why? How is this better than running nginx or Apache2?

rauli_9 months ago

Sometimes it's just fun.

cromulent9 months ago

Great project. Down for me.

$ curl http://playin.coz.is/index.html

curl: (7) Failed to connect to playin.coz.is port 80 after 166 ms: Couldn't connect to server

justmarc9 months ago

It's a fantastic way to make a random, newly written web server in C safe and secure.

arethuza9 months ago

That exact command line worked for me - might there be something on your end blocking outgoing plain HTTP requests?

cozisop9 months ago

Hey, just checked. Server didn't crash. I wonder what happened?

kristianpaul9 months ago

Not to compare but i realice this is something you can do with rust with few lines

https://github.com/actix/actix-web/tree/master/actix-http

theideaofcoffee9 months ago

Look ma, I can do it in python!

$ python3 -m http.server

Alifatisk9 months ago

Or Ruby

$ ruby -run -e httpd .

ustad9 months ago

You call that a few lines of code!?

cozisop9 months ago

it's just a few lines because you're hiding the other ones

p0w3n3d9 months ago

but not in 76 KB

source