sliced lemons

Wednesday, November 29, 2006

Web infrastructure 2.0

The definitive guide for robust websites and email

Since I work with web infrastructure, people often ask me for help when they run into trouble with their websites: problems with domain name transfers, email, website downtime and so on.

One of the main reasons for all these issues is that they entrust all components to ONE company. So if the webhosting company has downtime, you are also unable to get your email.

Most webhosting companies offer the all-inclusive package: domain name, email, lots of storage and traffic and all kinds of features for dynamic content. This is a nice idea, but if you want a robust solution, my advice is: split things up! Use services that focus on ONE feature. They are best at that and since it is their core product, they will do everything to keep it up and constantly improve it. And you're free to change the provider for each component if you're not satisfied anymore.

1. Use a DNS provider for your domain name

I really don't get it. I can't count how many times I have read about problems caused by people having their domain name with the same company that hosts their website. Let alone the trouble and the unpredictable downtime window you get when you want to transfer a domain from one provider to another. I have tried many webhosting companies through the years. When the plan included a domain name, I chose one I don't care about (I call it the "hosting domain"). But I don't allow hosting companies to handle my real domain name. There are several DNS providers on the market where you can register your domain name and have full control over the DNS records, and then you can point your domain name to the webhost of your choice. More about that later.

2. Keep email separate from web servers

I used to smile about people who owned their own domain names but still used freemail (Yahoo, Gmail etc.) as their contact address. Since I know how troublesome setting up email can be, I tend to understand them better now. I've seen a lot of systems where email is handled on the same server instance as the websites (bad idea), and some where incoming emails were written to MySQL databases (I passionately hate MySQL from an admin perspective) and people would eventually read them with crappy interfaces. Either scenario means you can't get email when your webhost is down.

Some DNS providers also offer email forwarding, so you could have all incoming email sent to Yahoo! Mail or Gmail, and profit from their spam / filter rules / folders / tags etc. Both providers allow you to choose an alternate sender address, so when you compose mails they will appear to be from your domain. Another option is pointing the MX records of your domain to a box that JUST handles email.

Google offers a hosted solution which works just like that, and it is free.

3. Buy a rock solid webhosting package or dedicated server (or more than one)

Choosing a good host is hard; choosing the perfect host is probably impossible. The quality of service changes constantly. I sometimes compare that with restaurants: an article in the paper praises the excellent cuisine of a small cafe, and quite soon the place is very different from how it was before, with more tables, delays, waitresses who can't handle the workload, and a manager who suddenly wants to try new concepts with the food they offer.

For me good webhosting has an uptime over 99% and comes with shell access (also used for secure copy. Forget FTP.) and a dedicated IP address, or at least a decent system for hosting domains that are registered somewhere else. If you have the money for a dedicated server, an IP and virtual hosting are already included by design.

Since the domain is registered somewhere else (see 1.) you have full flexibility and can simply set an A record or a CNAME in the DNS to point to the webhosting. If you have a second webhosting that you bought with another company, you can point your domain to this one when the other one is down. That can be done within five minutes and you don't need help from anyone.

You're also free to point www1.yourdomain to host A and www2.yourdomain to host B and write a tiny loadbalancer at host C that knows which hosts are up and can shift traffic to them accordingly. This solution also allows you to plug in additional servers (e.g. the Amazon EC2 grid) when you expect an increase of visitors for a roadshow or TV commercial.
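To give an idea of how tiny such a loadbalancer can be, here is a rough Python sketch of the "know which hosts are up" part. It is only an illustration under my own assumptions: the host names are placeholders, the health check is a plain TCP connect to port 80, and the function names (is_up, live_hosts, pick_host) are made up for this example. A real setup would also have to proxy or redirect the request.

```python
import socket
from typing import Callable, Iterable, List, Optional


def is_up(host: str, port: int = 80, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False


def live_hosts(hosts: Iterable[str], check: Callable[[str], bool] = is_up) -> List[str]:
    """Keep only the hosts that currently pass the health check."""
    return [host for host in hosts if check(host)]


def pick_host(hosts: Iterable[str], request_number: int,
              check: Callable[[str], bool] = is_up) -> Optional[str]:
    """Round-robin over the live hosts; None if every host is down."""
    alive = live_hosts(hosts, check)
    if not alive:
        return None
    return alive[request_number % len(alive)]
```

Called as pick_host(["www1.yourdomain", "www2.yourdomain"], n) for the n-th request, it skips a host that is down. The check function is a parameter so you can swap in something smarter, for example an HTTP request against a status page.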

And when you're not happy anymore with the webhosting company (which happens often, unfortunately), you simply set up things at the new location, then change your DNS at a date and time of your choice and you're done.

4. Host static files somewhere else

Again, split things up! Host static files like images, videos, css, flash somewhere else and let your webserver focus on delivering your website. I use Amazon S3 for that, and I added a CNAME to my DNS to have a hostname like media.mydomain.com. You could also use flickr (read the TOS first) or imageshack for hosting images if you like.

5. Cache as much dynamic content as you can

I think lots of problems result from the fact that many websites are run by people who have only a vague idea about webservers and DNS, and they think everything is about php and mysql, with some fancy css around it. If we leave highly personalized websites (web mail, content management systems etc.) out of here, I generally don't like the idea that a system makes requests to (mysql) databases because a user just CLICKED somewhere.

Cache your stuff as much as you can, and if you have to use one of those fancy blog / CMS packages, enable caching.
Use web stress tools to find out how many users are able to surf on your website at the same time. Check if you are happy with the result.
In large scenarios, have a middleware server that produces static html files and pushes them to the www box.
Use storage and content distribution services like Amazon S3 or Akamai to host your static pages and avoid a server downtime (because you are on the frontpage of digg).
Databases, especially mysql, are not something super cool that a website can't live without. You can use a webservice (like Amazon S3, again) to store and get your data with REST, SOAP or JSON requests. Depending on your scenario, even the Unix filesystem (serialized data) can be a good choice to speed things up and avoid another point of failure.
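The caching idea from the list above can be sketched in a few lines of Python. This is a minimal sketch and not taken from any particular blog or CMS software: cached_page, the cache directory and the five-minute default lifetime are my own assumptions. Rendered HTML is kept as a plain file, and the expensive render step (the one that talks to the database) only runs when the file is missing or too old.

```python
import hashlib
import os
import tempfile
import time
from typing import Callable


def cached_page(key: str, render: Callable[[], str],
                cache_dir: str, max_age: float = 300.0) -> str:
    """Return cached HTML for key; call render() only if the copy is missing or stale."""
    os.makedirs(cache_dir, exist_ok=True)
    # Hash the key so any URL maps to a safe file name.
    name = hashlib.sha256(key.encode("utf-8")).hexdigest() + ".html"
    path = os.path.join(cache_dir, name)
    try:
        fresh = time.time() - os.path.getmtime(path) < max_age
    except OSError:
        fresh = False  # no cached copy yet
    if fresh:
        with open(path, encoding="utf-8") as fh:
            return fh.read()
    html = render()
    # Write to a temporary file and rename it, so readers never see a half-written page.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(html)
    os.replace(tmp, path)
    return html
```

With something like this in front of a page, a visitor who just CLICKED somewhere gets a file read instead of a database query, unless the cached copy has expired.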

Well, that's it! Sure, there is much more to it and I didn't walk you step by step through things. I just wanted to give you some hints. One last thing: backups. I keep noticing how often this issue is ignored and how many websites never go online again after a downtime, because there were no (useful) backups.

Labels: amazonaws, architecture, diggeffect, dns, downtime, dreamhost, ec2, email, hosting, infrastructure, mediatemple, s3, uptime, web-20

Monday, January 02, 2006

The zero factor in Google's search algorithm

Search results in Google are calculated with an ultra complex formula, which includes metrics like pagerank, backlinks and many other things. Even whether a hostname shares its IP address with many others matters.

What if you wanted to have a list of all domains sorted by their importance, no matter for what keyword? Simply neutralize the search term factor by using a null keyword value.

Try searching for http in Google. At this moment it has 12,660,000,000 results.

The winners of the Google algo seem to be Microsoft, the W3C, Altavista, Yahoo and CNN.

Update Nov 2007: this doesn't work anymore - the keyword "http" is now a valid search string


Tuesday, November 29, 2005

Fight comment spam, earn cash!

You have certainly seen comments on websites or blogs like these before:

"I found your website by accident and really like it. Please also visit my site, where you can get free [anything], please click here http://mycoolwebsite.com/..."

Many people have already got used to deleting spam comments from their blogs or have even disabled the comment feature entirely. This weekend I read about an approach from a German blogger which I'd like to share with the English-speaking community.

One year ago, he added the following line just before the SUBMIT button in the comment area:

"Dear spammers, advertising comments will cost you € 1400 a month".

Wow... he was probably inspired by the can deposit system in Germany. They don't want you to buy Coke in cans, so there is an extra charge of 25 cents per can, which you get back if you return the can. Almost nobody does it, so now Coke cans have almost disappeared from the shops. You change things if it pays off for people or if it costs them money when they act inappropriately. Martin (the German blogger) didn't intend to make money from it, he just wanted to have clean blog comments. For each spam comment he sent a bill to the owner of the linked website, and suddenly a bunch of scared spammers let him know through their lawyers that it wasn't them who posted the comment etc. So he scared away some of the spammers, some at least had to pay their lawyers and some simply apologized and promised not to spam his blog anymore.

Now after one year Martin removed the line from the comment box. I translated his post for you:

reason ONE: too many people copied my idea without thinking about it. Some even contacted me and asked for tips and legal advice on how they could scare spammers for fun. I wanted to make the world better, not worse.

reason TWO - and this one is the important one: I got to know a lot of spammers. They can be divided into two categories:

the industrial spammers create hundreds of automated comments in a second from different IP addresses, sometimes hundreds of comments per day. They are mostly located overseas, so my bills would have been useless.

professional spammers are smart ass mafiosi who act alone but are part of a network and post comments by hand, supported by half automatic scripts. If they are located in the local market (Germany), the bills worked (but they knew their rights and sent letters of objection). Most of all the line itself ("you have to pay for spam comments") worked and scared away many. We need to fight that kind of people; unfortunately they almost don't show up at my site anymore.

the self made man spammers. I can't explain the reason, only describe what I'm seeing: more and more I get to know people that run their own websites and try to make money from it, often supported by Google Adsense. Most of them live off that business and the stories they're telling me cause me pain in my heart. They started it from an emergency situation, often because their unemployment insurance wouldn't pay anymore, no chance for a real job, and [...] being their own boss was the only hope to get some money. They only have limited or almost no clue about the internet or even the communicative processes inside it, they just heard that "you need links for your business to work"[...]

But I only wanted to explain why I am not sending bills anymore. Voilà, here's the answer: it doesn't help. That means - like I wrote: with professional spammers it doesn't help much. But I am really sorry for the impact on small business makers who don't know better and have problems enough already and shouldn't be bothered by angry bill-sending blogwriters.

I am sorry if my translation was maybe not 100%, but I hope you get the point.

I think you could try fighting professional comment spam by transforming a comment box into a place where you start something like a contract with the person commenting, at least for the local market:

Use CAPTCHAs to block comments placed by robots and make sure that a human being wrote the lines

State clearly that you charge for spam comments, but make sure normal people are not irritated by that and are happy to leave their 2 cents on your page

Have the user type "I AGREE" in a blank field, otherwise you don't accept the comment
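Server-side, the three checks above could look roughly like this. It is just a sketch in Python with made-up names (CommentForm, validation_errors); how the CAPTCHA is actually verified depends on the service you use and is reduced to a simple flag here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CommentForm:
    text: str
    captcha_passed: bool  # outcome of whatever CAPTCHA check the blog uses
    agreement: str        # what the visitor typed into the blank field


def validation_errors(form: CommentForm) -> List[str]:
    """Return the reasons a comment is rejected; an empty list means accept."""
    errors = []
    if not form.captcha_passed:
        errors.append("captcha")
    # Be forgiving about case and surrounding spaces, strict about the words.
    if form.agreement.strip().upper() != "I AGREE":
        errors.append("agreement")
    if not form.text.strip():
        errors.append("empty")
    return errors
```

A comment is only stored when validation_errors returns an empty list. The notice about charging for spam belongs right next to the blank field, so that typing "I AGREE" clearly refers to it.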

You might also need some help from lawyers.

Be good, don't harm surfers!

image by opticallyactive, used under cc license.

technorati tags: commentspam, spam, blogging, antispam

Sunday, November 06, 2005

The new age of virtualization

I discovered vmware six years ago, and I knew it could change the way we do technical support and QA entirely. You can easily create as many different machines as you wish, one with Win98, WinME, Win2k, WinXP, all in different releases, languages, patch levels... and for your testing you simply switch on the machine you need.

Even before the release of the GSX/ESX servers I took the workstation edition to set up a pool of different machines hosted on a huge Linux box. Each of them had a vncserver installed, so you could connect to the machine of your choice from your desktop.

I started with VMware workstation 2.0 and even then this piece of software was a killer application. Since that time - we have reached version 5.5 now - we got many new features and more hardware support.

The biggest change now is the licensing issue. You might have been able to talk your boss into buying you one license for vmware, but I bet it was almost impossible to have him supply the entire team with that tool. Now VMware Player was released - and it's free! It is meant for running already configured virtual machines you created with the full product or downloaded from a supplier like Novell, Redhat or Ubuntu.

It was just a matter of days until the world was told how to use the free tool to legally build and run your individual virtual machine. And indeed, how would you define "preconfigured machine"? Now a new tool even lets you create virtual machines from scratch, where you can configure RAM, disks, hardware support etc. - and so upgrades your free player to a full product. Well - almost. The workstation edition offers a lot more stuff, especially the many possibilities to handle virtual disks. For example, you can tell vmware to discard all changes made to the virtual disk when powering off.

Why is VMware Player free? These guys must be nuts! Or quite smart :) At the SYSTEMS 2005 in Munich I had the chance to talk to one guy from VMware. They are totally aware of what the community is doing with the free player, but on the one hand they recently got some competition, and on the other hand they hope that reasonable people appreciate the free tool and will be interested in supporting development by buying the full product for creating virtual machines, and using the free tool company-wide for running them.

VMware is pure fun, and it saves your life in tech support and QA.

technorati tags: vmware

Saturday, November 05, 2005

Your internet workplace inside a Unix terminal

We talked about remote desktops before. Using your remote applications on your local machine is not that easy; mostly you need a complex architecture or installed software.

You might have used PuTTY before, a text mode terminal emulation for Windows you can use to connect to Unix/Linux boxes.

Ever thought about using a terminal like this to get your things done?

Here is a list of interesting tools:

Multi sessions: screen - works like a virtual desktop manager for Windows/KDE. Here you open just one terminal, but you can create as many additional screens as you like, toggle between them with hot keys and so use many tools at the same time
Read your emails: mutt - email client with many features. You can read your local inbox, imap and pop3 mailboxes, send emails and attachments
Surf the web: links2 - text mode browser
Instant messaging: centericq - fantastic text mode client that connects you to your buddies at Yahoo, MSN, ICQ, AIM and Jabber
RSS feed reader: raggle / snownews - two good feed aggregators
Downloaders: ctorrent / curl - access to the torrent network and file downloader
Blogging engine: nanoblogger - write and maintain your blog inside the shell

If you set up these tools on your shell account, you are able to connect to your workplace from anywhere in the world, no matter if that machine is running Windows, Linux, Mac OS, or Unix. And all traffic between your local box and your remote workplace is encrypted.

technorati tags: unix, shell, terminal

Wednesday, November 02, 2005

How to get your remote desktop

In the old days, client workstations were quite dumb. All programs and data were stored on big servers, and people accessed them from thin clients at their work desks.
With the rise of the desktop pc the client became more intelligent - which led us to problems in software distribution, driver installations, backups and many more.

Recently, the art of working remotely is being rediscovered. Most work can be done in the browser already, and every day new services come up where we can collaborate on documents, edit and manipulate photos or manage projects. More and more it becomes irrelevant which operating system your client is running. Firefox is everywhere.

Some years ago I signed up for WorkSpot, a paid service that offered remote linux desktops to which you could connect with a small java applet in the browser (a VNC client). I could access my workplace with word processor, email, messenger, bookmarks and newsfeeds easily from any internet cafe or other place I wanted. And it was great for showing people new things without having to install stuff on their box.

Then, WorkSpot started to have massive problems. Every once in a while, the servers were down, sessions wouldn't start, or they wouldn't install the latest version of gaim so it would work with the redesigned protocols of Yahoo and MSN. WorkSpot seems to be out of business now. Their idea was really great.

Now CosmoPod is doing the same thing, the only difference seems to be that they're using NX instead of VNC to connect. Talking about connecting - it didn't work for me. There are too many users out there, or some freaks already smashed it down. Some time ago I thought about coming up with a service like that, but I didn't.

These are the issues with a general remote desktop service:

availability: people will probably not use their desktops 100% of time, but when they need it, it must work. Things like that cause heavy loads on the host systems and you need to watch it 24/7.
backups: how to explain to your clients that all data is lost - but at least the service is free?
security: making a webserver secure is quite hard. Making a unix system safe from users bothering each other or sniffing around is almost impossible.
third party software: many people - many different ways of working and software to be used. There will be discussions like "Why is that software installed, but not that one?"

So if you're serious, forget about free remote desktops. Unless a big player like Google offers it - but also here you should think about what you do.

Here is what you can do to get your remote desktop if you don't want to have that box running under your bed:

rent a cheap virtual server (vserver, based on Virtuozzo technology) from your favourite hosting company. In Germany, you can get one with 1 GB harddisk and 5 GB traffic for $3 per month.
deactivate all services like http, smtp etc.
install kde, firefox, open office, gaim, .. all the things you need
set up VNC and connect to your individual remote machine
if you want to be on the safer side, start the VNC service only when you need to use your desktop.

Update:
A more secure and also very interesting solution is the following:

get your own virtual Linux box as described before.
Linux is a networking operating system. We tend to forget that when talking about desktops and think of it as a simple Windows replacement. So instead of having your entire desktop remotely - why not simply focus on the applications you REALLY need?
for this you need an X server running on your local machine. The best one for Windows is the one that comes with Cygwin (Cygwin/X) - and it's free.
now we can open gui applications (like Firefox, OpenOffice, Gaim,...) that run on another Linux machine anywhere in the world on our box. Your desktop will be local, just this very application you need is hosted remotely.
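for this you start your local X server and allow other hosts to open a window. Issue xhost + on the cygwin shell (this switches off access control completely; since the windows arrive through the SSH tunnel, xhost +localhost is a narrower alternative)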
open a secure shell to the remote box. You have to enable X11-forwarding, PuTTY can do that
PuTTY will have set a $DISPLAY variable for you on the remote shell if everything worked. Otherwise you might have to switch on X11-forwarding in the SSH-Server on the remote box
now you can simply start a program of your choice, e.g. Firefox in the secure shell and if you listened closely you shouldn't have any trouble.
the cool thing here is, that all traffic between your local desktop and the application is tunneled through the secure shell and so automatically encrypted
try changing the remote login shell from /bin/bash to /bin/firefox. It's fun!
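If your local machine has the OpenSSH command line client instead of PuTTY, the whole procedure boils down to one command, which can be scripted. The following Python sketch is only an illustration: the user and host names are placeholders, the helper names are invented, and it assumes that a local X server is already running and that the remote SSH server allows X11 forwarding.

```python
import subprocess
from typing import List


def x11_command(user: str, host: str, program: str) -> List[str]:
    """Build an OpenSSH command line that runs program remotely with X11 forwarding (-X)."""
    return ["ssh", "-X", f"{user}@{host}", program]


def run_remote_app(user: str, host: str, program: str) -> int:
    """Start the remote program; its window opens on the local X server."""
    # Blocks until the remote program exits and returns ssh's exit code.
    return subprocess.run(x11_command(user, host, program)).returncode
```

For example, run_remote_app("me", "box.example.com", "firefox") starts Firefox on the remote box and shows its window on your local desktop, with all traffic going through the encrypted SSH connection.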

I know these are no quick solutions and you also need to have some basic unix administration skills - or at least the phone number of someone who can do it for you.

Tuesday, November 01, 2005

Create newsfeeds for sites that don't have one

Crazy about newsfeeds? Well, I am. How could I possibly stay tuned to new stuff on the web without my feed subscriptions? Instead of checking the websites for new articles, I prefer them telling me when there is something interesting.

A lot of websites still don't offer newsfeeds - for different reasons. FeedTier is your friend here. It is still in beta, but it analyzes pages you submit and makes an rss feed from them. Works great if you want to watch e.g. all incoming Jackie Chan DVDs on ebay.
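I don't know how FeedTier works internally, but the last step, turning a list of found links into a feed, is simple enough to sketch in Python. The function name build_rss and the input format (pairs of title and link) are my own assumptions, and the harder part, finding the interesting links on a page, is left out here.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple


def build_rss(title: str, link: str, items: Iterable[Tuple[str, str]]) -> str:
    """Build a minimal RSS 2.0 document from (item title, item link) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Generated feed for {link}"
    for item_title, item_link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
    return ET.tostring(rss, encoding="unicode")
```

Run this on a schedule against the links scraped from a page, publish the result at a fixed URL, and any feed reader can subscribe to a site that never offered a feed.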

P.S.: I had the identical idea for a project, but was too busy working on other things. There is also this great misbelief that we're the only ones on the planet who could possibly come up with "this great new idea". What do we learn? You really need to be quick with ideas on the web. Otherwise it is someone else who gets bought by Yahoo, Google & Co.