Wednesday, March 14, 2007

Who Needs a Database? How I Built a Fully Functioning Website Without One

Why?


Why would you want to build a website without a database? I'll try to answer this question for you by explaining some of the reasons why I did it.



  • One of the reasons was purely academic - could I build a site that persisted data and was fairly interactive but with no database? If I did, what kind of problems would I run into, would the implementation be stable? Could I build it in such a way that if I ever needed to use a database - woud it be hard to convert? In short, it was a challenge I couldn't resist.


  • Another reason was simply because there was no reason not to do it! Once I had settled on the site design, there was just no solid, compelling reason other than convention why I would have to use a database as my persistence medium. None whatsoever. Many of you will disagree with me of course, and I welcome your comments and rants. :)



With no reason not to use a database, and with the challenge in hand, I got down to the business of building the site. Before I explain how, I'd like to give you a quick overview of what the site is and what it does:



The Site


The site was built using ASP.NET 2.0
To reference the fully functioning site throughout this tutorial, go here.

The site design is very simple. Every day ten Sudoku puzzles are displayed on the main page, which I call The Scoreboard:



Clicking on a Scoreboard widget (shot below) will take you to a page where you can solve the puzzle. If you solve it quickly enough, you get to enter your name and url and take your spot on the scoreboard. If someone comes along and solves it more quickly than you did - they bump you down the list. Only the top 5 players are shown for each puzzle.




Nice and simple.


The Ingredients



The Solutions File
The first hurdle was to decide how I would generate the 10 daily puzzles. I didn't like the idea of having a scheduled process generate 10 new puzzles at midnight every day so I went with generating several months worth of puzzles and placed 10 puzzles into their own "solution" text file. Each file is placed into a folder labeled with the date it is to be used for. Essentially, when the scoreboard needs to get the list of puzzles for the day, it nagivates to the directory labeled with the current date and retrieves the puzzle information from the file inside the folder.

The solution file is simply a delimited file with the first value being a GUID (to uniquely identify the puzzle) and then a comma delimited list of values that represent the Sudoku puzzle.

Here's what the solution file looks like:



The Puzzle List

Since this file contains the list of puzzles for the day and once read would never change, it made it a great candidate for caching. And that's what I did with it. The Scoreboard never directly interacts with the solution file, instead it interacts with a Class (called Sudoku) that exposes a Shared method. The method returns a Hashtable called PuzzleList. The key is the puzzle Guid and the value is the comma separated puzzle values.

The benefit of this approach is that the Shared method can determine if the puzzle list needs to be reloaded (say, it's a new day). If not, it the hashtable is automatically returned.




Public Class Sudoku

Shared _puzzleList As Hashtable

Public Shared Function PuzzleList() As Hashtable

If _puzzleList Is Nothing Or Now.Date > HttpContext.Current.Application("lastSudokuCachedDate") Then '

Try

_puzzleList = New Hashtable
'populate puzzle list.

Dim puzzles() As String
Dim puzzleContents() As String
Dim i As Integer
Dim fileContents As String

Dim server As HttpServerUtility = HttpContext.Current.Server



fileContents = My.Computer.FileSystem.ReadAllText(Now.Date.ToString("MM-dd-yyyy") & "/puzzles.sln"))

puzzles = Split(fileContents, vbCrLf)


For i = 0 To puzzles.Length - 1
puzzleContents = Split(puzzles(i), ":")
'key is guid
'value is comma separated values represeting puzzle.
_puzzleList.Add(puzzleContents(0), puzzleContents(1))
Next

HttpContext.Current.Application("lastSudokuCachedDate") = Now.Date

Catch ex As Exception



End Try

End If


Return _puzzleList


End Function

End Class



The Scoreboard

When scoreboard.aspx is requested, it simply makes a call to Sudoku.PuzzleList() and iterates over the collection. A scoreboard widget is added to the Page's control collection and assigned a Puzzle GUID. That's it. All the scoreboard knows about is showing those scoreboard widgets (and Google Adsense of course!)

The Puzzle File

When a solution file is generated, an scores file is generated for each puzzle. This file is responsible for storing the information that you see on the scoreboard. It's a delimited file as well, storing the time, the name and the url of the person who solved the puzzle.



The Scoreboard Widget
This is perhaps the most important of all the ingredients. The scoreboard widget is a u ser control that takes one parameter - the id of the puzzle. When the widget loads it does two things - it looks in the cache to see if there is a score listing for the puzzle and if there isn't, grabs the file, parses it and sticks it in there.

Essentially, the scoreboard widget only ever displays the data from the cache. This is VERY fast especially when the alternative is to read it from disk every time.

Well, what happens when someone makes it on the scoreboard and the cache is different from what's in the .scores file? In that case, the scores for the puzzle is removed from the cache. So the next time someone goes to the scoreboard and the widget looks in the cache, it's not going to find anything and will grab it from the disk then cache it.

In other words, the first time through, the file is cached forver, until someone beats someone out of their spot on the scoreboard at which point, the file is updated, and the cache is invalidated. The cycle starts all over again.




Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load

Dim fileContents As String

If (IsNothing(Cache(PuzzleID))) Then

fileContents = My.Computer.FileSystem.ReadAllText(PuzzleID & ".scores"))

Cache.Add(PuzzleID, fileContents, Nothing, DateTime.Now.AddHours(1), TimeSpan.Zero, CacheItemPriority.Normal, onRemoved)

End If

'get the data out of the cache.
fileContents = Cache.Item(PuzzleID)

Dim scores() As String
scores = Split(fileContents, vbCrLf)

Dim i As Integer

For i = 0 To scores.Length - 1

'parse the contents...blah blah blah
'snipped for brevity

Next

End Sub




The Puzzle Page
I thought I would throw in a description of the puzzle page in for good measure.

The puzzle page take a QueryString parameter ("id"). When the player is ready to play the game, he or she clicks the START button. The START button makes an AJAX call to another page which returns an XHTML representation of the puzzle. It uses the values in the solution file and randomly removes a set number of clues to the puzzle. When the user is done he or she clicks the SUBMIT button and the time is compared to the score values stored in the cache. If the player has beaten any of the times, the file is rewritten to disk with the new players information and the scores data removed from the cache.

And...that's it!


Conclusion



Is it feasible to build a site without a database? You bet! As I hope I demonstrated above, it all depends on is what you're trying to accomplish. The site I built didn't need a database. I just didn't need the overhead or the added complexity.

I have to admit that it was a refreshing excercise for me since I tend to do things just because it's how "they're done". It was great to take a step back, decide on a different approach and implement it. It's also incredibly gratifying to see it work. The site feels snappy because of caching and is very easy to maintain because of the simple file structure. And did I mention debugging? A cinch! I don't have to worry about queries, stored procedures, database connections, password, etc.

I hope this tutorial encourages you to try something out of the norm, or at the very least makes you think about ways in which you can simplify your designs.

As always, thanks for reading!

Update 8:37 A.M. 3-16-07

Reddit user bluGill left a comment:
"...I see several databases: First the the filesystem, which is itself a database. Second is his scorecard, which is a database..."

I can't disagree with this - but I think it's a bit picky. I assumed that it was very obvious that when I refer to a "database" I'm referring to a relational database such as Oracle, SQL Server, MySQL etc.

Update 10:14 A.M. 3-20-07

See my follow up post which is a response to anyone who thinks this is an incredibly bad idea. Please read it and feel free to comment!

Update 1:13 P.M. 3-23-07

I wrote another post about this topic called "Coding By Dogma"

28 comments:

Anonymous said...

you can get away with files to persist data only if you dealing with very simply. as soon as you have a model that needs a simple many to one relation you'll have to use a database or otherwise the complexity of using a file will surpass that of using a database.

plus with multiple threads trying to update the file you'll have to deal with file locking.

oh.. and also no transactions....i can go forever. just use an Access database for god sake (btw, what are the recommendations for a file based database?

John said...

I agree! but as I outlined above - in the implementation I described above - I don't need transactions and I don't need a one to many relationship. So, if I don't need those things - why use a database?

Anonymous said...

Maybe you found the FileHelpers useful for this =)

I lot of people use it to save simple info in ASP.NET applications.

http://www.filehelpers.com

Cheers

John said...

Very interesting Marcos - thanks for the link. After writing this blog post I wondered if it wouldn't be a good idea to wrap this up into a library of sorts. I'm thinking of a library that would handle the caching, etc.

Again, thanks! It's very interesting!

Anonymous said...

I´m very glad that you like it =)

Let me know if you have feedback or idea =)

We are working had in the 2.0 version, we reach a 65% enhace over the past version.

Thanks for the post and comments.

Cheers

Unknown said...

Hate to burst your bubble about "not having a database", but your text files are just another (low-powered) version of a database.

A hypothetical scenario - you want to add the ability to track the ranking of a user across all of the puzzles they have solved, or find the average time per user per puzzle. Or, or, or. All questions that can be easily answered in SQL, but which will be tricky with text files.

The only two cases this seems to make sense in are:
(1) a strictly resource-limited site with many, many users
(2) you get paid by the line for code.

Case (1) doesn't make any sense, as AdSense will provide resources if you provide users.

Case (2), on the other hand, is likely the reason that code like this is found in the wild...

--mdj (who has written code similar to above, but
a decade ago, with paranoid sysadmins [no DB server] and mixed endianness (no Berkeley DB))

John said...

Hi Matt, thanks for your comments -I appreciate them.

I agree about text files and how I use them being a form of database. Indeed, they are. They aren't the full blown RDBM systems that I'm referring to when I talk about "databases".

One of the reasons why this implementation works for me and why I'll continue to use it is simply because user's don't have an account on the site. If they did, you bet your ass I'll start using a database!

As you rightfully said, if you want to do things like show the average time per user, using text files would be awkward and just plain stupid at best!

Since I'm NOT doing that, the text files work just fine.

Thanks for everyone who has commented - I really do appreciate it!

Unknown said...

What this mostly seems to do is unsolve certain problems. There are good reasons to use an RDBMS, as they solve problems that you do not need (or want) to handle as a programmer. Some examples:

1. File parsing: Your solution forces you to parse these files yourself. You've snipped most of that except for the Split(fileContents, vbCrLf), but it results in lots of extra code.

2. Caching in memory: Databases handle this, take a look at MySQL's query caching for example. Alternately you could use something like memcached (google it)

3. Race conditions: The web is a multithreaded environment, have you protected your scores files from concurrent updates? (Simple case: players A and B get new highscore at the same time, the update goes: A reads, B reads, A writes, B writes. Now A's score is lost).

4. Disk efficiency: Assuming a standard Windows NT/2k/XP/2003 server setup, cluster size is 4k. Your sudoku puzzle files seem to use 620 bytes, resulting in 85% wasted disk space per file.

At the end you even suggest putting it all in a library, which would effectively mean you've just started on your own little database project :-)

None of the above seem to be a major problem for your website, and I think writing the above is a good mental exercise, but the conclusion that it's a good idea to go without a database for simple dynamic sites is simply wrong. Databases have been developed for a reason, use them! ;-)

John said...

Thanks for your comments Herman. :)

You're right of course. I would not recommend including this this technique in your standard bag of tricks. It was an interesting mental excercise and one I took up because it seemed like a fun challenge.

The code I snipped was pretty trivial btw, a coupld of loops over the array of split values. Nothing too fancy. :)

Anonymous said...

I can understand this as purely a programming challenge but that's it.

You're justification for using it in an actual system is based upon no changes to the requirements at all. Ever.

It's not hard to imagine what other requirements may be required later and as soon as that happens you're stuck. Can you easily get the maximum score per user between two dates? Can you find the average score for users who's name starts with a 'W' and who have played 5 games between midnight and 3 am?

All reasonable queries that are easily answered using SQL but that would require tight coulping and nastiness in this solution.

Not to mention the lack of type safty, referential integrity checks, FK constraints, transactions, load balancing, query caching, index tuning over time, backup systems, running on redundant servers, triggers etc.

There really is no justification for using this in an actual system. Software should be SOFT and using this makes the software rigid, highly coupled, inflexible and too much work for it's own good.

I'm sure you brushed up on some programming skills and learnt a few new things about the languages you used along the way. And that's great for you. But that's it.

Anonymous said...

But... Wait a sec. Isn't this much more complicated than using a database ?

John said...

Not at all! It was very easy to put together.

Anonymous said...

For such application, I would go with db4o. Much easier than text file manipulation. (My db4o tutorial can be found at http://www.codeproject.com/useritems/OOP_with_db4o.asp

Anonymous said...

Interesting! :-)

Anonymous said...

WooW =)

Anonymous said...

Interesting!

Anonymous said...

Interesting article, thanks!

Anonymous said...

Great to see this! Thanks!

Anonymous said...

Thanks for interesting article.

Anonymous said...

I like articles like this. Thanks!

Anonymous said...

Write something else. Thanks! Best Blog...

Anonymous said...

Well done. Keep up the great work. Best regards!

Anonymous said...

I like it a lot! Nice site, I will bookmark!

Anonymous said...

Thanks to author! I like articles like this, very interesting.

Buy Viagra said...

Great article and very interesting blog. That's one thing I'm really looking forward. Looking forward to reading more from you.

Viagra Online said...

Hello, what can I say, your blog make me giggle, its so funny. So really nice post.

buy accutane said...

Florida jury awards $7 million in Accutane case After two days of deliberating, buy accutane the jury on Thursday found Accutane maker Hoffman-La Roche Inc.. Remember, keep this and all other medicines

sex life said...

Quite useful info, thank you for the article.