Learning to program: A complete curriculum

I’ve seen many, many forum posts asking which books to read, either to get started in computer science or to decide on next steps after finishing one intro book on a specific language. Most of them don’t state any goal other than “wanting to learn to program”. But someone who just wants to write an occasional one-off application for fun would take a vastly different route than someone who wants to earn a living as a programmer. This post is for the latter group: those who need in-depth knowledge of how to design and develop systems, and who are willing to accept that getting there requires a lot of hard work.

I’m not going to recommend specific books, for several reasons: 1) Which books are good changes with time; the books I cut my teeth on are mostly out of print. 2) There are likely several books which are more than sufficient in each category, so there is no best book, and each individual should choose books which suit their needs. 3) There are a lot of resources online which can augment a book, such as online course lectures, and in that case it would be best to stick with the book(s) used in the course. 4) I have not read every book in every category, nor do I know enough people for our combined reading to encompass every book in every category, so it would be impossible to fairly represent every book. 5) I don’t recommend that anyone buy all of these books at once and try to read them in one sitting. They’ll have to be studied over time, so a better book might come out while you are working your way through a different one. I recommend reading reviews of whatever books are popular at the time you’re in the market for them.

Required:

The following topics should be considered required reading for everyone. No matter which type of programming you plan to get into, they will provide a foundation to build on. The order of the books is not all that important, though you would obviously want to learn the basics of a programming language as a first step. I have tried to arrange them so that it wouldn’t hurt to work through them in the order listed.

  • Basic language book: This can be something as basic as “Teach yourself x in y hours/days/weeks”. Just don’t expect to learn it as fast as the title says. The language you choose shouldn’t be of too much concern; by the time you’re ‘done’ with this program, you’ll likely have become familiar with many different languages. A very common beginner mistake is to put too much weight on which language is the best one to start with. Just pick one and go.
  • A more in-depth language book: The basic book probably skipped over a lot of the features of the language. Choose a book which covers the language in depth. There are many introductory books which do a good job of presenting the important parts of a language, so you might be able to get by with a single book which covers both this item and the previous one.
  • Data Structures and Algorithms: This is where you start to learn how to actually get work done, and get it done efficiently. This book, and the books that follow, won’t focus so much on the programming language as on higher-level concepts of how to design programs.
  • Programming Languages: There are lots of languages out there, each with its strengths and weaknesses. There are also many common features among different languages. This book will help you understand which language features are available and decide which language is the best tool for whatever task you’re about to undertake.
  • Computer organization: This book will definitely include assembly language as one of its topics; in fact, many books titled “Assembly Language” will fit the bill nicely. The purpose of this topic is to get an idea of what’s going on under the hood, to help you design more efficient systems and to explain why some things are or aren’t possible.
  • Operating Systems: Odds are you won’t be working directly with a machine, but will rely on an operating system to make working with the different parts of the machine easier. This book will likely assume you’re somewhat familiar with assembly language. Note, this is not a book titled something like “Teach yourself Windows XP in 21 Days”, but is a book which doesn’t focus on a specific operating system. It will concentrate more on the underlying concepts of how operating systems are designed. That’s not to say a book on the operating system you’re using isn’t a useful resource though.
  • Discrete Mathematics: This book will almost definitely not contain any code, in fact, it might not even mention computers. It will abstract out the abilities of the computer into mathematical terms. This will allow you to transcend the specifics of the machine you’re working with to design efficient solutions to problems, while still keeping the solution as something that can actually be implemented on a computer.
  • Networking: This shouldn’t be a “Network Programming in <Language X>” book, but one which focuses on the underlying principles of network communication. The book will likely not contain a single line of code, but it’s OK if it does. I wouldn’t recommend a totally abstract book; something that focuses on TCP/IP, with explanations of why certain things exist (or don’t exist) in the protocols, would be OK.

Recommended:

For persons interested in game programming vs. those interested in business applications, the path taken will be different. Since my background is mostly in business applications, I can only recommend topics along those lines. Not all of these topics are specific to business programming per se, but I don’t necessarily consider them to fall under the “required for everyone” heading. These topics are in no particular order and can be read concurrently with each other. I would even go so far as to say you can read them concurrently with books in the “required” list.

  • SQL and database design: Almost every business application will involve the use of a database of some sort. It is important to be fluent in the use of databases in order to work efficiently.
  • Specific databases (Oracle, SQL Server, MySQL, etc.): There are several popular database implementations available, each with its own proprietary extensions to SQL. Unless you’re starting your own business, odds are you won’t get to choose which database implementation you work with, so it’s a good idea to be familiar with all of the most popular ones. Many books on a specific implementation also include a good intro to SQL and database design, so you can combine this with the first book on this list.
  • Design Patterns: Many programs share similar ideas, so much so that it’s a good idea to give certain ideas or “patterns” names. This allows you to discuss and explain patterns of programs with other developers at a much higher level. Additionally, design patterns serve as examples of good design, and will save you the trouble of inventing them yourself. There is also a concept of “anti-patterns” (also called pitfalls), which are examples of common bad design and can also be of great help. I would like to warn you specifically of one anti-pattern, and that’s the overuse of design patterns. I’ve seen many programmers get so excited over design patterns that they want to use them everywhere, so it’s important to know when and where to use them.
  • Human Computer Interaction: Not all business applications involve a user interface, but you’ll likely end up writing many applications that do. Being able to develop a good, intuitive user interface is a very difficult task in most situations, but is one of the most important features required in order to make sure your application is actually useful.
  • Object Oriented Design: This is mostly a required concept, and is generally covered to some extent even in basic books on programming languages which are based on object oriented concepts. But it’s a good idea to study the specifics of object oriented design in depth.
  • Computer Security: This is a definite must for any business application. It’s easy to be naive and think no one will attempt to crack your application, or that they couldn’t possibly figure out your homegrown obfuscation technique. Studying security will help you be skeptical about the security of every piece of your application and give you common tools to use to develop strong, robust, reliable systems.

Learn other languages:

Knowing only one programming language is almost like owning only one tool in your garage. I’m not saying you need to be a guru in a dozen different languages, but you should be fluent in at least a few languages, and a guru in one or two, with the ability to become a guru in other languages quickly. While every business dreams of having all of its applications written in just one language, that pretty much never materializes, and certainly not for any successful business, since the life of the business will certainly exceed the life of the language its applications are written in. So it’s an important marketable skill to be fluent in many different languages, as well as to be able to pick up new ones without taking a week-long training course. Just as an example, on any given day I might program in as many as 13 different languages: VB6, VB.Net, C#, Java, Perl, C, C++, PHP, JavaScript, VBScript, Python, PL/SQL and Transact-SQL are all languages I use regularly (but I don’t ask for your pity). I recommend knowing languages which between them cover all of the following features (most languages will have several of these properties, but the idea is to learn several languages):

  • Object oriented language
  • Scripting language
  • Language with automatic memory management
  • Language without automatic memory management (C would be my recommendation)
  • Machine language
  • Static typing
  • Weak typing

Continued Learning:

Once you’ve built a foundation, you can’t stop learning there. Computer programming is a field which is constantly changing and will require continued effort towards learning new and different techniques. So, if you don’t like learning new things, this probably isn’t the field for you. There’s a good chance that most of the techniques you learn today won’t be of much use 10 years from now, and whatever you’ll be doing 20 years from now will be barely recognizable as programming to a programmer today. Many of the underlying concepts change much more slowly than the actual technology, which is why it’s important to learn them. When you understand the concepts, it’s much easier to find a tool to make it happen.

Passing immutable types by reference in Java and C#

To many, it should be obvious what the following code prints:

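public class IntExample {
     public static void main(String[] args){
          int x=0;
          SomeMethod(x);          // passes a copy of the value 0
          System.out.println(x);  // prints 0
     }

     protected static void SomeMethod(int x){
          x=1;                    // changes only the method's local copy
     }
}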

The code prints 0, because “int” is a primitive type and is passed by value. But what if we replace “int” with “Integer”?

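public class IntegerExample {
     public static void main(String[] args){
          Integer x=0;
          SomeMethod(x);          // passes a copy of the reference
          System.out.println(x);  // still prints 0
     }

     protected static void SomeMethod(Integer x){
          x=1;                    // rebinds the local reference only
     }
}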

This is where a lot of people get confused. Integer is an object type, so the method receives a reference to the object, but the program still prints 0. Why? Java passes the reference itself by value, so the x inside SomeMethod is a separate copy of the caller’s reference. Integer is also an immutable type, so there is no way to change the object in place: the assignment “x=1” is autoboxed to “x=Integer.valueOf(1)”, which points the local x at a different object rather than changing the value the caller’s x refers to.

Fortunately this case is rarely the source of errors since it’s considered bad design to return values through parameters.

Although the above code is in Java, you will run into similar behavior with C# immutable types.
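
To make the difference concrete, here is a minimal sketch that contrasts reassigning a parameter with mutating the object it points to. The class and method names (ReferenceDemo, reassign, mutate) are made up for illustration, and a one-element int array stands in for “some mutable object”.

```java
public class ReferenceDemo {
    // Reassigning the parameter only changes this method's copy of the reference.
    static void reassign(Integer x) {
        x = 1;
    }

    // Writing through the reference changes the object the caller also sees.
    static void mutate(int[] holder) {
        holder[0] = 1;
    }

    public static int afterReassign() {
        Integer x = 0;
        reassign(x);
        return x;
    }

    public static int afterMutate() {
        int[] holder = {0};
        mutate(holder);
        return holder[0];
    }

    public static void main(String[] args) {
        System.out.println(afterReassign()); // prints 0
        System.out.println(afterMutate());   // prints 1
    }
}
```

The second call changes what the caller sees only because the array is mutable. With an immutable type such as Integer there is nothing to write through the reference, which is why the reassignment case is the only one that comes up.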

Does string interning in C# guarantee you only get one copy of a static value?

Of course the answer is no, otherwise it wouldn’t be a very interesting post. Interning is an attempt to save memory by storing identical constant string values in a single memory location, regardless of where they’re used in the application. But here’s an instance where it doesn’t quite work.

The String.Empty constant just contains the value “”. So logic would say that every value in the following code would point to the exact same location:

        static void Main(string[] args) {
            String t = "";
            String t1 = String.Empty;
            String t2 = String.Empty;
            String t3 = "";
            Console.WriteLine(t + t1 + t2 + t3);
        }

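But if you look at the disassembly, you find that the two values for “” are loaded from one location (0226303Ch), and the String.Empty values are loaded from another (0226102Ch). Strictly speaking, these are the addresses the references are read from rather than the addresses of the string objects themselves, so what the listing shows is two separate slots; a check with Object.ReferenceEquals would confirm whether both slots hold the same string object.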

String t = "";
00000033  mov         eax,dword ptr ds:[0226303Ch] 
00000039  mov         esi,eax 
String t1 = String.Empty;
0000003b  mov         eax,dword ptr ds:[0226102Ch] 
00000040  mov         edi,eax 
String t2 = String.Empty;
00000042  mov         eax,dword ptr ds:[0226102Ch] 
00000047  mov         dword ptr [ebp-48h],eax 
String t3 = "";
0000004a  mov         eax,dword ptr ds:[0226303Ch] 
00000050  mov         dword ptr [ebp-4Ch],eax 
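
Java has a comparable notion of interning, so here is a rough sketch in Java (to match the earlier examples) of how to test at the language level whether two strings are the same object. This is an analogy only and says nothing about how the .NET runtime treats String.Empty; the class and method names are made up for illustration.

```java
public class InternDemo {
    // Reference comparison: true only if both variables point at the same object.
    public static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "";
        String another = "";
        String built = new String(""); // explicitly allocates a separate object

        System.out.println(sameObject(literal, another));        // true: literals are interned
        System.out.println(sameObject(literal, built));          // false: equal value, different object
        System.out.println(sameObject(literal, built.intern())); // true: intern() returns the pooled copy
    }
}
```

The C# equivalent of the reference comparison would be Object.ReferenceEquals, which is a more direct test than reading addresses out of a disassembly.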

Protection against SQL injection attacks in PHP.

Early on, PHP had no good methods for escaping SQL, and until recently it didn’t support parameterized queries. As a result, a lot of documentation covers SQL queries without really addressing the issue, and a lot of older PHP developers are unaware of the enhancements made to prevent this type of attack.

PHP 4.3 introduced mysql_real_escape_string, which escapes the potentially “bad” characters that could cause unwanted results when a value is placed inside a quoted string in your queries. The function’s manual page contains examples of how to use it.

PHP 5 includes the MySQLi (MySQL Improved) extension, which provides an enhanced API for accessing MySQL. Its mysqli_stmt_bind_param function allows you to use parameterized queries, and the function’s manual page provides an example of how to use them.

Parameterized queries are generally considered safer than escaped strings, though the gap is smaller than it is often made out to be. mysql_real_escape_string currently escapes all known bad characters, provided that every value is escaped and placed inside quotes and that the connection character set has been set with mysql_set_charset; beyond that, the only thing that would make it unsafe would be for another bad character to be discovered. Parameterized queries go through a more complicated code path, and are thus more likely to be affected by a coding bug, but they remove the risk of forgetting to escape a value. So the practical argument for parameterized queries is less that escaping is weak and more that it is easy to misapply.
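
For anyone who hasn’t seen why unescaped input is a problem, here is a small sketch of the attack itself. It is written in Java rather than PHP, to match the other examples on this page, and it only builds strings; no database is involved. The table name users, the column name and the class name are all made up for illustration.

```java
public class InjectionDemo {
    // With a parameterized query the SQL text is fixed and the value travels separately.
    public static final String PARAMETERIZED = "SELECT * FROM users WHERE name = ?";

    // UNSAFE: pastes the user's input straight into the SQL text.
    public static String buildUnsafeQuery(String userName) {
        return "SELECT * FROM users WHERE name = '" + userName + "'";
    }

    public static void main(String[] args) {
        // Ordinary input produces the query the developer intended.
        System.out.println(buildUnsafeQuery("alice"));
        // Input containing a quote changes the structure of the WHERE clause.
        System.out.println(buildUnsafeQuery("x' OR '1'='1"));
        // The parameterized text never changes, whatever the input is.
        System.out.println(PARAMETERIZED);
    }
}
```

The second line of output reads SELECT * FROM users WHERE name = 'x' OR '1'='1', which matches every row. Escaping fixes this by neutralizing the quote inside the value; a parameterized query fixes it by never mixing the value into the SQL text at all.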

Some great advanced JavaScript videos

Douglas Crockford of Yahoo has made some excellent JavaScript lecture videos. He covers how to work around the problems in JavaScript to make code more scalable and easier to work with. I’ve yet to see a book, or anything else that can be bought, which delves into the details of the language to the depth that Crockford does.

Although they are videos, I found them easy to follow by just letting them play in the background while working on something else. Of course, there are many times when you must switch over to the video to see the code he’s referring to, so they wouldn’t work well on just an MP3 player.

Douglas Crockford: “The JavaScript Programming Language” 1 of 4

Douglas Crockford: “The JavaScript Programming Language” 2 of 4

Douglas Crockford: “The JavaScript Programming Language” 3 of 4

Douglas Crockford: “The JavaScript Programming Language” 4 of 4

Douglas Crockford: “Advanced JavaScript” (1 of 3)

Douglas Crockford: “Advanced JavaScript” (2 of 3)

Douglas Crockford: “Advanced JavaScript” (3 of 3)

Finding “dead time” in a database of start and end times.

The following snippet will find “dead time” (i.e. time where no events are scheduled) in a database:

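    -- One second before each start time, kept only if no event covers that moment
    select distinct dateadd(s,-1,t.starttime) as deadtime,'start' from sometable t where
     0=(select count(*) from sometable u
        where u.starttime < dateadd(s,-1,t.starttime) and u.endtime > dateadd(s,-1,t.starttime))
    union all
    -- One second after each end time, kept only if no event covers that moment
    select distinct dateadd(s,1,t.endtime) as deadtime,'end' from sometable t where
     0=(select count(*) from sometable u
        where u.starttime < dateadd(s,1,t.endtime) and u.endtime > dateadd(s,1,t.endtime))
    order by deadtime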
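
If the events are already in memory, the same question can be answered without the database. The sketch below is a different technique from the query above: it sorts the events by start time and sweeps through them, recording a gap whenever the next event starts after everything seen so far has ended. The class name, the method name and the use of plain long values for times are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class DeadTimeFinder {
    /** Each event, and each returned gap, is a two-element array: {start, end}. */
    public static List<long[]> findGaps(long[][] events) {
        long[][] sorted = events.clone();
        Arrays.sort(sorted, Comparator.comparingLong((long[] e) -> e[0]));

        List<long[]> gaps = new ArrayList<>();
        if (sorted.length == 0) {
            return gaps;
        }
        long coveredUntil = sorted[0][1]; // latest end time seen so far
        for (int i = 1; i < sorted.length; i++) {
            long[] event = sorted[i];
            if (event[0] > coveredUntil) {
                // Nothing is scheduled between coveredUntil and this start.
                gaps.add(new long[] {coveredUntil, event[0]});
            }
            coveredUntil = Math.max(coveredUntil, event[1]);
        }
        return gaps;
    }

    public static void main(String[] args) {
        long[][] events = { {10, 20}, {15, 30}, {40, 50}, {50, 60}, {70, 80} };
        for (long[] gap : findGaps(events)) {
            System.out.println(gap[0] + " - " + gap[1]);
        }
        // prints:
        // 30 - 40
        // 60 - 70
    }
}
```

Unlike the query, this returns each dead period as a start/end pair and treats events that touch (one ending at 50, the next starting at 50) as leaving no gap.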

Updating Linux systems for 2007 Daylight Saving Time changes.

This works on most systems. First download the updated DST packages:
wget 'ftp://elsie.nci.nih.gov/pub/tz*.tar.gz'

Extract and compile the utilities:

tar -zxvvf tzco*
tar -zxvvf tzda*
make

Now compile the data:

mkdir temp
./zic -d temp northamerica

Verify the new data file is correct:

./zdump -v temp/EST5EDT | grep 2007
/etc/localtime Sun Mar 11 06:59:59 2007 UTC = Sun Mar 11 01:59:59 2007 EST isdst=0
/etc/localtime Sun Mar 11 07:00:00 2007 UTC = Sun Mar 11 03:00:00 2007 EDT isdst=1
/etc/localtime Sun Nov 4 05:59:59 2007 UTC = Sun Nov 4 01:59:59 2007 EDT isdst=1
/etc/localtime Sun Nov 4 06:00:00 2007 UTC = Sun Nov 4 01:00:00 2007 EST isdst=0

Verify your old settings really are incorrect:

./zdump -v /etc/localtime | grep 2007
/etc/localtime.old Sun Apr 1 06:59:59 2007 UTC = Sun Apr 1 01:59:59 2007 EST isdst=0
/etc/localtime.old Sun Apr 1 07:00:00 2007 UTC = Sun Apr 1 03:00:00 2007 EDT isdst=1
/etc/localtime.old Sun Oct 28 05:59:59 2007 UTC = Sun Oct 28 01:59:59 2007 EDT isdst=1
/etc/localtime.old Sun Oct 28 06:00:00 2007 UTC = Sun Oct 28 01:00:00 2007 EST isdst=0

Now install the new file:

cd /etc
mv localtime localtime.old
cp ~/asdf/temp/EST5EDT localtime
# Verify correct installation (zdump lives in the build directory, not in /etc):
~/asdf/zdump -v /etc/localtime | grep 2007
/etc/localtime Sun Mar 11 06:59:59 2007 UTC = Sun Mar 11 01:59:59 2007 EST isdst=0
/etc/localtime Sun Mar 11 07:00:00 2007 UTC = Sun Mar 11 03:00:00 2007 EDT isdst=1
/etc/localtime Sun Nov 4 05:59:59 2007 UTC = Sun Nov 4 01:59:59 2007 EDT isdst=1
/etc/localtime Sun Nov 4 06:00:00 2007 UTC = Sun Nov 4 01:00:00 2007 EST isdst=0

Note: zdump can give unexpected output if you pass it a relative path. For example, from within /etc, “zdump -v localtime | grep 2007” won’t return anything. The likely reason is that zdump treats a name that doesn’t begin with a slash as a zone name to look up in the system zoneinfo directory, rather than as a file in the current directory. I also ran into some cases where specifying the path to the file as temp/EST5EDT didn’t work either, so try to always specify the full path to the file: “zdump -v /etc/localtime …”.
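
As a cross-check on the dates in the zdump output, here is a small sketch that asks Java’s time zone rules for the 2007 transitions. Note the assumptions: the JVM ships its own copy of the time zone database, so this tests the JVM’s data and not /etc/localtime, and it uses the zone ID America/New_York rather than the EST5EDT file used above. The class and method names are made up for illustration.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;

public class DstCheck {
    // Returns the instant of the first offset change after the given moment.
    public static Instant nextTransitionAfter(String zone, String isoInstant) {
        ZoneRules rules = ZoneId.of(zone).getRules();
        ZoneOffsetTransition transition = rules.nextTransition(Instant.parse(isoInstant));
        return transition.getInstant();
    }

    public static void main(String[] args) {
        // Start of DST in 2007 under the new rules (second Sunday in March).
        System.out.println(nextTransitionAfter("America/New_York", "2007-01-01T00:00:00Z"));
        // End of DST in 2007 under the new rules (first Sunday in November).
        System.out.println(nextTransitionAfter("America/New_York", "2007-07-01T00:00:00Z"));
        // prints:
        // 2007-03-11T07:00:00Z
        // 2007-11-04T06:00:00Z
    }
}
```

Both instants line up with the UTC times in the corrected zdump output above (07:00:00 UTC on Mar 11 and 06:00:00 UTC on Nov 4).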