Tools and tips for completing a postgraduate degree in Computer Science

I started this article about a year ago, just after I finished my PhD. I had mapped it out, then left it to gestate, but never got round to finishing it.

I recently completed my PhD. The skills that I needed to complete my PhD were very different from those I had when I finished my undergraduate degree. The tools that helped me along the way were not the ones I learned about as an undergraduate.

This document is an attempt to describe some of the choices I made that I think helped me get through it more quickly and efficiently. The overriding theme is that of choosing the right tools for the job. In that regard the advice here is probably specific to computer science students, although some of it may extend to other disciplines. Choosing the best tool will make your life much easier and, let’s face it, you’ll need all the help you can get.

Always look for the best tool for the job at hand.

It’s important to remember that your aim is to complete a PhD in Computer Science. Anything that distracts from that should be carefully evaluated.

Completing a PhD is about coming up with ideas, testing them out, seeing if they work and telling the world about the good ones. What matters is the ideas themselves and disseminating the good ones. While the execution of an idea is important, it is easy to get bogged down in this phase. This is where your choice of tools can help. Many people just stick to the tools that they are familiar with rather than searching for the best tool for the job. This article describes some of the tools that I found useful.

A lot of your time will be spent either doing development work or writing. If you take the time initially to optimize your writing and development environments it will save you time over the course of your thesis.

Optimizing your development environment

Operating System

Run Unix: Mac, Linux or FreeBSD. Unix has a huge number of high-quality, free development and programming tools. It’s a tremendously powerful scripting and development environment. If you have a Mac you can combine all the power of Unix with a user-friendly desktop and the ability to run commercial applications if you need them.

Programming Language

If you’re doing computer science research you are likely to be doing quite a bit of programming to test out your ideas. The choice of programming language can have a huge impact on your productivity. In research there are usually far fewer limitations on your choice of programming language than there would be in industry. You don’t have to use Java and you don’t have to use whatever language you were taught as an undergraduate! You are not limited by company policy or the need for your system to be super-fast or scale to thousands of users. You just need to find out if your idea works.

Choose a programming language that suits your problem and will allow you to be optimally productive in solving the problem that you want to solve. It’s well worth taking a couple of weeks to study a new programming language if it saves you months over the course of your thesis.

The important thing is to be able to quickly test and prototype new ideas. I know someone who spent a year writing a text analysis system in C++ only to discover that their new algorithm idea didn’t really work. They could have reached the same conclusion in a month if they’d written it in Python.

As a researcher you want to minimize your development time. To this end I’d recommend learning either Ruby or Python. These are good general-purpose programming languages that are particularly well suited to rapid prototyping. Both have lots of good libraries available and much shorter development times than statically-typed languages like Java, so learning one of them should make you a lot more productive than using something like Java or C++. Python and Ruby are very similar - choosing one can come down to syntactic preferences or one having better available libraries for your problem domain.

It may be the case that other programming languages are even more appropriate for your problem domain. For example, if you need to write a network server, Erlang might be the best choice. If you’re developing a web application you might want to use Rails, Django, TurboGears or Seaside.
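To give a feel for how little code a first prototype can need, here is a minimal sketch in Python of one small text-analysis step: finding the most frequent words in a piece of text. The function name top_words and the sample sentence are made up for illustration and are not taken from any real system.

```python
import re
from collections import Counter


def top_words(text, n=3):
    """Return the n most frequent lowercase words in text (hypothetical helper)."""
    # Treat runs of letters and apostrophes as words; everything else is a separator.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


if __name__ == "__main__":
    # Illustrative sample text only.
    sample = "the cat sat on the mat and the dog sat too"
    print(top_words(sample, 2))
```

A prototype like this is nowhere near a finished system, but it is often enough to tell you whether an idea is worth pursuing before you invest in a faster or more careful implementation.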

The following are all good introductions and can be read online:

Build Tools

Use a build tool to automate repetitive tasks. If you do something more than twice, write a script to automate the process. Some options are Make, Ant and Rake. I recommend learning Rake. It’s a build language written in Ruby. Capistrano allows you to combine and run Rake tasks on remote servers.

For a good overview of Rake see Using the Rake Build Language.

For Capistrano see Capistrano: Automating Application Deployment.
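The core idea behind tools like Make and Rake is that you name your tasks, declare which tasks each one depends on, and let the tool run them in the right order. The following is a rough sketch of that idea in Python, not a replacement for a real build tool. The task decorator, the run function and the three example tasks are all hypothetical names invented for this illustration.

```python
TASKS = {}


def task(*deps):
    """Register the decorated function as a task that depends on deps."""
    def register(func):
        TASKS[func.__name__] = (deps, func)
        return func
    return register


def run(name, done=None):
    """Run a task after its dependencies, each at most once; return the run order."""
    done = [] if done is None else done
    if name in done:
        return done
    deps, func = TASKS[name]
    for dep in deps:
        run(dep, done)
    func()
    done.append(name)
    return done


# Example tasks: the bodies are placeholders for real commands.
@task()
def clean():
    print("removing old results")


@task("clean")
def experiment():
    print("running experiment")


@task("experiment")
def report():
    print("building report")


if __name__ == "__main__":
    run("report")
```

Asking for the report task runs clean, then experiment, then report. A real build tool adds a great deal on top of this, such as skipping work whose outputs are already up to date, which is why it is worth learning one rather than growing your own.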

Get to know your text editor

Whatever editor you use, it is worth spending some time learning how to use it fully. Try forcing yourself to use your editor without using the mouse. If this slows you down then you don’t know your editor very well. Learn the keyboard shortcuts and how to automate repetitive editing tasks. The more comfortable and efficient you are with your editor, the more productive you will be.

Version Control

The two major free version control systems are CVS and Subversion (SVN). SVN is newer and fixes some of the problems of CVS. Using version control allows you to keep a complete history of all the code you develop and to easily access previous versions of it. If you make major changes to your code to test out a new idea and find it doesn’t work, it’s easy to roll back to the previous version. Using version control also ensures that your code is backed up to an external server (provided your SVN server is on a different machine).

The Subversion book, Version Control with Subversion, is available online.

Libraries

Check what’s already available. This may influence your language choice. For example, if your project involves machine learning, do you want to spend months writing implementations of existing machine learning algorithms, or would you rather reuse and adapt the implementations already available in open source toolkits such as Weka or Mallet? If you decide that you will be using Weka, then you need to be able to call its Java API. Using Java as your implementation language would then be an obvious choice. However, you could use Jython or JRuby to prototype your system and then, if necessary, reimplement parts of it in Java at a later date. Jython is a Java implementation of Python and JRuby is a Java implementation of Ruby. These allow you to write your code in Python or Ruby but have full access to the Java classes in whatever Java library you are using.

Optimizing your writing environment

Word Processing

LaTeX is king of the academic typesetting world. It is a language for describing documents and has lots of features for automating the process of typesetting a document. It produces documents that are really nice to look at. LaTeX is a great way to write your thesis and academic papers. The main problem with it is that there is a bit of a learning curve. You can’t just fire it up and start producing documents. It’s a bit like learning a domain-specific language for typesetting documents and many people don’t want to put in that effort. However, for something on the scale of an academic research thesis it is worth putting in the effort to learn LaTeX. Once you are familiar with it you will find that there is a plethora of features that make your life easier. You will get much better results than you would using a word processing package. You will also find that most academic conferences provide style files for their particular publication format.

While I would recommend that you familiarize yourself with LaTeX so that you know the basics, I don’t think that writing it by hand is the most productive way to write a thesis. LyX is a WYSIWYM document editor built on top of LaTeX. You can get all the power of LaTeX without having to learn all of its finer points. It has all the features of LaTeX, and if you really need to you can insert some LaTeX directly.

When writing, it’s the content that is important - you shouldn’t have to spend time working on the formatting of your document. Choose tools that let you concentrate on the content. Don’t worry about the format. This is often referred to as ‘what you see is what you mean’, whereas packages like Microsoft Word take a ‘what you see is what you get’ approach. They mix the editing of content and the formatting of that content together. LyX is a different way of editing documents from other word processing packages. You don’t edit the format of your document in LyX. You don’t worry about changing the font size, inserting extra spaces, justifying paragraphs etc. This stuff is all defined externally in a style class. So you just mark text according to what type it is - title, heading, cross-reference, code etc. - and then LyX takes care of styling it according to the style definitions. You concentrate on creating content and forget about all that irritating styling stuff.

It is worthwhile spending some time going through the LyX tutorial and learning how to use it - it will save you a huge amount of time over the course of writing your thesis. If you take only one tip from this article, I would suggest that you use LyX to write your thesis. I used it to write my MSc and PhD theses and most of my papers. For an overview of LyX see the following:

I prefer using LyX to LaTeX as it gives you all the power of LaTeX with a flatter learning curve. Since LyX is a nice frontend to LaTeX, some LaTeX knowledge is desirable in case you want to do something that LyX doesn’t support. LyX removes the tedium of remembering and adding all the various LaTeX commands.

Other options include word processors such as AbiWord, OpenOffice and Microsoft Word. You could use these for writing your thesis, but I wouldn’t recommend it. These word processors don’t scale nearly as well to large documents as something like LyX or LaTeX.

Bibliography

If you’re using LaTeX or LyX to write your thesis, then you are using BibTeX to manage and generate your references. BibDesk is a Mac GUI for BibTeX files.

Diagrams

You could try drawing diagrams using a paint program or PowerPoint, but you will save a lot of time and produce much nicer results if you use a program that is designed specifically for drawing diagrams.

OmniGraffle is an excellent diagramming application for Mac. You may have gotten it free with your Mac (if it’s a higher-end Mac). If not, it’s worth paying for if you’re going to be drawing a lot of diagrams.

Dia is the best free alternative and was my diagramming tool of choice in my pre-Mac days. Xfig can produce impressive results, but its interface is not very intuitive and it is not designed specifically for diagramming. OpenOffice Draw can also be used for diagramming.

Graphs

Gnuplot is excellent for producing graphs. It has a lot of different commands to cover any graphing task you can think of and can easily be scripted, so you can automate the creation of graphs. It is command driven, so it takes a bit longer to learn than some of the GUI-driven graphing packages. Grace has an awkward interface but produces decent graphs. Microsoft Excel and OpenOffice Calc can also be used for graphing.
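As one possible way of scripting graph creation, the sketch below uses Python to write out a data file and a matching Gnuplot script, so that every run of an experiment can regenerate its graph. The function name write_gnuplot_job, the file names and the axis labels are placeholders chosen for this example, and it assumes your Gnuplot build supports the png terminal.

```python
def write_gnuplot_job(points, stem="results"):
    """Write <stem>.dat and <stem>.gp for a simple line plot; return the script text.

    points is an iterable of (x, y) pairs. The axis labels below are
    placeholders and should be changed to match your own data.
    """
    # One "x y" pair per line, the plain-text layout Gnuplot reads by default.
    with open(stem + ".dat", "w") as data:
        for x, y in points:
            data.write(f"{x} {y}\n")

    script = "\n".join([
        "set terminal png",
        f"set output '{stem}.png'",
        "set xlabel 'training examples'",
        "set ylabel 'accuracy'",
        f"plot '{stem}.dat' using 1:2 with linespoints title 'accuracy'",
    ]) + "\n"

    with open(stem + ".gp", "w") as gp:
        gp.write(script)
    return script
```

After calling write_gnuplot_job with your measurements, running gnuplot results.gp from the shell should produce results.png. Because both files are plain text, they can also be kept under version control alongside the code that produced them.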

More general advice on writing

Optimizing your writing environment will make it easier to get through the writing phase of your thesis. But you still have to motivate yourself to do a large amount of writing. A couple of extra tidbits I would offer are:

  • Keep track of how you are doing. I stuck a page on the wall beside my desk and updated it every day with my page count. That way you can’t hide from it if you take it easy for a week.

  • Keep track of what you want to do. Start each day with a list of what you want to get done that day.

  • Write something every day. Get in the habit of writing early and you will find it easier when you get to the final write-up. Keeping notes as you go along will help a lot when you get to the final task of writing the thesis document.

  • Get rid of distractions. When you reach the stage of full writing mode, eliminate distractions from your environment. Unplug your network connection and don’t allow yourself to be distracted when it’s not going well.

  • Sometimes you just can’t write. If this is the case take a break for a while - many of the best ideas come when you’re away from it.

  • When you can write (you’re in the zone), write as much as you can. Don’t worry about spelling and word choice - you can edit it later.

  • Don’t be squeamish about editing. If something isn’t right, cut it out.
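The page on the wall can also be kept as a small script, in the spirit of automating anything you do more than twice. The following is only a sketch of one way to do it: it counts words rather than pages, and the function names and file paths are made up for this example. It assumes your thesis, or an export of it, is available as a plain text file.

```python
import csv
import datetime
import re


def count_words(text):
    """Count whitespace-separated tokens that contain at least one letter."""
    return sum(1 for token in text.split() if re.search(r"[A-Za-z]", token))


def log_progress(thesis_path, log_path, today=None):
    """Append the date and the current word count to a CSV log; return the count."""
    today = today or datetime.date.today().isoformat()
    with open(thesis_path, encoding="utf-8") as handle:
        words = count_words(handle.read())
    # Append mode keeps one row per run, so the file becomes a progress history.
    with open(log_path, "a", newline="") as log:
        csv.writer(log).writerow([today, words])
    return words
```

Calling log_progress("thesis.txt", "progress.csv") once a day builds up a two-column file of dates and word counts that you can graph or simply glance at.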

Links to advice for completing a PhD

This article was mostly about tools that can help you along the way to completing your PhD thesis. There are lots of pages with more general advice on getting through it.

And of course you should read (or re-read) The Elements of Style before you start your write-up.

Summary

  • Use the right tool for the job.

  • Get an overview of several programming languages: C, Perl, Python, Ruby, Erlang, Lisp. Use whichever language is most suited to the task at hand.

  • Organize your development environment: use SVN and Rake, and learn your editor.

  • Use LyX and/or LaTeX for writing your thesis.