(I'm drafting this at the moment. Feedback appreciated - tom@tommorris.org or on Twitter.)
Thank you Hacker News, Reddit and Delicious.
A lot of guides to Scala are written with an intended audience of academics, computer scientists and the sort of people who really enjoy reading about programming language theory and spend their time reading Lambda the Ultimate etc., partly because Scala's development by Martin Odersky at EPFL means it's quite an interesting language on a purely intellectual level. (I don't mean to slight those people - I enjoy reading PLT stuff too.)
But Scala isn't simply a research project: it's designed as a practical language that you can use in the same way you can use a more mainstream language like Java or Ruby. It is already being adopted by industry: Twitter and Foursquare use it in the Web 2.0 space, and some more blue-chip companies like Siemens, EDF Trading and Sony have also started using Scala for some projects. Scala fits a rather unusual niche: it is meant to be useful both for programmers who usually work in dynamic or 'scripting' languages (however one defines them) like Python, Ruby or Perl and for people more familiar with languages like Java, C# and C++. It really does offer the benefits of both. It has a strong, static type system, but you often don't have to write explicit type declarations - instead, the compiler or interpreter will happily work out that `var foo = 9` creates an Int while `var foo = "9"` creates a String. The language also brings many constructs from functional programming languages like Haskell into a language that's strongly OO.
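To see that inference in action, here is a minimal sketch. The object and member names are made up for illustration, and it assumes a current Scala 2.13 toolchain rather than the 2.7/2.8 releases this article was written against.

```scala
object InferenceDemo {
  val a = 9    // inferred as Int
  val b = "9"  // inferred as String
  val doubled = List(1, 2, 3).map(_ * 2) // inferred as List[Int]

  // Method parameters still need type annotations; the return type (Int) is inferred.
  def double(n: Int) = n * 2

  def main(args: Array[String]): Unit = {
    println(a + 1)      // numeric addition: 10
    println(b + 1)      // string concatenation: 91
    println(doubled)    // List(2, 4, 6)
    println(double(21)) // 42
  }
}
```

Note that `b + 1` compiles and concatenates, whereas in Python or Ruby adding an integer to a string raises a TypeError.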
But is it a language for hackers? By hackers, I don't mean kernel hackers. I mean people who build small hacks - who use whatever technology is at hand to achieve some interesting end-goal. Hackers need to be able to do the really easy things quickly. There, familiarity and speed beat correctness - worse is better because better can't be done quickly enough. BarCamps and HackDays require one to be able to turn around an idea overnight: a valuable skill in the commercial world too. Being able to rapidly prototype is useful: it means you can "throw one away" quicker, you can drop ideas that are impractical and move on, and you can actually show something the next day and make people say 'Wow'. Some of the hacks that people build when dosed up on coffee go on to become big projects. Can Scala be there for the lone hacker who wants to build something fast? Odersky certainly thinks so: it's called Scala because it is claimed to be a "scalable language" - as good for building large concurrent systems as small hacks and scripts.
I think that Scala could be a perfect language for a lot of people who do lots of small-scale hacks at things like HackDays. I'm not totally sure of it, and a lot of people I speak to are sceptical. Unfortunately, most of the current Scala introductions spend their time talking about the intricacies of functional programming, and show you how to write quicksorts, combinator parsers and other interesting but theory-driven tasks. It would be far more useful to show how to parse an RSS feed, get data from Twitter, build a web service and so on.
Some Scala users simply respond: "ah, but you just use the underlying Java". To which the PHP, Python, Ruby and Perl crowd, who try hard to avoid Java, reply: "okay, so what's that?" Scala for Hackers is intended to solve this by pointing to the least-disempowering way of doing common practical tasks in Scala - not necessarily the most powerful. Sometimes I will favour Java libraries over native Scala libraries because the Scala libraries seem to be designed for people deeply into the theory. To put it a little strongly: I'd rather have a Java library I can understand in ten minutes than a Scala library that is still puzzling after reading the source code repeatedly...
As I said, I'm not sure whether Scala will be a good language for programming in the small. Writing this is a way for me to try and document the various things I'm doing with Scala. Hopefully the Scala community might be able to use it to make simpler libraries for the many practically-minded end users exploring this exciting language.
Theory
Obviously, you need to learn the basics of Scala before hacking with it. If you learn from dead-tree media, I suggest Odersky, Spoon and Venners' "Programming in Scala" (Artima) or Wampler and Payne's "Programming Scala" (O'Reilly). Odersky et al. is slightly more theoretical while Wampler and Payne is a bit more practical.
Some other books that are available on Scala include David Pollak's "Beginning Scala" (Apress) - this is a much gentler introduction than either Odersky et al. or Wampler and Payne, and is written by the creator of the Lift web framework - and Venkat Subramaniam's "Programming Scala: Tackle Multi-Core Complexity on the Java Virtual Machine" (Pragmatic Programmers), which is a relatively short and readable introduction to Scala with an emphasis on concurrency.
Manning have a book coming soon called Scala in Action by Nilanjan Raychaudhuri.
As with Python, Ruby and a lot of other languages these days, Scala has a REPL - a Read-Eval-Print Loop. REPLs originate in the Lisp/Scheme community and allow you to interactively enter commands and see the response returned immediately. Some trumpet the presence of REPLs as a benefit of dynamic type systems, and consign static type systems to the compile-execute-debug cycle traditional in languages like C and Java. But Scala - along with C# - has a REPL, and it'll really boost your productivity to be able to interactively test and interrogate objects and functions. You can get into it just by running 'scala' from the shell and then typing lines of Scala into the REPL. The REPL will be one important tool in learning Scala - so use it.
Variables
When declaring variables in Scala, you can either declare them as var or val. vars are reassignable, vals aren't.
What does that mean in practice?
val foo = 5
foo = 6
<console>:2: error: reassignment to val
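In the REPL, you can declare a new val with the same name, though. Each REPL line is compiled in its own scope, so the second val simply shadows the first; doing this twice in the same block of a source file is a compile error: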
val foo = 5
val foo = 6
If you are coming from Java, val is basically a final variable. The closest C# equivalent is a readonly field (C#'s sealed applies to classes and methods, not variables).
The easiest way to think about it is that var is a variable like you may be used to from PHP or Python or Ruby or wherever. val is slightly different - it's a name you are giving as a shortcut for a value. It's not really a variable. But that's okay, because you often don't actually need your variables to be variable nearly as much as you think you do!
If you are new to the sort of style of programming Scala lets you do, the difference between var and val may seem very strange. You may have to trust me when I say it does have a purpose, but the importance of the var/val distinction isn't immediately relevant to you getting going with Scala. The books I mentioned above - as well as many other posts - expound on the relevance of the difference, and the benefits you can get if you use val instead of var.
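Here is a small sketch of what "you don't need your variables to be variable" looks like in practice: the same sum written once with a reassigned var and once with only vals, using foldLeft to carry the running total. The object and method names are illustrative.

```scala
object ValVsVarDemo {
  // Imperative style: `total` is reassigned on every iteration.
  def sumWithVar(xs: List[Int]): Int = {
    var total = 0
    for (x <- xs) total += x
    total
  }

  // Same result with no reassignment: foldLeft threads the running total through.
  def sumWithVal(xs: List[Int]): Int = {
    val total = xs.foldLeft(0)(_ + _)
    total
  }

  def main(args: Array[String]): Unit = {
    val numbers = List(1, 2, 3, 4)
    println(sumWithVar(numbers)) // 10
    println(sumWithVal(numbers)) // 10
  }
}
```

The val version has no intermediate states for another piece of code (or another thread) to observe, which is one of the benefits the books above discuss.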
Companion Objects
If you have used a good object-oriented language like Java or Ruby, you probably know the difference between an instance method and a class method - in Ruby, you define the latter using 'def self.whatever' and the former using just 'def whatever'. A class method is simply a method you can call without an instance. It is useful for alternative constructors and the like (one use in Ruby on Rails is finders: 'find' is a method on the Post class, not on an individual post object). In Java, this is called a static method and you create it by prefixing the method definition with the keyword "static". In Python you use the @classmethod (or @staticmethod) decorator.
In Scala, you get the equivalent of static methods (and static properties) with companion objects. The companion object holds the 'statics' while the class holds the instance methods and properties.
A companion object must have the same name as its class and be defined in the same source file (technically, the same compilation unit). In practice, put the companion object directly after the class in the file.
Like this:
class Book (val name: String)
object Book { /* whatever */ }
Inside the object, just add your static methods like you would to any other object. Unlike Java, you don't need to prefix static methods with "static".
Compile the class and companion object together.
When reading Scaladocs (like Databinder's), keep the companion object distinction in mind. You won't see anything marked 'static' in Scaladocs because statics aren't anything special - they are just methods on the companion object, which is documented as a separate entry alongside the class.
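As a sketch of the Rails-style "class method" uses mentioned above, here is the Book class with a companion holding a factory and a finder. The in-memory catalogue stands in for a database and, like the method names, is hypothetical. In the REPL, enter the class and object together (recent REPLs have a :paste mode for this), otherwise they won't be treated as companions.

```scala
class Book(val name: String)

object Book {
  // Hypothetical in-memory "table", standing in for a database.
  private val catalogue = List(new Book("Programming in Scala"), new Book("Programming Scala"))

  // Factory method: lets callers write Book("Title") instead of new Book("Title").
  def apply(name: String): Book = new Book(name)

  // Class-level finder: called on Book itself, not on a Book instance.
  def find(name: String): Option[Book] = catalogue.find(_.name == name)
}

object BookDemo {
  def main(args: Array[String]): Unit = {
    println(Book("Scala for Hackers").name)            // Scala for Hackers
    println(Book.find("Programming Scala").map(_.name)) // Some(Programming Scala)
    println(Book.find("Nonexistent"))                  // None
  }
}
```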
Introspection
If you use Ruby, you often want to see what methods exist on a class or object, so you do something like `"foo".methods`
In Scala, you can call 'getMethods' on a class, meaning you can do something like this:
"foo".getClass.getMethods.foreach(println _)
Here, the string "foo" is getting instantiated - Scala will figure out it is a string, then ask the instance to return the class. getMethods returns an array of the methods from the class. foreach loops across the array and (println _) prints 'em out.
But what if you don't want to instantiate the class to introspect on its methods?
classOf[String].getMethods.foreach(println _)
getMethods returns an array of java.lang.reflect.Method objects (whereas Ruby just returns an array of method names). The Method class is useful because you can use it to see what types each method returns or takes as arguments: getReturnType() returns a Class object, so you can, for instance, filter or group the methods by the class of their return type.
Here is an example of reflection I used when writing the XML parsing section:
(doc\\"h1")(0).getClass.getMethods.filter(_.getReturnType() == classOf[String]).foreach(println _)
This takes the first h1 element in the document 'doc', gets all the methods one can call on its class, filters them down to only those which return strings, and prints those out. This is how I found out that the 'get back a text node as a string' method is just called 'text'.
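Building on the same idea, here is a sketch that filters java.lang.String's methods both by return type and by parameter types. The object name is made up; note that in Scala, classOf[Int] is the class of Java's primitive int, which is what methods like charAt(int) declare.

```scala
object ReflectDemo {
  // Names of public String methods whose return type is String.
  def stringReturning: Seq[String] =
    classOf[String].getMethods.toSeq
      .filter(_.getReturnType == classOf[String])
      .map(_.getName)
      .distinct
      .sorted

  // Names of public String methods taking exactly one parameter of Java type int.
  def takingOneInt: Seq[String] =
    classOf[String].getMethods.toSeq
      .filter(_.getParameterTypes.toList == List(classOf[Int]))
      .map(_.getName)
      .distinct
      .sorted

  def main(args: Array[String]): Unit = {
    println(stringReturning.mkString(", "))
    println(takingOneInt.mkString(", "))
  }
}
```

The exact lists depend on your JDK version, since newer JDKs add methods to String.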
2.8.0 note: the REPL in the forthcoming 2.8.0 release has tab-complete for existing variables. You can't do "foo".[tab], but if you do var foo = "foo", you can then do foo.[tab] for bash-style tab completion for methods.
See the StackOverflow question Scala: How do I dynamically instantiate an object and invoke a method using reflection?
Duck typing is now structural typing
If you are coming from Ruby, Python or other dynamic languages, you may be used to duck typing - using the methods an object responds to as your type system, rather than types declared Java/C style. This relies on following existing method naming practice. So, in Ruby, if the object can be serialised as a string, you use "to_s". If it can be turned into XML, "to_xml". If you can iterate over it, you define "each" (and usually mix in Enumerable). If you want to know if something is enumerable, you check it by doing "foo.respond_to?(:each)". This lets you write more flexible code - your code simply needs to check whether the object responds to the method that returns the result you are interested in, rather than worrying about what class it is.
In Scala, you may think, it's an old-fashioned statically typed language like Java, so you've gotta declare classes or - worse - interfaces. How boring. No sexy duck typing. Or, worse, you've got to declare the parameter as Any (the root of Scala's type hierarchy, roughly Java's Object) so that it accepts anything, and then explain in your documentation exactly what sort of object callers need to pass. This sucks.
Fortunately, you can do the equivalent of duck typing in Scala using structural typing. A structural type is a type defined by a set of what the specification calls 'refinements'. The refinements can be either methods or properties. A structural type can be declared using the type keyword or inline. An example:
def postToBlog(content: { def toHtml(): HtmlFragment }) = ...
This method takes as its argument any object that has a method matching this signature: it has to be called 'toHtml', take no arguments and return an (imaginary) 'HtmlFragment' object. You can give the structural type a name using the type keyword:
type Bloggable = { def toHtml(): HtmlFragment }
You can now use this as a type in the same way you can use classes as types (and ought to treat them syntactically in the same way you would classes IMHO - no horrible Hungarian notation like TBloggable or TypeBloggable or whatever - if you have a class and a type with the same name, that would seem to be a design problem):
def postToBlog(content: Bloggable)
Why would you want to do this? Imagine if you were building a social network like Facebook. On there, you can "like" all sorts of different things: comments people make, posts, photos, pages representing musicians, applications. If you had an existing application that used Java, and you were adding Scala to it, you might not necessarily be able to go and declare that all those things conform to some 'Likeable' interface, but if they all expose a common method, you can use that as a type.
Structural typing is also useful if the type you want to check - that is, the existence of a particular method or property - isn't something that you want to necessarily declare an interface for because it's something you are only using a small number of times. In that situation, there is no point declaring an interface (abstract trait) and cluttering up your Scaladocs for just one check.
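Here is a runnable sketch of the Bloggable idea. It assumes Scala 2.13 and, to stay self-contained, uses String instead of the imaginary HtmlFragment; Comment and Photo are hypothetical classes that share no trait. Calls through a structural type use reflection (hence the reflectiveCalls import), so they are slower than ordinary method calls. Scala 3 needs a different import (scala.reflect.Selectable.reflectiveSelectable).

```scala
import scala.language.reflectiveCalls

object StructuralDemo {
  // Anything with a toHtml(): String method counts as Bloggable.
  type Bloggable = { def toHtml(): String }

  // Two unrelated classes: neither extends a common trait or interface.
  class Comment(text: String) { def toHtml(): String = s"<p>$text</p>" }
  class Photo(url: String) { def toHtml(): String = s"""<img src="$url"/>""" }

  def postToBlog(content: Bloggable): String = content.toHtml()

  def main(args: Array[String]): Unit = {
    println(postToBlog(new Comment("Hello")))   // <p>Hello</p>
    println(postToBlog(new Photo("cat.jpg")))   // <img src="cat.jpg"/>
  }
}
```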
Concurrency
There are a bunch of different ways you can do concurrency in Scala. Because Scala runs on the JVM, you can use Threads, Runnables and the usual Java threading architecture - you just write your thread classes in Scala instead. Scala doesn't have a synchronized keyword, but that doesn't matter because synchronized is a method instead: you write lock.synchronized { ... }. If you are happy doing concurrency in the old-school way, go ahead.
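A minimal sketch of the old-school approach: plain Java Threads and Runnables written in Scala, with synchronized called as a method on a lock object. The object and method names are illustrative.

```scala
object ThreadDemo {
  private val lock = new Object
  private var count = 0 // shared mutable state, only touched while holding `lock`

  // synchronized is an ordinary method taking a block, not a keyword.
  def increment(): Unit = lock.synchronized { count += 1 }

  // Starts `threads` threads that each call increment() `perThread` times, then waits for them all.
  def countWith(threads: Int, perThread: Int): Int = {
    lock.synchronized { count = 0 }
    val workers = (1 to threads).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = (1 to perThread).foreach(_ => increment())
      })
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
    lock.synchronized(count)
  }

  def main(args: Array[String]): Unit = println(countWith(4, 1000)) // 4000
}
```

If you remove the synchronized call in increment(), concurrent updates can be lost and the total can come out below 4000.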
Alternatively, you can use actors. For those coming from Ruby or Python, think EventMachine, Twisted, node.js etc. Scala's actors are similar to Erlang's actors: message passing, no shared state. It's pretty lightweight and easy to do. There are about a gazillion tutorials out there. Type some combination of 'scala', 'actors' and 'tutorial' into Google and you'll bump into them. Also, the big Scala book by Odersky et al. has a tutorial.
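Actors themselves come from a library (scala.actors at the time of writing; Akka today), so as a dependency-free illustration of the message-passing idea, here is a hand-rolled sketch using only java.util.concurrent. It is not the scala.actors API: one thread owns the counter's state and only touches it in response to messages taken from a LinkedBlockingQueue "mailbox", so the only thing shared between threads is the queue. All names here are made up.

```scala
import java.util.concurrent.LinkedBlockingQueue

object MailboxDemo {
  sealed trait Msg
  final case class Add(n: Int) extends Msg
  final case class Get(replyTo: LinkedBlockingQueue[Int]) extends Msg
  case object Stop extends Msg

  // Starts a thread that owns `total`; other code can only reach it by sending messages.
  def startCounter(): LinkedBlockingQueue[Msg] = {
    val mailbox = new LinkedBlockingQueue[Msg]()
    val worker = new Thread(new Runnable {
      def run(): Unit = {
        var total = 0
        var running = true
        while (running) {
          mailbox.take() match { // blocks until a message arrives
            case Add(n)       => total += n
            case Get(replyTo) => replyTo.put(total)
            case Stop         => running = false
          }
        }
      }
    })
    worker.setDaemon(true)
    worker.start()
    mailbox
  }

  def main(args: Array[String]): Unit = {
    val counter = startCounter()
    (1 to 10).foreach(i => counter.put(Add(i)))
    val reply = new LinkedBlockingQueue[Int]()
    counter.put(Get(reply)) // processed after the Adds, because the mailbox is FIFO
    println(reply.take())   // 55
    counter.put(Stop)
  }
}
```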
There's more though. Akka extends actors to add software transactional memory (like Clojure!), events and a whole bunch of other coolness. Apparently. I haven't tried it, but I hear it is cool.
I'm a big fan of HawtDispatch. It isn't only for Scala - if you can tolerate verbosity, you can use it from Java. It implements Grand Central Dispatch on the JVM. Grand Central Dispatch is Apple's way of doing concurrency in C, C++ and Objective-C: you submit blocks (a C language extension Apple introduced alongside it) to queues, and they execute asynchronously on a system-wide thread pool. HawtDispatch does something similar for Java and Scala. The Scala API is 2.8 only, but you can use the Java API from 2.7.
I've found that a mixture of HawtDispatch and actors gives you most of the concurrency you need when hacking something together. The syntax for both is pretty simple and easy to use.
Testing
I use Specs - it is BDD-style testing, very much like RSpec. There are a lot of different testing frameworks, so try them out. You can also use JUnit if you want.
I need to look into Ostrich for performance testing.
If you are using Specs, it comes with built-in support for Mockito. I've found Mockito - even with Specs' wrapper - to be confusing and counter-intuitive. Instead, my preference is to use traits in the src/test files to extend the objects and override the methods that would require mocking. I've put up an example on Gist.
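Here is a minimal sketch of the trait-based approach (not the Gist itself, which isn't reproduced here). WeatherService and StubbedTemperature are hypothetical: the production class has a method that would normally hit the network, and a test-side trait overrides just that method with canned data.

```scala
// Production code (hypothetical): fetchTemperature would normally make an HTTP request.
class WeatherService {
  def fetchTemperature(city: String): Double =
    sys.error("would make an HTTP request")

  def describe(city: String): String = {
    val t = fetchTemperature(city)
    if (t > 25) s"$city is hot" else s"$city is mild"
  }
}

// Lives in src/test: overrides only the method that would otherwise need a mock.
trait StubbedTemperature extends WeatherService {
  override def fetchTemperature(city: String): Double = 30.0
}

object StubDemo {
  def main(args: Array[String]): Unit = {
    val service = new WeatherService with StubbedTemperature
    println(service.describe("London")) // London is hot
  }
}
```

The rest of WeatherService (here, describe) runs unchanged, so the test exercises real logic without any mocking library.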
Concurrency testing
Twitter has created xrayspecs, which adds extra magic to Specs - specifically, concurrency testing and time testing. I've tried to download and compile xrayspecs, but Ant gets to a certain point and fails. This is due to a bug: the build script depends on Java 5 (i.e. the one which comes installed on Mac OS X Leopard; Snow Leopard uses Java 6).
But you don't need to worry, because the primary thing you need xrayspecs for is concurrency testing, and up-to-date versions of Specs have limited concurrency testing built in with 'eventually'. It works like this:
someObject.addThisToMutableAsyncCounter(2)
This call doesn't block - it modifies some mutable state of the object, but does it using something like a dispatcher to do background processing on a queue. Therefore, the method call is going to return as soon as it has added the thing to the queue.
To test this, you use this kind of matcher:
someObject.counterValue() must eventually(be(2))
This re-evaluates counterValue() repeatedly, up to a limited number of retries, until it returns the expected value (2). Make sure that counterValue() doesn't mutate state, because it will be called repeatedly.
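To make the retry idea concrete, here is a plain-Scala sketch of a polling helper plus an asynchronous counter like the one described above. This is not Specs' implementation or API; eventually, AsyncCounter and their parameters are hypothetical names chosen to mirror the example.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object EventuallyDemo {
  // Hypothetical helper (NOT Specs' code): re-evaluates `check` until it is true or retries run out.
  def eventually(retries: Int, sleepMillis: Long)(check: => Boolean): Boolean = {
    var attempt = 0
    var ok = check
    while (!ok && attempt < retries) {
      Thread.sleep(sleepMillis)
      attempt += 1
      ok = check
    }
    ok
  }

  // Hypothetical async counter: add() queues work on a background thread and returns immediately.
  class AsyncCounter {
    private val executor = Executors.newSingleThreadExecutor()
    private val value = new AtomicInteger(0)
    def add(n: Int): Unit = executor.execute(new Runnable { def run(): Unit = { value.addAndGet(n); () } })
    def counterValue(): Int = value.get() // read-only, so safe to call repeatedly
    def shutdown(): Unit = { executor.shutdown(); executor.awaitTermination(1, TimeUnit.SECONDS); () }
  }

  def main(args: Array[String]): Unit = {
    val counter = new AsyncCounter
    counter.add(2) // returns before the addition has necessarily happened
    println(eventually(40, 100)(counter.counterValue() == 2))
    counter.shutdown()
  }
}
```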
Scala 2.8 is comin'!
At the time of writing, the Scala community is getting ready to release 2.8.0. RC5 is already out, and it's getting more and more polished. See What are the biggest differences between Scala 2.8 and Scala 2.7? - especially the answer "Taking the Leap" which describes how you can convert existing 2.7 code to 2.8 using compiler options.
Practice
Prerequisites
When using Scala, there are two ways to get libraries. First, there's sbaz - the Scala Bazaar. But not everything is in sbaz; in fact, most libraries you'll want to use aren't. So you need to get them as JAR files. You can either do this manually, or you can use a build tool like Maven, Ant+Ivy, Buildr or sbt (my preference - see the next section).
sbaz (output of `sbaz installed`):
- base/1.9
- commons-logging/0.0
- httpclient-4/0.0
- httpcore-4/0.0
- log4j/1.2.13
- sbaz/1.25tmp
- sbaz-setup/1.0
- scala/2.7.5.final
- scala-devel/2.7.5.final
- scala-library/2.7.5.final
- scala-tool-support/2.7.5.final
- uncarved-helpers/0.3
JARs - this is from my ~/code/classes/:
- antlr-2.7.5.jar
- commons-beanutils-1.8.2.jar
- commons-codec-1.3.jar
- commons-collections-3.2.1.jar
- commons-lang-2.4.jar
- commons-logging-1.1.1.jar
- commons-logging-api-1.1.1.jar
- concurrent.jar
- dispatch-http_2.7.7-0.6.6.jar
- dispatch-json_2.7.5-0.6.6.jar
- ezmorph-1.0.6.jar
- httpclient-4.0.1.jar
- httpcore-4.0.1.jar
- icu4j_3_4.jar
- iri.jar
- jena.jar
- jenatest.jar
- json-lib-2.3-jdk15.jar
- json.jar
- junit.jar
- log4j-1.2.12.jar
- log4j-1.2.15.jar
- lucene-core-2.3.1.jar
- stax-api-1.0.jar
- wstx-asl-3.0.0.jar
- xercesImpl.jar
- xml-apis.jar
You'll no doubt find that Commons and Log4j will help you about as much as they get on your nerves.
Build, config and deployment
Some Scala users use Maven. I think it is too heavyweight.
Consider sbt or Apache Buildr instead - build scripts in the former are written in Scala, while the latter are written in Ruby (with JRuby). sbt works great. Use that if you can - it supports Maven repositories and is pretty easy to extend.
If you use sbt, I've written some little scripts for sbt (in Ruby and Python) called sbt-growltest and sbt-notifytest which give you Growl and libnotify (the GNOME equivalent to Growl) notifications upon build and test success - they work with sbt and Specs, but could be adapted to work with other test frameworks.
Configgy is a library for configuration files - it is an alternative to the Java Properties library or using XML or YAML or whatever. I've heard great things about it, but haven't yet had a chance to test it.
Shelling out
One area where Java is notoriously annoying is passing commands to the native OS - shelling out, in other words. It's infuriating, but it does mean you are rarely tempted to do it. This is no bad thing.
Consider:
import java.io.{BufferedReader, InputStreamReader}
// Start "ls" and wrap its standard output in a reader
val br = new BufferedReader(new InputStreamReader(Runtime.getRuntime().exec("ls").getInputStream()))
br.readLine() // reads just the first line of output
If you feel the need to shell out to an external process, reconsider. You can probably find a Java or Scala library that'll do what you want.
But there are easier ways. If you have sbt on your classpath (hint: as described earlier, you should be using sbt - it's Maven for sane people), you can use sbt's process management library - see their wiki page.
It's as easy as:
import sbt.Process._
val vimLocation: String = "whereis vim" !!
Scala 2.8 apparently also introduces a new IO class that makes process management simpler.
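Current Scala versions ship a scala.sys.process package, which grew out of sbt's process library, so the same style works without sbt on the classpath. A minimal sketch, assuming Scala 2.13 on a Unix-like system (ls, echo and true must exist); the object and method names are made up.

```scala
import scala.sys.process._

object ProcessDemo {
  // Runs a command directly (no shell) and returns its stdout; throws if the exit code is non-zero.
  def output(cmd: Seq[String]): String = cmd.!!

  // Runs a command and returns only its exit code; output goes to the console.
  def exitCode(cmd: Seq[String]): Int = cmd.!

  def main(args: Array[String]): Unit = {
    println(output(Seq("ls", "-1")))
    println(exitCode(Seq("true"))) // 0
  }
}
```

Passing the command as a Seq rather than a single string avoids surprises with spaces in arguments, since no shell is involved in splitting them.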
Talking to the Web
Obviously, the first thing you need to be able to do is talk to the Web. That means using HTTP.
There are three ways of doing this at the moment:
Also see the IBM developerWorks article Scala and XML.
Using the Dispatch library (dispatch-http, from the JARs listed above), let's get something off the web:
import dispatch._
import Http._
var http = new Http()
var helloscala = http("http://tommorris.org/files/helloscala.txt" as_str)
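If you'd rather not pull in Dispatch for a one-off GET, the standard library can do it too. A minimal sketch, assuming Scala 2.13 and a URL that returns UTF-8 text; the helloscala.txt URL is the one from the example above and may no longer exist.

```scala
import scala.io.Source

object FetchDemo {
  // Downloads a URL's body as a String with no third-party libraries.
  def fetch(url: String): String = {
    val source = Source.fromURL(url, "UTF-8")
    try source.mkString finally source.close()
  }

  def main(args: Array[String]): Unit =
    println(fetch("http://tommorris.org/files/helloscala.txt"))
}
```

This gives you no control over headers, timeouts or authentication, which is where a library like Dispatch earns its keep.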
Talking to OAuth
Databinder Dispatch has an OAuth module. It works well, but isn't particularly well-documented.
I put up a copy of a REPL session to show how to authenticate and use the FireEagle API using Dispatch's OAuth code.
Parsing XML
You can parse XML natively in the language using the scala.xml functionality. Scala is one of the few languages where it is actually easier to use XML than it is to use JSON or YAML or other lightweight serialization formats.
import scala.xml._
val string = "<html><body><h1>Hello</h1></body></html>" // any well-formed XML string
val doc = XML.loadString(string)
(doc \\ "h1")(0).text // the text of the first h1: "Hello"
If you need to write XML, note that XML literals are part of the language. Try it. It's really cool. Open up a Scala REPL, type in "val foo = " then paste in a fragment of XML. Most of the Scala books have a chapter on XML literals and the scala.xml libraries.
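Here is a small sketch of XML literals with embedded Scala expressions in braces, building an RSS-like feed from a list. Post is a hypothetical case class. Since Scala 2.11, scala.xml lives in the separate scala-xml module, so it has to be on your classpath.

```scala
import scala.xml.Elem

object XmlLiteralDemo {
  case class Post(title: String, url: String)

  // {...} embeds Scala expressions; text in them is escaped automatically (& becomes &amp;).
  def toRss(posts: Seq[Post]): Elem =
    <rss version="2.0">
      <channel>
        {posts.map(p => <item><title>{p.title}</title><link>{p.url}</link></item>)}
      </channel>
    </rss>

  def main(args: Array[String]): Unit = {
    val feed = toRss(Seq(Post("Hello", "http://example.org/1"), Post("Fish & chips", "http://example.org/2")))
    println(feed)
    println((feed \\ "title").map(_.text)) // the titles, unescaped
  }
}
```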
Parsing JSON
If you want a native Scala library, try scala-json. Alternatively, use json.jar from Java.
My preferred way of parsing JSON is using json-lib. Here is how you parse json-time using Scala.
First, classpath. My sbaz is as above, and I add these jars to the classpath:
- commons-lang-2.4.jar
- json-lib-2.3-jdk15.jar
- ezmorph-1.0.6.jar
- commons-collections-3.2.1.jar
- commons-beanutils-1.8.2.jar
Here's the code:
import dispatch._
import Http._
import net.sf.json._
var http = new Http()
var jsontime = JSONObject.fromObject(http("http://json-time.appspot.com/time.json" as_str))
Now we can interrogate the object a bit:
jsontime.get("tz").asInstanceOf[String]
jsontime.get("hour").asInstanceOf[Int]
(Obviously, if we are going to use date/time objects, we should probably import JodaTime and scala-time and convert the values returned from the JSONObject into the relevant date-time objects.)
Twitter and other APIs
Of course, what you really need is some more Twitter in your life. Using the HTTP and XML/JSON libraries listed above, it should be fairly easy to poll Twitter's APIs. But polling is old school. The new kids on the block use the firehose-style realtime APIs. For that you need acrosa's Scala-TwitterStreamer. It is based on Apache Commons HTTP and the code is very Javaish rather than FPish, but it works. Sadly, it doesn't actually come with a thing to turn the InputStreams of JSON into objects, but it isn't too difficult to write.
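The missing piece is mostly splitting the stream into individual statuses before handing each one to a JSON parser such as json-lib. A minimal sketch, assuming the stream is newline-delimited JSON with blank keep-alive lines (an assumption about the feed format; check what your endpoint actually sends). StreamSplitDemo is a made-up name.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import scala.io.Source

object StreamSplitDemo {
  // Assumes one JSON document per line; blank keep-alive lines are skipped.
  // The Iterator is lazy, so statuses are yielded as they arrive on a live stream.
  def statuses(in: InputStream): Iterator[String] =
    Source.fromInputStream(in, "UTF-8").getLines().map(_.trim).filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    val fake = "{\"text\":\"hello\"}\n\n{\"text\":\"world\"}\n"
    statuses(new ByteArrayInputStream(fake.getBytes("UTF-8"))).foreach(println)
  }
}
```

Each String can then go to something like JSONObject.fromObject, as in the JSON section above.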
Other web service API wrappers (mostly Java):
- Delicious - delicious-java
- Digg - Jigg
- FireEagle - JFireEagle (beware: this doesn't seem to work. Writing a Scala FireEagle library based on Databinder Dispatch is something I intend to do quite soon.)
- Flickr - FlickrJ
- Foursquare - closest I've been able to find is the Java code used by the FourSquare Android app to access the API
- geonames - java api
- Google Charts API - charts4j
- Google GData - gdata-java-client
- Gowalla - gowalla-java
- Last.fm - Scala-Last.fm, last.fm API bindings for Java, lastfm-java
- Upcoming - Upcoming
Semantic Web
A lot of interesting data is out there in RDF. One of the real benefits of Java if you are a Semantic Web user is that you get to use Jena, perhaps the most mature and standards-compliant RDF and SPARQL implementation. There's a Scala wrapper for Jena called scardf which adds a layer of functional domain-specific language on top of Jena.
Date Time
Rubyists have Chronic. There's Parsedatetime for Python. These are great because they allow natural language date parsing. Java and Scala users get JChronic.
Java also has JodaTime. Use that. Don't use Java's built-in date-time classes (java.util.Date and Calendar) - JodaTime is better.
BigInt
One thing people coming from Python/Ruby etc. need to remember is that you have to deal with the type system's handling of numbers. For Ints and Longs, you can use the literals provided by the language. A plain integer literal creates an Int: "var a = 5" assigns an Int to a. To create a Long, suffix the number with 'L' or 'l': "var a = 5L" ('L' is easier to tell apart from the digit 1). A decimal literal such as 5.0 is a Double; suffix it with 'F' or 'f' to get a Float.
For Big Ints - integers too big to fit into Int or Long - you should use BigInt. BigInt is like Java's BigInteger class, but easier.
In Java, you might do this: "BigInteger whatever = new BigInteger("82175986439779345802385989");"
In Scala, you can use BigInt, which is a class with a companion object that lets you instantiate it like this: var a = BigInt("82175986439779345802385989")
You needn't instantiate using a string - you can use an int or a long.
Once you've instantiated, the key benefit of using Scala's BigInt class rather than Java's BigInteger class is simple: you can use the standard mathematical operators - +, -, * and /
So use that.
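A short sketch of BigInt arithmetic with ordinary operators, alongside the silent overflow you get from plain Ints. The object and method names are illustrative.

```scala
object BigIntDemo {
  // 1 * 2 * ... * n; Ints are converted to BigInt implicitly in the multiplication.
  def factorial(n: Int): BigInt = (1 to n).foldLeft(BigInt(1))(_ * _)

  def main(args: Array[String]): Unit = {
    val a = BigInt("82175986439779345802385989")
    val b = BigInt(1000000007L)
    println(a * b + 1) // ordinary operators, arbitrary precision
    println(a / b)
    println(a % b)
    println(BigInt(2).pow(100))
    println(factorial(25))
    println(Int.MaxValue + 1) // Int wraps around silently: -2147483648
  }
}
```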
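For number type conversion, use the conversion methods that every numeric type defines: toLong, toInt, toDouble and so on. So if you have an Int and want a Long, just do 5.toLong. You may see asInstanceOf used for this (5.asInstanceOf[Long]), but asInstanceOf is a cast, not a conversion: it fails with a ClassCastException when the value is boxed, for example an Int handed back as a plain Object by a Java library.
It's worth learning what asInstanceOf does and doesn't do. Google it.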
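Here is a sketch of the cast-versus-conversion difference. castToLong and convertToLong are hypothetical helpers; the Any parameter stands in for a value that arrives boxed, as it does from Java APIs returning Object.

```scala
import scala.util.Try

object ConversionDemo {
  // A cast: only succeeds if the runtime value really is a (boxed) Long.
  def castToLong(x: Any): Long = x.asInstanceOf[Long]

  // A conversion: check the runtime type, then convert with toLong.
  def convertToLong(x: Any): Long = x match {
    case n: Int  => n.toLong
    case n: Long => n
    case other   => throw new IllegalArgumentException(s"not an integer: $other")
  }

  def main(args: Array[String]): Unit = {
    val i = 5
    println(i.toLong)                     // 5
    println(Try(castToLong(i)).isFailure) // true: a boxed Integer is not a Long
    println(convertToLong(i))             // 5
  }
}
```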
Web frameworks
Some web frameworks you may want to investigate:
- Lift
- Step is more lightweight like Ruby's Sinatra framework
- Pinky
- SweetScala
- Gardel
- Play - very similar to Rails for Java and Scala. Very beginner friendly.
Other resources
- Daniel Spiewak writes a lot of good stuff about Scala including Code Commit, a blog which covers Scala a lot - it's technical but always interesting. He's also behind the Style Guide (PDF).