Some folks have asked whether Cassandra or Riak in last-write-wins mode are monotonically consistent, or whether they can guarantee read-your-writes, and so on. This is a fascinating question, and leads to all sorts of interesting properties about clocks and causality.
There are two families of clocks in distributed systems. The first are often termed wall clocks, which correspond roughly to the time obtained by looking at a clock on the wall. Most commonly, a process finds the wall-clock time via gettimeofday(), which is maintained by the operating system using a combination of hardware timers and NTP–a network time synchronization service. On POSIX-compatible systems, this clock returns integers which map to real moments in time via a particular standard, like UTC, POSIX time, or, less commonly, TAI or GPS.
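To make the distinction concrete, here is a minimal Python sketch (my own; the post itself contains no code) of reading the wall clock versus a monotonic clock. gettimeofday() is a C call, so Python's time module stands in for it here.

```python
import time

# Wall-clock time: seconds since the POSIX epoch, maintained by the OS
# using hardware timers disciplined by NTP. Not guaranteed to be monotonic:
# it can jump backwards when NTP steps the clock.
print(time.time())

# On POSIX systems the same clock is exposed as CLOCK_REALTIME.
print(time.clock_gettime(time.CLOCK_REALTIME))

# Contrast with a monotonic clock, which never goes backwards but has no
# fixed relationship to calendar time.
print(time.monotonic())
```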
The second family are the logical clocks, so named because they measure time in terms of the logical operations the system performs. A Lamport clock, for instance, is a monotonically increasing integer which a node increments on every operation. Vector clocks generalize Lamport clocks: each node tracks the maximum Lamport clock it has seen from every other node.
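As a rough illustration (again my own sketch, not code from the post), a Lamport clock and the basic vector-clock merge and comparison rules fit in a few lines of Python:

```python
class LamportClock:
    """A single monotonically increasing counter per node."""
    def __init__(self):
        self.t = 0

    def tick(self):
        # Increment on every local operation (including sends).
        self.t += 1
        return self.t

    def receive(self, remote_t):
        # On receiving a message, jump ahead of the sender's clock.
        self.t = max(self.t, remote_t) + 1
        return self.t


def vclock_merge(a, b):
    """Merge two vector clocks: pointwise maximum of each node's counter."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}


def vclock_descends(a, b):
    """True if a is causally equal to or later than b."""
    return all(a.get(node, 0) >= t for node, t in b.items())
```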
Since the Strange Loop talks won’t be available for a few months, I recorded a new version of the talk as a Google Hangout.
Previously on Jepsen, we learned about Kafka’s proposed replication design.
Cassandra is a Dynamo system; like Riak, it divides a hash ring into several chunks, and keeps N replicas of each chunk on different nodes. It uses tunable quorums, hinted handoff, and active anti-entropy to keep replicas up to date. Unlike the Dynamo paper and some of its peers, Cassandra eschews vector clocks in favor of a pure last-write-wins approach.
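To show why that choice matters, here is a toy Python sketch of a last-write-wins merge rule (not Cassandra's implementation; the timestamps and values are made up). The point is that clock skew between writers, not the order in which the writes actually happened, decides which value survives:

```python
def lww_merge(a, b):
    """Keep whichever (timestamp, value) pair carries the higher timestamp.
    If the writers' wall clocks disagree, a later write can silently lose."""
    (ts_a, _), (ts_b, _) = a, b
    if ts_a > ts_b:
        return a
    if ts_b > ts_a:
        return b
    return max(a, b)  # deterministic tie-break on the value

# Two writes with skewed clocks: the second, later write is discarded.
r1 = (1000001, "write-from-node-1")
r2 = (1000000, "write-from-node-2")   # issued later, but its clock lags
print(lww_merge(r1, r2))              # -> (1000001, 'write-from-node-1')
```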
Some Write Loses
In the last Jepsen post, we learned about NuoDB. Now it’s time to switch gears and discuss Kafka. Up next: Cassandra.
Kafka is a messaging system which provides an immutable, linearizable, sharded log of messages. Throughput and storage capacity scale linearly with nodes, and thanks to some impressive engineering tricks, Kafka can push astonishingly high volume through each node, often saturating disk, network, or both. Consumers use Zookeeper to coordinate their reads over the message log, providing efficient at-least-once delivery–and some other nice properties, like replayability.
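As a rough sketch of the at-least-once pattern described here (a self-contained toy loop, not Kafka's consumer API): the consumer processes a message before committing its offset, so a crash between the two steps causes redelivery rather than loss, and replay is just re-reading from an older committed offset.

```python
def consume_at_least_once(log, start_offset, process, commit):
    """Toy consumer loop: process a message *before* committing its offset.
    A crash between the two steps means the message is redelivered on
    restart -- hence at-least-once rather than exactly-once delivery."""
    offset = start_offset
    while offset < len(log):
        process(log[offset])   # side effect happens first
        offset += 1
        commit(offset)         # then record progress (e.g. in Zookeeper)

log = ["m0", "m1", "m2"]
consume_at_least_once(log, 0, process=print, commit=lambda o: None)
```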
Previously on Jepsen, we explored Zookeeper. Next up: Kafka.
NuoDB came to my attention through an amazing mailing list thread by the famous database engineer Jim Starkey, in which he argues that he has disproved the CAP theorem:
The CAP conjecture, I am convinced, is false and can be proven false.
The CAP conjecture has been a theoretical millstone around the neck of all ACID systems. Good riddance.
This is the first wooden stake for the heart of the noSQL movement. There are more coming.
In this Jepsen post, we’ll explore Zookeeper. Up next: NuoDB.
Update 2019-07-23: @insumity explains that ZooKeeper sync+read is not, in fact, linearizable–there are conditions under which it might return stale reads.
Zookeeper, or ZK for short, is a distributed CP datastore based on a consensus protocol called ZAB. ZAB is similar to Paxos in that it offers linearizable writes and is available whenever a majority quorum can complete a round, but unlike the Paxos papers, places a stronger emphasis on the role of a single leader in ensuring the consistency of commits.
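To illustrate the read path the update above refers to, here is a small sketch using the kazoo Python client (my own example, not from the post; the ensemble addresses and znode path are hypothetical):

```python
from kazoo.client import KazooClient

# Hypothetical ensemble addresses and znode path.
zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()

# Writes go through the ZAB leader and are linearizable.
zk.ensure_path("/register")
zk.set("/register", b"value-1")

# Plain reads are served locally and may be stale; sync() asks the follower
# to catch up with the leader first. Per the 2019 update above, even
# sync+read is not strictly linearizable under all conditions.
zk.sync("/register")
data, stat = zk.get("/register")
print(data)

zk.stop()
```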
I wish I could make more concrete policy recommendations, but in this case all I can say is “this looks troubling.” Here’s the letter I sent to my representatives today:
Dear Senator Feinstein,
In 2006, we learned that the NSA had secretly tapped all internet traffic flowing through AT&T’s San Francisco peering point. Now, the Guardian’s leaks suggest that the NSA has accrued phone and email records–some metadata, some full content–for millions of US citizens, and stored them for targeted analysis. The criteria for retention and analysis remain poorly understood.
If you, as a database vendor, implement a few features in your API, I can probably offer repeatable automated tests of your DB’s partition tolerance through Jepsen.
The outcome of these tests would be a set of normalized metrics for each DB like “supports linearizability”, “available for writes when a majority partition exists”, “available for writes when no majority available”, “fraction of writes successful”, “fraction of writes denied”, “fraction of writes acked then lost”, “95th latency during condition X”, and so forth. I’m thinking this would be a single-page web site–a spreadsheet, really–making it easy to compare and contrast DBs and find one that fits your safety needs.
At a minimum, I need to know:
I’ve been discussing Jepsen and partition tolerance with Peter Bailis over the past few weeks, and I’m honored to present this post as a collaboration between the two of us. We’d also like to extend our sincere appreciation to everyone who contributed their research and experience to this piece.
Network partitions are a contentious subject. Some claim that modern networks are reliable and that we are too concerned with designing for theoretical failure modes. They often accept that single-node failures are common but argue that we can reliably detect and handle them. Conversely, others subscribe to Peter Deutsch’s Fallacies of Distributed Computing and disagree. They attest that partitions do occur in their systems, and that, as James Hamilton of Amazon Web Services neatly summarizes, “network partitions should be rare but net gear continues to cause more issues than it should.” The answer to this debate radically affects the design of distributed databases, queues, and applications. So who’s right?
In response to my earlier post on Redis inconsistency, Antirez was kind enough to help clarify some points about Redis Sentinel’s design.
Previously in Jepsen, we discussed Riak. Now we’ll review and integrate our findings.
This was a capstone post for the first four Jepsen posts; it is not the last post in the series. I’ve continued this work in the years since and produced several more posts.
We started this series with an open problem.