February 2008 – Raised Weird

Audio-Technica ATH-ANC7 vs. Sennheiser PXC 250

I do a fair amount of air travel; in 2007 I flew about 50,000 miles without leaving the U.S. As a human factors guy, I know that the airplane rumble contributes to feeling fatigued as a result of air travel. I don’t like having things in my ear canal (I apparently have some skin condition in my ears, and anything in there always irritates it), so IEMs (that’s “in-ear monitors” for the non-headphone crowd) are not a good option for me. So, active noise cancellation (ANC) was a possibility. I wasn’t sure about all this, but years ago I got some cheap Aiwa noise-cancellers as a gift and I’ve never looked back; active noise cancellation is for me.

So, I’ve owned, and been reasonably happy with, the Sennheiser PXC250s for some time. I got them on sale at MacMall for $90 back in 2003, which was pretty much a steal at that time. That was the only PXC model at the time. The PXC 250 is simply the Senn PX200 with Sennheiser’s active noise cancellation.

A little while back, Audio-Technica released a new entry into the ANC party, the ATH-ANC7. Sennheiser also released some new players like the PXC 450 (which I understand to be the drivers from HD555 in closed form with Senn’s ANC circuit). The PXC 450 is a bit on the spendy side and has gotten some negative reviews, so I wasn’t all that interested in them, despite the fact that I really like the Senn HD 555/595 sound signature. I asked for the ATH-ANC7s for Christmas and my wife obliged.

So, now I’ve had them for a couple months and they’ve been on a few plane trips (including two transatlantic flights); how do they stack up against my tried-and-true PXC250s?

I should preface this by saying that Sennheiser and Audio-Technica are currently my favorite headphone manufacturers. My main cans at home are Senn HD595s (the better 120-ohm model), but when I need a closed can, I go for my ATH A700s. So this is a shootout between two companies I’m predisposed to liking.

I’ll face them off on a number of attributes. However, first, details on my usual air travel rig:
• 5.5gen 80G iPod video
• SendStation lineout dock to bypass Apple’s attenuator
• Xin Supermicro amp
• Custom mini-to-mini cable (made by Norm of “Vibe” amp fame)

Portability
This one goes hands-down to the PXC250, no contest at all here. They fold up into a pretty small little package and Senn provides a nice zippered cloth case which also has a zippered pocket on the outside, which is perfect for the dock adapter and SuperMicro. The ANC7s fold kind of flat and also have a nice zippered hard case, but the thing is probably almost three times the size of the Senn case and doesn’t have as nice a pocket.

Noise Attenuation
This is more or less a tie. Due to their small size, the PXC250s don’t block out very much sound without ANC engaged. This means they don’t muffle mid-range and higher sounds very well. The ANC7s are much more substantial and do a better job with that. However, the Senn ANC circuitry is flat-out better than the ATH circuitry. With no audio being fed into the headphones at all, switching on the PXC250s just completely wipes out airplane rumble. The ANC7s do pretty well, but not as well as the Senns. Now, with any kind of music playing, the rumble is pretty well masked by the music so it’s not a big deal, but I know the Senns have the better ANC circuit. The ATHs, however, do a much better job with the screaming baby three rows back because of their superior passive noise isolation.

Annoyance Factor
I’m not sure what other term to use for this. Basically, I want to give a point to the ATHs because the physical setup of the Senns is kind of irritating. The ATH is physically bigger, and the over-ear unit also houses all the electronics and the battery. Thus, you can put on the ATHs and have no cables coming from them at all, so if you just want the noise attenuation and don’t want to listen to anything else, you can do that without having to mess with anything else. The ATHs also take a standard mini-to-mini cable so you cable freaks can recable if you want, or anyone can replace the cable if it breaks. The Senn setup is not so slick. The cable that runs out of the ear cups goes a ways, then there’s a unit about the size of a candy bar which houses the batteries and ANC circuitry (including the mic), and then there’s a few more feet of cable ending in a mini plug. So when you’re wearing them you have to find a place to clip the candy bar thing. I have thus gone to wearing shirts with breast pockets when I fly just so I have some place to clip the stupid thing.

Comfort
This is again a no-brainer. The Senns are feather-light and very comfortable; I can wear these babies for hours without any discomfort. The ATHs are reasonably comfortable headphones, but they just cannot match the Senns. After about two hours I need to take breaks with the ANC7s. The place where this really matters is sleeping; I find it very hard to fall asleep with the ATHs on, but this is not a problem with the Senns.

Sound Quality
So, which one sounds better? The PXC250 is based on the PX200, which is just simply not one of Senn’s better-sounding cans; both the highs and the lows are a bit rolled off with the PX200 (yes, I own a pair of those, too). I think the PXC250s sound better than the PX200s, though. I know it’s the same driver but the PXC250s are a little more lively. The highs have more sparkle and the bass is not so weak, though the mids are still emphasized. The ANC7s have the Audio-Technica sound signature to them, which means the mids are somewhat recessed. Overall I think the ANC7s sound a bit better; they’re a little more detailed and involving than the PXC250s and definitely have more thump for you bass-heads out there, though I would give a slight edge to the Senns for classical and acoustic music.

It should be noted, though this should be a surprise to nobody, that both of these sound dramatically better than any Bose NC product and they cost less. I’ve listened to both the QC2 and QC3 in stores and they both sound like crap, especially for those prices.

Driveability
That’s probably not a word, but you headphone geeks know what I mean. The Senns are 300-ohm cans. With ANC on, they’re not quite as quiet as that, but even still, they aren’t all that easy to drive. With ANC on and no amp, putting the iPod volume on max still isn’t all that loud. This is why I got the Supermicro in the first place. The ATHs, on the other hand, while also not super easy to drive with ANC off (260 ohm), are much better behaved unamped with the ANC on than the Senns. They do improve when amped, as most headphones do, but not as much as the Senns. In particular, the ANC7s are adequate for watching movies without an amp, where the PXC250s need one even for that.

The Bottom Line
So, which headphone do I like better? I’m kind of on the fence, as there are obvious tradeoffs. I think in the future for transatlantic flights I’m taking the PXC250s because of the comfort and size advantages. (Coming out of the UK you only get one carry-on total, and that includes laptop bags, so space is at a bit of a premium, and I’m rather hoping to make trips to the UK more regularly in the future.) But for domestic flights I’ll probably go with the ATHs.

Postscript: What is ANC?
This is for those of you who don’t know anything about active noise cancellation. ANC is a pretty cool idea. Basically, there’s a microphone on the outside of the headphone which listens to the noise around you. A circuit takes that input, inverts the phase, and feeds that into the audio signal passed into the headphones. The result is that the outside noise is “cancelled out” of what you hear. Phase inversion is a little tricky and only really works at low frequencies. So, for instance, ANC doesn’t really block out voice all that well, particularly high-pitched voice. The good news, though, is that it’s the low frequencies which have an easier time passing through solids. (You know, when someone’s got the music too loud in their car with all the windows closed, all you can hear is the bass; the low-frequency stuff.) ANC is generally very good with any kind of constant, low-frequency rumble, like jet engines.
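To make the phase-inversion idea concrete, here is a toy sketch in Python. It is not how any real headphone circuit works: the sample rate, the two-sample delay, and the function names are all assumptions chosen for illustration. The point is only that an inverted copy of the noise cancels a low rumble almost completely even if it arrives slightly late, while the same small delay makes things worse for a high-pitched tone.

```python
import math

SAMPLE_RATE = 8000  # samples per second (an arbitrary choice for this sketch)

def tone(freq_hz, n_samples=SAMPLE_RATE):
    """Return one second (by default) of a pure sine tone as a list of floats."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n_samples)]

def anti_noise(mic_signal, delay_samples=0):
    """Invert the microphone signal, optionally delayed to mimic circuit latency."""
    inverted = [-x for x in mic_signal]
    return [0.0] * delay_samples + inverted[:len(inverted) - delay_samples]

def residual_rms(noise, cancel):
    """RMS level of what remains after the anti-noise is mixed with the noise."""
    mixed = [n + c for n, c in zip(noise, cancel)]
    return math.sqrt(sum(x * x for x in mixed) / len(mixed))

# Hypothetical comparison: a 100 Hz rumble vs. a 2000 Hz whine, same tiny delay.
for freq in (100, 2000):
    noise = tone(freq)
    perfect = residual_rms(noise, anti_noise(noise))
    delayed = residual_rms(noise, anti_noise(noise, delay_samples=2))
    print(f"{freq:5d} Hz  residual with no delay: {perfect:.3f}  with 2-sample delay: {delayed:.3f}")
```

An untouched sine tone has an RMS of about 0.707, so any residual above that means the "cancellation" is adding noise rather than removing it.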

Net Neutrality

For reasons relating to my voting research, I generally don’t make political blog posts. I don’t see wanting secure and usable voting machines as a partisan issue, but I know that some people do, so I try to stay away from politics in my blog.

Here I’m going to make an exception. Again, I don’t see this as a partisan issue, and while I’m sure that many do, I really hope policy-makers can rise above the partisan fray here.

I just cannot see the logic behind ending net neutrality. I guess the leading argument has something to do with “free markets.” And while there are often occasions where that’s a good argument, this is not one of them. The telecom market is not a free marketplace! First, all the basic R&D and infrastructure supporting the relevant market was paid for with public money. Yes, the telcos have put their own money into it since then, but the fact remains that without the public money at the beginning, we wouldn’t be having this discussion in the first place.

Second, the telcos are effectively regulated monopolies and these markets aren’t really “free” at all. The barrier to entry in these markets is enormous. At my home, I have the “choice” of a whole whopping two broadband providers; DSL through SB, err, AT&T, and cable modem via Comcast. Except that I cannot bundle this with my phone service, because Comcast doesn’t do VOIP to me (I don’t know why). So, really, I have one choice. How is that a competitive marketplace?

What’s interesting about this is that European markets (you know, Europe, that den of free-market everything, like medicine) are actually much more competitive in telco services. I was just in London and I regularly saw ads for probably five different broadband providers, and if you live in the U.S., you would not believe the prices and the service levels offered. 20 megabit fiber for less than I pay for 3 megabit DSL. (This is particularly amazing since everything else in London was much more expensive than here at home, and this would be true even if the dollar weren’t in the toilet.)

Mobile phone service was like this as well. In most of the U.S., there are four choices: Verizon, SprintNextel, T-Mobile, and AT&T. There are easily twice that many across the pond, and the service plans are much cheaper. (I think I saw an ad for 1000 peak minutes/month with unlimited SMS for £15/month, which is about $30). Pre-paid is much more popular there, too, and you couldn’t walk two blocks in London without finding a place where you could buy more minutes for your service. So maybe there is something to that competition thing—I wish we had it here.

So, maybe repealing net neutrality could work in Europe where there is a much freer market. Except that I think the Europeans wouldn’t stand for giving providers the authority to block or slow down traffic they didn’t like.

And, frankly, neither should we. Save the Internet, support Net Neutrality. I’ve written my representative, have you?

Death to the Mann-Whitney U and its Allies!

WARNING: Statistics content ahead. Not my usual blog-fare.

In the early 20th century, a guy named Fisher figured out that if you wanted to know if two (or n) samples came from populations with equal means, you could solve this problem by looking at all the different possible assignments of data to the groups and seeing how likely it is that the difference in means you observed could have happened by chance. (This is called a randomization test or a permutation test.)

The fundamental problem with this was that if you have even a relatively modest number of observations in even two groups, the number of possible assignments of observations to conditions grows very quickly as a function of the sample size. So, for instance, if you have 10 data points and 2 groups, you have to evaluate 252 possibilities (this was before computers, so this would have been tedious but do-able), but if you have 20 data points and two groups, you have 184,756 possibilities; not tractable with pencil-and-paper.
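The counts above are just binomial coefficients, assuming the observations are split evenly between the two groups (5 and 5, then 10 and 10). A quick sketch in Python; the function name is made up for illustration:

```python
from math import comb

def n_assignments(n_total, n_first_group):
    """Number of distinct ways to pick which observations land in the first group."""
    return comb(n_total, n_first_group)

# Assumes an even split between two groups, as in the examples in the text.
for n in (10, 20, 30, 40):
    print(f"{n} data points, 2 equal groups: {n_assignments(n, n // 2):,} assignments")
```

The first two lines of output reproduce the 252 and 184,756 figures; the later ones show how quickly things get out of hand.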

So, realizing this was not going to be tractable, Fisher invented the ANOVA (which pretty much puts him in the Smart Guy Hall of Fame), which is an ingenious bit of mathematics which makes use of the idea of sampling distributions. The ANOVA (and here I refer to only the between-subjects ANOVA; things are more complicated in the within-subjects case) and methods like it (for 2 groups the t-test is equivalent) do require some ancillary assumptions, namely:

[1] Equal variance in each group. This is because the within-groups error term is only a valid estimate of population variance when there is only one population variance to estimate.

[2] Normal sampling distributions for the parameters being estimated, namely the mean and the standard deviation. (Note: Or variance if you don’t like standard deviations. The variance will actually be a chi-square, not a normal, but the basic idea is the same.)

[3] Independence of errors. Generally speaking this is not an issue in between-subjects designs as one subject’s score doesn’t affect the other, but there may be some issues with condition effects. For now, I’m going to ignore this one.

Despite the fact that corrections exist, people violate assumption 1 to varying degrees more or less routinely. This is probably mostly OK since, as long as the sample sizes in the groups are roughly equal, all this usually costs you is power. (Everyone appears to be taught the adjustment in the t-test for violation of this assumption but for some odd reason this is uncommon for the ANOVA. SPSS has the same idiosyncrasy; it prints out the adjustment for an independent-samples t but not for a between-subjects ANOVA. This makes no sense; apparently JMP does this better—good for JMP!)
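For reference, the t-test adjustment in question is presumably the Welch version (the "equal variances not assumed" line in typical output), which drops the pooled variance and shrinks the degrees of freedom. A minimal sketch, assuming that is the correction meant; the data are invented:

```python
import math
import statistics

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    na, nb = len(a), len(b)
    # Per-group squared standard errors; no pooled variance is formed.
    se2_a = statistics.variance(a) / na
    se2_b = statistics.variance(b) / nb
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df

# Made-up data: similar means, very different spreads.
tight = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]
loose = [4.0, 15.5, 9.0, 12.5, 7.5, 13.0]
t, df = welch_t(tight, loose)
print(f"t = {t:.3f}, df = {df:.1f} (the pooled test would use df = {len(tight) + len(loose) - 2})")
```

When the group variances and sizes are equal, the adjusted df comes out the same as the pooled df, so the correction costs nothing in that case.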

Anyway, let’s look at assumption 2. I have noticed that there appears to be something of a misconception about this assumption. You will, in fact, see this assumption stated incorrectly (even in statistics textbooks) as a requirement that the distribution of the population data be normal. While in some sense this is technically true, that’s really not the best way to think about it. The assumptions in the ANOVA (and related tests like the t-test) are about the sampling distributions of the relevant statistics, not the population distributions. So, why is this important?

This is important because it means that the ANOVA is generally (though of course not always) a lot more robust than many people seem to believe. Your sample data need not be perfectly normal in order to use an ANOVA on them. Why not? Because what needs to be normal are the sampling distributions of the mean and the variance. And, in fact, the sampling distribution for the mean will indeed be normal, or at least very close to normal (assuming reasonable sample size), always. How do we know this? By the Central Limit Theorem. (If you don’t know what that is, then go back and re-read your intro statistics textbook; turns out that was one of the really important bits.)
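If you would rather see the Central Limit Theorem than take it on faith, a small simulation does the job. The sketch below assumes an exponential population (strongly right-skewed) and uses skewness as a rough stand-in for "how non-normal"; both choices, and the function names, are mine for illustration.

```python
import random
import statistics

def skewness(xs):
    """Population skewness: mean cubed z-score. Zero for a symmetric distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

def sample_means(n, reps=5000, seed=1):
    """Means of `reps` samples of size n drawn from an exponential population."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

# The skewness of the sample means should shrink toward zero as n grows.
for n in (1, 4, 16, 64):
    print(f"n = {n:3d}  skewness of the sample means = {skewness(sample_means(n)):.2f}")
```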

OK, so how about the standard deviation? It’s true, there is no Central Limit Theorem for the standard deviation. We do know that if the population distribution is normal, the sampling distribution for the standard deviation will be normal. This is, I believe, the source of the misconceptions. Many people thus believe the population data must be normal or it really messes up the sampling distribution of the standard deviation. Technically speaking, it is correct that if the population distribution is non-normal, the sampling distribution of the standard deviation is not guaranteed to be normal.

So, therefore, if your sample data are non-normal, you cannot use the ANOVA, right? This belief certainly permeates at least psychology and HCI, because I keep getting papers to review where people behave this way.

First, there is a problem of inference from sample data. Just because your sample data are non-normal does not guarantee that the population from which you are sampling is non-normal. You have access to a sample, not a population, and the ANOVA assumption refers to the population. So you cannot automatically assume the ANOVA is invalid based purely on the shape of your sample distribution. Unfortunately, there is no way to know for sure if your skewed sample means a skewed population.

However, even if you could know, the situation is just not that clear-cut. It turns out that the sampling distribution of the standard deviation tends, in general, to be very normal-like, even for pretty funky population distributions, as long as the sample size is reasonable. I would encourage you to go to David Lane’s Hyperstat site and play with the Sampling Distribution demo therein. Try really wacky distributions, and take 20,000 or more samples. Notice how normal the sampling distribution for the standard deviation is, even with an N as low as 16? This is, of course, an eyeball test and those often miss the finer points, and sometimes those finer points matter. Yes, it is possible to generate sampling distributions of the standard deviation which are sufficiently non-normal to mess up the ANOVA. But it’s not easy. And, as it turns out, mostly what gets screwed up when you violate this assumption is the power of the test, not the Type I error rate. (Which we all know must be preserved at all costs, even though people routinely perform multiple tests without adjustment. Sorry, that’s a topic for another day.)
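If you can’t get to the demo, the same eyeball test is easy to approximate yourself. The sketch below is not the Hyperstat demo, just a rough stand-in: it assumes a deliberately odd bimodal population of my own invention, draws 20,000 samples of N = 16, and prints a crude text histogram of the sample standard deviations so you can judge the shape for yourself.

```python
import random
import statistics

def sd_sampling_distribution(draw, n=16, reps=20000, seed=2008):
    """Return `reps` sample standard deviations, each from a sample of size n.

    `draw` takes a random.Random instance and returns one observation.
    """
    rng = random.Random(seed)
    return [statistics.stdev([draw(rng) for _ in range(n)]) for _ in range(reps)]

def text_histogram(values, bins=15, width=50):
    """Render a simple horizontal histogram as a multi-line string."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / step), bins - 1)] += 1
    peak = max(counts)
    return "\n".join(f"{lo + i * step:8.2f} | " + "#" * round(width * c / peak)
                     for i, c in enumerate(counts))

def bimodal(rng):
    """A made-up wacky population: half the mass near 0, half near 10."""
    return rng.gauss(0, 1) if rng.random() < 0.5 else rng.gauss(10, 1)

print(text_histogram(sd_sampling_distribution(bimodal)))
```

Swap in any other `draw` function to try a different population; `rng.expovariate(1.0)` gives a heavily skewed one.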

So, what does this mean? Can you always use an unadjusted ANOVA? No, of course not. If your data are highly non-normal (badly kurtotic or skewed), have unequal variances, and different sample sizes, then you have a problem. But if all you have is, say, skewed data, your problem might not be that bad. To see this illustrated, check out the Robustness demo from the Hyperstat site: take two “severely” skewed populations with equal standard deviation and you’ll notice that Type I error rate is nearly always preserved. (As Box observed in the 1950s; see Box, G. E. P. (1953). Non-normality and tests on variances. Biometrika, 40, 318–335.)
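A rough do-it-yourself version of that robustness check: draw both groups from the same skewed population, so the null hypothesis is true by construction, run an ordinary pooled t-test each time, and count how often it rejects at the .05 level. This sketch assumes exponential populations and 16 observations per group; the critical value is hard-coded for 30 degrees of freedom, so it is only valid for that group size.

```python
import math
import random
import statistics

N_PER_GROUP = 16
T_CRIT_DF30 = 2.042  # two-sided 5% critical value of Student's t with 30 df

def pooled_t(a, b):
    """Classic equal-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

def type_i_rate(reps=5000, seed=1953):
    """Proportion of false rejections when both groups share one skewed population."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a = [rng.expovariate(1.0) for _ in range(N_PER_GROUP)]
        b = [rng.expovariate(1.0) for _ in range(N_PER_GROUP)]
        if abs(pooled_t(a, b)) > T_CRIT_DF30:
            rejections += 1
    return rejections / reps

print(f"Empirical Type I error rate at nominal .05: {type_i_rate():.3f}")
```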

However, no matter how bad your data are, the solution is not the Mann-Whitney U test or similar procedures. At least not anymore.

Why not? Essentially, what these tests do is throw away data. More precisely, what they do is transform your data into ranks (throwing away the interval information), then perform a randomization test on the ranks, because there is a closed-form (i.e., analytic) solution to the randomization test when using the number sequence 1..n. It’s a crappy, low-power alternative (when the ANOVA assumptions are met, anyway), but it indeed does not make the kind of distributional assumptions made by the ANOVA. But am I saying it should not be used even when the ANOVA assumptions are violated? Yes, I am.

While the Mann-Whitney U and its cousins were certainly reasonable things to do in 1956 when Siegel wrote his still-popular book on non-parametric statistics, I cannot imagine what the justification is for doing this kind of test in 2008. If you’re worried enough about distributional assumptions that you refuse to do the standard parametric test (which is fine but not necessary in many cases), then do the bloody randomization test on your raw data! Or do some bootstrap-based variant. We have computers now (fast ones) that can resample your data 20,000 times in the blink of an eye (in 2008, even a cheap laptop can do this). These resampling techniques also do not require any distributional assumptions about your data, but they are clearly more powerful than old-school non-parametric methods like the Mann-Whitney U. If Fisher had access to a fast computer in 1915 he probably never would have bothered with the ANOVA in the first place. We now have fast computers, so why on earth should we bother with watered-down alternatives which are simply cheapened versions of the randomization test that Fisher himself recognized as appropriate?
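To show how little work this is, here is a minimal sketch of a two-group randomization test on raw scores. It is a Monte Carlo approximation (20,000 random reassignments rather than every possible one), the statistic is the absolute difference in means, and the data and names are invented for illustration; treat it as one reasonable way to do it, not the canonical one.

```python
import random
import statistics

def permutation_test(a, b, reps=20000, seed=2008):
    """Monte Carlo two-sided randomization test on the difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    at_least_as_extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # one random reassignment of scores to groups
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            at_least_as_extreme += 1
    # Count the observed arrangement itself so p is never exactly zero.
    return (at_least_as_extreme + 1) / (reps + 1)

# Made-up task-completion times (seconds) for two hypothetical interface conditions.
old_ui = [41.2, 38.5, 52.0, 47.3, 60.1, 39.8, 44.4, 71.9]
new_ui = [35.0, 33.1, 40.2, 36.7, 45.5, 31.9, 38.8, 34.6]
print(f"p = {permutation_test(old_ui, new_ui):.4f}")
```

Note that the raw scores go in untouched; nothing is converted to ranks, so the interval information is kept.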

So, if I’m reviewing your manuscript and I see a Mann-Whitney U or anything of its ilk (Kruskal-Wallis, Friedman, etc.), I’m rejecting it (well, OK, maybe just giving it revise-and-resubmit if everything else looks good), unless, of course, what you actually have is rank data. Since people rarely have rank data, you then have alternatives: either realize that the ANOVA is at least somewhat robust to assumption violations and go ahead and use it, or do a randomization or other resampling test. (I’m sure Rand Wilcox would argue for some other form of robust test; one of those would be fine as well.) The U and those like it are a wishy-washy bucket of “neither” and, given modern computing power, should be purged from statistics courses and textbooks. And you old-school statistics instructors out there: stop teaching these procedures as appropriate! You can teach them for historical context so your students know what they are when they see them, but stop telling students to actually perform these tests! They had their day in the sun, but that day has been over for a good twenty years now. Teach them permutation or bootstrap tests instead.

(Thanks to David Lane for helpful comments and suggestions on an earlier draft of this.)

Ahh, So That’s It

Not exactly what I’d call air-tight (beer-tight?) research but I think this piece might explain why I had so few journal articles in graduate school.

Farewell, Old Friend

Not that anyone else will care, but Monday I did something I’ve been both dreading and looking forward to for a while: I changed email clients. Why is this a big deal? Well, I probably spend as much time in my email client as in any other application, so my mail client impacts my day like few other changes in my electronic world.

I just switched from BareBones’s Mailsmith to Apple’s Mail (I’ll go into the reasons for this later). Making a change like this will always include transitional pain. What surprises me is that I can’t find any site with stuff to ease the change. Keyboard shortcuts are all different, but those can be re-assigned pretty easily. But some of the other stuff is harder. In just two days, I already have some strong impressions.

Stuff I already like better about Mail:
• I’m running Leopard, and Mail can use QuickLook on attachments. I like QuickLook in the Finder, and I didn’t think of this before I saw it, but email is the ultimate place for it. It might have been worth it for this feature alone. Mail seems to handle attachments better in other ways as well.
• IMAP support. Someday in the not too distant future, I’ll be getting an iPhone, and that means I’ll really need IMAP.
• Speed. Mailsmith had become such a dog, especially on Intel hardware. I love being able to click a column header to sort by that and have a mailbox with thousands of messages sorted immediately, or mark 50 messages as read and not have to wait for the spinning beach ball for 20 seconds.
• Better integration with other apps. For example, I use OmniFocus, and it plays very well with Mail. And SpamSieve now seems to play as well with Mail as with Mailsmith. (Actually, better, since the spam mailbox can now be sorted by spam probability, which is great.)
• Dynamic spell checking.

Stuff I already miss:
• Mailsmith’s text editing environment. Obviously, the Mailsmith editor is based on BBEdit, which means it rocks. What I miss most of all is the “Rewrap” command, which re-wraps the selected bit of text and maintains the quote level. Years ago (before Mailsmith existed) I used Eudora, and had an AppleScript which copied the current text and opened it in BBEdit, and then I used another script to bring it back to Eudora. I would just do that again, except for:
• AppleScript support. Mail has some, but not nearly as much as Mailsmith, which is the king of scriptability. Some of the things that are gone don’t surprise me, but others boggle my mind. In particular, Mail does not provide AppleScript access to the contents (the text) of an email message that is being composed, nor access to the message associated with a window. This is totally indefensible. Hey, Mail team, what gives? How could you not want this property?
• Annotation. Mailsmith supports things like notes on emails and coloration under script control, which allows all kinds of tagging that is hard to do in Mail.

I hate rich text/HTML email and thought I would miss Mailsmith’s auto conversion into plain text, but Mail actually handles this pretty well.

I’m still surprised that I can’t find any sites devoted to making this transition. Surely I can’t be the only one who has dumped Mailsmith for Mail? I know at least John Gruber did it. Anyone else?

Postscript
Why the change? I’ve been using Mailsmith since before version 1.0 (I was a beta tester for it), which works out to something on the order of a decade. I’ve always really liked the way the BareBones folks have conceptualized the task of dealing with email. But in the last year or so, I’ve developed more of a love/hate relationship with it. It hasn’t been updated in years (May 2005) and isn’t Intel-native. It’s dog slow, and the lack of updates means that it’s fallen behind the technology curve pretty badly. I didn’t dump it last summer because last April BareBones started a public beta test, but the betas have been, well, betas. I tried several iterations, and I was never able to successfully migrate to it, and of course it has stability issues, and I hate losing email. Also, there’s the IMAP thing. But it’s definitely weird to make the switch.

Super Bowl XLII

OK, so here it is, the big game. The spread moved around a bit but seems to have settled:

New Jersey Giants vs. New England Patriots (-12)
Yet another double-digit spread for the Pats. It’s been a long time since they’ve covered one of these, and their previous Super Bowls were all close ones as well. I’m taking the points.
