There’s an interesting visual representation of the relationships between car brands and their manufacturers floating around the net. I think the original source can be credited to Too Many Cars, but who knows. In any case, it’s an interesting demonstration of just how few car manufacturers there are and how every one of them has a piece of many different brands.
Archive for March, 2008
I didn’t grow up with Arthur C. Clarke’s Mysterious World playing in the background, nor did I grow up reading his stories. But now, as an adult with a love of technology, exploration, and imagination, I find myself genuinely sad at the loss of a great man, Sir Arthur C. Clarke.
At the age of 90, Clarke passed away in his Sri Lankan home. He will be missed by many and honored by more.
Not long before his death Clarke recorded this video where he made some jokes, told some stories, and made three wishes. He wished for evidence of extraterrestrial life, a move to clean energy, and peace in Sri Lanka. With all three of those, I agree wholeheartedly. If you have 10 minutes to spare, I suggest you watch this moving final message to the world from a great man.
I got a great response from Aaron to my “Ethics of Web Crawling” post from yesterday and decided to find out where I stood legally on this issue. Luckily, my brother is a lawyer who happens to specialize in intellectual property law.
I’d like to stress that this is NOT legal advice, but just the opinion of one lawyer who hasn’t actually seen any of the relevant information. Please do NOT act on this advice. If you have a real issue, consult a lawyer.
That said, here are the legal issues.
1. Is this fair use?
Answer: Kind of. Fair use refers to the tenet in copyright law that allows someone to use copyrighted material without the permission of the copyright holder for such things as reviews and academic pursuits. In this case, my use of the data I crawled would be for academic purposes and so would fall under fair use.
However, my brother actually thinks that the data do not fall under any copyright protection at all. He explained that you cannot copyright data, only the presentation of that data. So for example, had I taken a screenshot of a website and published it, I would be in the realm of copyright. Would that be fair use? It depends on a few things that I won’t get into now.
In the case of web crawling, I am collecting data and presenting it in a very different (and actually aggregate) manner from the way it was published. Because I’m not actually reprinting their exact material, I’m in the clear.
The analogy my brother used was that of writing a review for a book. Under fair use, I can quote that book in my review and be fine…but I would still fall under copyright laws (I would just be in the clear because of fair use). In contrast, if I reported that the book had 347 pages numbered 1, 2, 3…, 347 I wouldn’t be regulated by any copyright whatsoever (fair use or otherwise) since I’m reporting data, not content.
2. Did I violate their Terms of Service?
Answer: What TOS? Their TOS says that I can’t “copy, modify, publish, transmit, distribute, perform, display, or sell” any information from their site. Clearly, by taking their data and publishing an analysis of it, I would be both modifying and publishing. Fair use aside, the problem with their TOS is that it does not require me to accept it. It is buried on a separate page that I only found after actively searching. Had I been required to agree to the TOS upon arriving at the site, then maybe they would have a case (though the argument above would probably still hold). Because I never accepted the TOS, it does not actually apply.
Conclusion: I’m legally in the clear. Ethically, that’s a different question…one I’m still grappling with.
As a behavioral researcher I often find it interesting to look at real world data in order to supplement my experimentally derived conclusions. Not only does this lend a sense of credibility to any findings, but it also makes for a far more interesting and memorable story.
Recently I came across a website (we’ll leave it unnamed for the time being) that, because of the nature of the service they were providing, had a natural experiment running for the last few years that would test my hypothesis perfectly. I realized that access to these data would be incredible. If my theory could be borne out in the real world, I would be thrilled!
However, I also realized that due to the nature of the business, this website would likely be very reluctant to hand over their data. So I did what any reasonable person with a programming background would do: I wrote a web crawler and systematically collected all the data that was available to the general public. It turns out that my hypothesis was, in fact, supported by these real world data, so my effort was not in vain. But was I right to do this?
I’ve been struggling with the ethics of this issue for a few days now and genuinely don’t have a good answer. On the one hand, the data are freely available to anyone who wants to view them. There is no registration required and the system uses a simple indexing method that allows for trivial crawling. On the other hand, if I publish a paper with these data I am going to reveal some information about this firm that they may not want out there. This company’s data are proprietary to them, and by posting any of it on the web they are implicitly placing some level of trust in their viewers.
Does the benefit to science outweigh the breach of trust? Honestly, I don’t know. I plan on contacting the firm and telling them what I did. My hope is that they will be excited by my conclusions and try to incorporate this new knowledge into their business. Unfortunately, I suspect that instead they will get defensive and demand that I not publish my results.
So what should I do? What do you think is the right course of action?
I’ve always asked myself this question, but it became particularly relevant when I heard about today’s anti-war protest run by United for Peace. Don’t get me wrong, I’m generally against this war (though I appreciate the complexity of the situation we are in right now and don’t think that a simple “let’s just leave” solution is optimal), but I can’t for the life of me figure out why people would line up and protest the war today.
With only 31% of Americans supporting Bush and the war, what does this protest hope to accomplish? I see the value in hitting the streets when the cause is little known, but come on, an anti-war protest now? Isn’t that too little too late?
An example of a protest that I think worked wonderfully was when a group of concerned citizens stood outside a restaurant and handed out flyers informing the populace that the restaurateur was underpaying his delivery men and treating them as indentured servants. I didn’t know about this at the time, so the protest served to draw my attention to a problem that I can get on board with. To this day, I don’t go to this restaurant, despite the tasty food (I’m not identifying the restaurant as that would give out too much personal information). But had this same problem been ongoing for 5 years on an international level, the protest would have been mere ego-masturbation.
I’m sure the protesters mean well, but there are far better uses of their time than standing on the streets (of what is a VERY liberal city) and telling people who already agree with them that the war is bad. How about sending your Congressman a letter about your feelings? How about donating your time to a VA hospital? How about volunteering for your favorite candidate? Any of these would serve the general public better.
The point is that next time you decide to take to the streets, think about what impact you’re actually going to have. If it’s nil, then stay home.
If you think you have a good reason to protest, leave me a comment and let me know.