Scraping of Facebook

Scraping is the process of automatically gathering information from websites. A software program mimics a human browsing the web and collects the sought-after information. Some websites are specifically designed to be scraped; others are not. Owners of sites that do not want their information scraped use various techniques to slow or completely stop such programs.
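To make the idea concrete, here is a minimal sketch of a scraper written with Python's standard library. The page markup, the "profile-" class names, and the ProfileScraper and scrape names are all hypothetical and invented for illustration; this is not Power.com's code or Facebook's markup. Notice that the program holds the whole page in memory, advertisement included, even though it keeps only the profile fields.

```python
from html.parser import HTMLParser

# Hypothetical page markup. A real scraper would first download this text
# over HTTP; it is inlined here so the sketch runs without a network.
PAGE = """
<html><body>
  <div class="ad">Buy now!</div>
  <span class="profile-name">Jane Doe</span>
  <span class="profile-city">Springfield</span>
</body></html>
"""


class ProfileScraper(HTMLParser):
    """Collects the text of every <span> whose class starts with 'profile-'."""

    def __init__(self):
        super().__init__()
        self.fields = {}       # extracted data, e.g. {"name": "Jane Doe"}
        self._current = None   # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class") or ""
        if tag == "span" and cls.startswith("profile-"):
            self._current = cls[len("profile-"):]

    def handle_data(self, data):
        # Only text inside a matching <span> is kept; everything else
        # (including the ad) is read but discarded.
        if self._current is not None:
            self.fields[self._current] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None


def scrape(page):
    """Parse the full page text and return only the profile fields."""
    scraper = ProfileScraper()
    scraper.feed(page)
    return scraper.fields


fields = scrape(PAGE)
print(fields)
```

The entire PAGE string has to be loaded before any field can be pulled out of it, which is the step the copyright arguments discussed in this post turn on.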

The Terms of Use of some websites, including Facebook’s, prohibit scraping. Power Ventures created a program that integrates several social networking sites into one user interface. When Power Ventures started scraping Facebook and launched Power.com, Facebook brought claims that Power.com violated Facebook’s rights in the creation, testing, and launch of the application. The complaint stated that Power.com created unauthorized cache copies of Facebook’s website or derivative works from the site.

The law is not settled on this issue. With more and more networks wanting to regulate access by third-party applications, this issue will be litigated in the future. In Facebook v. Power Ventures, Power Ventures’ motion to dismiss was denied. The denial was based on MAI Systems Corp. v. Peak Computer, Inc., 991 F.2d 511 (9th Cir. 1993) and Ticketmaster LLC v. RMG Techs. Inc., 507 F.Supp.2d 1096 (C.D. Cal. 2007). The court reasoned that scraping a website fundamentally requires copying the webpage into a computer’s memory. Even though the copy exists only for a moment, it is enough to constitute a “copy” under Section 106 of the Copyright Act. Additionally, since Facebook’s Terms of Use prohibit scraping, this “copy” is made without permission.

Although the motion to dismiss was denied, arguments do exist for a differing opinion. Section 101 of the Copyright Act requires a “copy” to be “fixed.”

“A work is ‘fixed’ in a tangible medium of expression when its embodiment in a copy . . . is sufficiently permanent or stable to permit it to be perceived, reproduced, or otherwise communicated for a period of more than transitory duration.”

In Facebook, MAI, and Ticketmaster, I do not believe this requirement is met. Scraping does not seem to fit within the definition of fixed. Facebook does not have a copyright on the user’s content, and that is the information that the Power.com software extracts. However, to get the user’s content, the software makes a temporary copy of the user’s Facebook profile page. In denying Power Ventures’ motion to dismiss, the court is saying that temporary copying of Facebook content to extract non-Facebook content may violate Facebook’s copyrights.

A circumvention claim under the Digital Millennium Copyright Act requires: (1) ownership of a valid copyright, (2) circumvention of measures put in place to protect the copyrighted material, (3) unauthorized access by third parties, (4) infringement because of circumvention, and (5) circumvention achieved through software that the defendant designed or produced for circumvention, made available despite only limited commercial significance other than circumvention, or marketed for use in circumvention.

The users of Facebook and Power.com were controlling access to their own content on Facebook. Power.com argued that, because of this, the “unauthorized access” requirement was not met. However, the Facebook Terms of Use ban the use of automated programs to access the Facebook website.

When it comes down to it, Power.com was a program that allowed users to download information from their Facebook accounts and view it in a different user interface. This is not unlike someone using Outlook instead of the webmail provided by their internet service provider. Facebook was threatened by Power.com because it allowed users to view their content without viewing Facebook’s advertisements.

~ by jonpufl on September 4, 2011.

9 Responses to “Scraping of Facebook”

  1. One issue I have with Power.com and the scraping issue is how often, once a user gives Power.com the initial authorization to access material from his or her Facebook account, Power.com “scrapes” Facebook again for the updated information on the user’s profile page. Is there a continuous link between the two providers that allows Power.com to scrape a subscriber’s Facebook page every time he or she updates it? This continuous updating, if that is in fact what happens, would seem to infringe upon Facebook’s copyright in its website. Further, I would have to disagree with you, Jon, that scraping does not fall within the definition of fixed. The definition you posted of “fixed” does not say that the material has to be reproduced or otherwise communicated, just that it is capable of being reproduced or communicated. It is hard for me to imagine that a company such as Power.com is incapable, once the scraping has occurred, of reproducing that information; it clearly does reproduce the information scraped. I think that even a “temporary” copy of a Facebook page is clearly a violation of Facebook’s copyrights. If users wish to have their information disseminated through other mediums or websites, then they should have to input their information over again on that different interface. Though I understand that users are authorizing the scraping of their own information, once that information is on Facebook’s interface, I believe that Facebook has a right to control who can come in and access that information.

  2. I have to agree with Lindsay’s disagreement with the reading of “fixed” offered in the post. I have to add that I agree with the court’s decision in Facebook v. Power Ventures. A work is “fixed” if it is “sufficiently permanent or stable to permit it to be perceived, reproduced, or otherwise communicated for a period of more than transitory duration.” There is no mention that the material must be communicated or used in an alternative fashion in order for complications to arise. While I am by no means a computer “wiz” or tech-savvy when it comes to computer systems and hardware, from what I understood of the Digital Millennium Copyright Act’s (DMCA) language on “copies” and the information on RAM, the making of temporary copies of a work implicates reproduction rights even when the copies are only temporarily stored. All of our world wide web adventures and computer program activities are locally and temporarily stored in our computer’s RAM. If you view “scraping” in this light, as the mere “process of gathering information from the internet,” the information collected as we go seems more “fixed” than merely “copied”. Although information is stored only temporarily, the information in RAM remains “fixed” until the system is turned off or the information is overwritten. A background on how computer memories work and the DMCA seem to indicate that when Power.com scrapes a Facebook user’s information, even just once, Power.com has implicated Facebook.com’s Terms of Use. The Power.com software must make a copy of a Facebook user’s entire profile page. It does not matter at this point that Power.com and its users only care to copy non-Facebook content (i.e., profile information and not the page, scripting, etc.). At this moment, Facebook’s copyright has been infringed upon.
    This so-called “temporary” scraping of information falls within the definition of fixed, as not only has Power.com embodied a copy of a user’s Facebook profile page (and its included information), but this copy is “sufficiently permanent” and can be “perceived, reproduced, or otherwise communicated” by the user now actively using Power.com. Who’s to stop Power.com from continuously reproducing and refreshing the user’s Facebook page with every new update and addition? In addition, the term “fixed” implies the information can be “otherwise communicated”. Isn’t that exactly the point of scraping the information in the first place: communicating the information through an alternative interface to other users? It seems like Power.com directly violates Facebook.com’s Terms of Use with every “scrape” of an entire Facebook profile page, even if the user has authorized the scraping of his individual profile information.

  3. Regarding scraping on Facebook, my concern is not as much about anyone stealing my copyrighted content. My largest concern is that my “private” content will become public content. My reason for putting “private” in quotes is that in my opinion, nothing on Facebook is truly private anyway. There are companies that basically mine data that can be hired to find out things about you on the internet, that you may or may not have even known was there, and certainly that you thought was private.

    I don’t mean for this to be a PSA about behaving yourself and staying out of compromising positions, but it’s honestly the best defense. Any content on the internet, “private” or not, can/will be seen by someone who you don’t want to see it. The best way to not get caught is to not do anything that you can be caught doing.

    I personally see nothing wrong with having the ability to batch download my Facebook content, although I can understand Facebook’s disapproval of circumventing their ads. That being said, Adblock Plus (or any number of other ad-blockers) also prevents ads from being seen. Considering the ability to download the content under a DMCA analysis, the argument could absolutely be made for circumvention, but users’ own content should remain theirs. If I upload a picture of my family, it’s still MY picture despite Facebook hosting it. In return for hosting my picture and sharing it with my friends (and only my friends), I suppose it is fair that Facebook can generate ad revenue as people click through my pictures.

  4. I don’t think I agree with your last position that Facebook did this because they felt threatened. They obviously had this in their TOS ahead of time, so it is not like they were specifically thinking of Power.com when they put it there. Whether it is Myspace, Power.com, or Google+, people always seem to be predicting the decline of Facebook and I don’t see that happening. I am not sure that the Outlook analogy is perfect either, because the only thing being copied over from the email server is the user’s email itself, whereas Power.com has to copy parts of Facebook’s content in order to get to the user’s content.

    I have a question about exactly what information from a profile is scraped by Power.com. Between pictures posted by my friends and wall posts, much of the information on my Facebook profile I would not classify as my own user content. I understand that we probably waive our right to content when we put it on someone’s wall, and once something is on the Internet, it can and will show up anywhere. But I am not all that comfortable that a friend can intentionally and automatically put something I typed or one of my pictures up on another website without me having any knowledge of it. There may be people out there who would be upset that their use of Facebook is going to be used to the detriment of Facebook. Facebook is free. They generate money through ads. I don’t see a problem with this. Trying to find a way around the ads while enjoying the free benefits of Facebook seems like having your cake and eating it too. It just doesn’t seem all that right to me, and Facebook should be allowed to stop that from happening.

  5. I’m not sure that I agree with the court that having information from a website held in your computer’s memory is the same as making a copy. Perhaps it would be different if this information was copied onto a hard drive, but memory only exists while there is power going to the computer. The moment you unplug your computer, the memory is gone. Anytime you “go to” a website (I put it in quotes, because your ISP actually goes out and brings you the website), information from that website is in your computer’s memory. Would John Q Public, surfing the web, realize they are “copying” all the information they see? The Digital Millennium Copyright Act is antiquated. Things have changed so much in the fifteen-plus years since it became law. The way people voluntarily share information on websites like Facebook, the way they listen to music and watch movies, and the way they live, work, and communicate are in many regards radically different from the world of the late 20th century. Consider this: Bill Clinton sent two emails during his time as President. Just two. I digress. As you point out, the pivotal definition here is “fixed.” Information held in a computer’s memory is not fixed. It is momentary and fleeting.

  6. I do not have a problem with Facebook’s prohibition of scraping. While Facebook users post their own material on Facebook, I agree with the court that Facebook has a valid claim under the Digital Millennium Copyright Act. Regardless of how user friendly or beneficial Power.com’s interface is to Facebook users, Power.com has obviously violated Facebook’s terms of use. Users on Facebook have chosen to use Facebook for their social networking needs and there is nothing inherently wrong with Facebook putting reasonable controls on the use of its website.
    With regard to the requirement that the copy must be fixed, the statute does not specify the duration of time (other than to say it must be more than transitory); thus, it is in the court’s discretion to interpret the meaning of that phrase. I agree that the temporary copy the interface makes of the user’s profile lasts for more than a transitory duration.

  7. I agree with jramsey; I do not see the reason for circumventing a free service. I understand circumventing for software (the Cook case, in which a Texas father and son pirated software) or music and DVDs (the Diallo case, regarding a ring counterfeiting DVDs and CDs in Atlanta). Please note that I do not agree with what the defendants in these cases did, but I understand that they may have had a Robin Hood complex or just wanted to make money by undercutting the price. However, Facebook is free; the only reason for someone to circumvent the system is for their own edification or to make money through ads on their own sites.
    Further, I am also disturbed to know that my information can be posted without my knowledge to another site. This is similar to a number of sites that post your personal credit report information to a free site without your knowledge. These, however, are privacy issues. The former infringes on the copyright owner’s right to collect money for their work and marketing.

  8. The biggest issue I have with anything technology related is the antiquated laws that govern it. I strongly agree with joshroot that the Digital Millennium Copyright Act is outdated. We need a new body of legislation, drafted by the technology generation and written in a manner consistent with the use of technology today. The way media is shared today is radically different than it was even just a few years ago. What once required a trip to the store to purchase a DVD can now be done by opening an application on your phone and pressing stream. I also agree that holding information in your computer’s memory is not “copying,” although scraping does present other problems. When companies get away with things such as scraping, you and I are the ones who have to deal with annoying security features such as CAPTCHAs afterwards. It usually takes me three attempts before I can decipher the dang letters or words!

  9. First, let me begin by saying this: the court was correct in denying the motion to dismiss. As the Tenth Circuit said in Morgan v. City of Rawlins, 792 F.2d 975 (10th Cir. 1986), “[g]ranting defendant’s motion to dismiss is a harsh remedy which must be cautiously studied, not only to effectuate the spirit of the liberal rules of pleading but also to protect the interests of justice.” Thus, because Facebook’s complaint, when viewed in the light most favorable, adequately states grounds that, if true, could entitle them to relief, the motion had to be denied. Whether Facebook will ultimately win on the merits of the case is an entirely different story.

    There are two things that really struck me when reading the court’s order denying the motion to dismiss. First, regarding the direct copyright infringement, what really struck me was that the court relied on the assumption that in order to extract the Facebook user’s profile information, Power.com necessarily had to have copied the entire web page. Instead of delving into whether such a transient “copy” is “fixed” or not, I want to assume for a moment that it is, and instead focus on whether such copying falls under the doctrine of “fair use”. The “fair use doctrine” is a limitation on and exception to the exclusive rights granted to a copyright holder. As stated in Atari Games Corp. v. Nintendo of America Inc., 975 F.2d 832 (Fed. Cir. 1992), “courts should adapt the fair use exception to accommodate new technological innovations.” Id. at 842. According to the Copyright Act of 1976, use of another’s copyrighted work is protected if such use is considered “fair”. In making this determination, courts should consider the following non-exhaustive list of factors:

    1. The purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
    2. The nature of the copyrighted work;
    3. The amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
    4. The effect of the use on the potential market for or value of the copyrighted work….

    Based on the limited facts given in the motion to dismiss, it seems to me that the purpose and character of Power.com’s use is very “fair” because the website is not even intending to use Facebook’s copyright-protected works in its final product. Any copyright violation is merely incidental to the website’s true purpose: to gain access to a Facebook user’s non-copyrighted profile information. Thus, Power.com is in no way, shape or form trying to replicate, misappropriate or commercially exploit Facebook’s protected work. Any transient reproduction of the protected work is strictly to ascertain the unprotected information within the work. Thus, in my opinion, the purpose of such use is clearly fair. In the absence of Facebook alleging how such noncommercial use of their copyrighted material is actually harmful or adversely affects them in any way (aside from being faced with a newfound competitor), I don’t see how such use would be unfair.

    With regard to the second factor, the nature of the work seems to be non-determinative to me in this case because the defendant is not even trying to use the copyrighted work. It’s simply an incidental copying. Much like reverse engineering, which is considered fair use, the nature of the copyrighted work in this instance requires intermediate copying (and much more fleeting copying than in reverse engineering) in order to gain access to the ideas and information beneath the copyrighted work.

    As for the third factor, I think it is evident that the amount and substantiality of the portion of the Facebook copyright-protected material is minimal in relation to the copyrighted work as a whole. Further, I don’t think the extent of the transient copying is more than necessary to further the purpose and character of the use, and thus such minimal use seems to be reasonable.

    Finally, with regard to the fourth factor, I don’t think an aggregator site (which seems to be prevalent in almost every industry nowadays, especially with instant messaging, travel websites, and news websites) would have a substantially adverse impact on the social networking market. Users would still need to be signed up with Facebook in order to use Power.com. And although I am not a marketing whiz, I imagine that companies purchase ads on Facebook based on the number of users and on the number of users whose profile information suggests they would be interested in the product, not on whether they will actually see the ad. But who knows, I could be wrong on this.

    For the sake of brevity, I will only briefly touch on the second issue that struck me. This issue concerned indirect copyright infringement and how the Terms of Use played a significant role in the court finding that such a claim could exist. In the order denying the motion to dismiss, the court stated that “the utilization of Power.com by Facebook users exceeds their access rights pursuant to the Terms of Use. Moreover, when a Facebook user directs Power.com to access the Facebook website, an unauthorized copy of the user’s profile page is created. The creation of that unauthorized copy through the use of Defendants’ software may constitute copyright infringement.”

    What really struck me about this is that the court was not saying that Power.com was violating the Terms of Use, but that the Facebook users were violating the Terms of Use, and thus by piggy-backing on the users’ violation, Power.com was indirectly infringing on Facebook’s copyrights. Interestingly, none of the actual Facebook users that allowed such “unauthorized access” are named as defendants in the suit, even though the court clearly indicated that they are in direct violation of copyright laws as well.

    The bottom-line lesson learned from this second issue is this: draft your terms of use carefully! If the Facebook users were not in direct violation of the Terms of Use (by violating the prohibition against “unauthorized access”), then it would be impossible for Power.com to be in indirect violation of the Terms of Use.
