Slashdot
New Method To Detect and Prove GPL Violations
Posted by
kdawson
on Sat Aug 25, 2007 01:20 PM
from the marked-at-birth dept.
qwerty writes "A paper to be presented at the upcoming academic conference Automated Software Engineering describes a new method to detect code theft and could be used to detect GPL violations in particular. While the so-called birthmarking method is demonstrated for Java, it is general enough to work for other languages as well. The API Birthmark observes the interaction between an application and (dynamic) libraries that are part of the runtime system. This captures the observable behavior of the program and cannot be easily foiled using code obfuscation techniques, as shown in the paper (PDF). Once such a birthmark is captured, it can be searched for in other programs. By capturing the birthmarks from popular open-source frameworks, GPL-violating applications could be identified."
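As a rough illustration of the idea in the summary (not the paper's actual algorithm), a birthmark can be modeled as the set of short runs of library calls seen in a trace, and a suspect program checked for how many of those runs it also produces. The trace contents, the window length `k`, and the function names below are all assumptions made for this sketch.

```python
from typing import Iterable, List, Set, Tuple

def birthmark(trace: Iterable[str], k: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of length-k call sequences (k-grams) in an API call trace."""
    calls: List[str] = list(trace)
    return {tuple(calls[i:i + k]) for i in range(len(calls) - k + 1)}

def containment(mark: Set[Tuple[str, ...]], suspect: Set[Tuple[str, ...]]) -> float:
    """Fraction of the original's k-grams that also occur in the suspect program."""
    if not mark:
        return 0.0
    return len(mark & suspect) / len(mark)

# Hypothetical traces: library calls recorded while each program runs.
library_trace = ["File.open", "File.read", "String.split", "Map.put", "File.close"]
suspect_trace = ["Log.init", "File.open", "File.read", "String.split", "Map.put",
                 "File.close", "Log.flush"]
unrelated_trace = ["Socket.connect", "Socket.write", "Socket.read", "Socket.close"]

# The suspect embeds the library's whole call pattern; the unrelated program shares none.
score_suspect = containment(birthmark(library_trace), birthmark(suspect_trace))
score_unrelated = containment(birthmark(library_trace), birthmark(unrelated_trace))
```

A high score here would only be a reason to look closer: two independently written programs that use the same library in the obvious way could also score high.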
Related Stories
Ask Slashdot: GPL Violations On Windows Go Unnoticed? 445 comments
Scott_F writes "I recently reviewed several commercial, closed-source slideshow authoring packages for Windows and came across an alarming trend. Several of the packages I installed included GPL and LGPL software without any mention of the GPL, much less source code. For example, DVD Photo Slideshow (www.dvd-photo-slideshow.com) included mkisofs, cdrdao, dvdauthor, spumux, id3lib, lame, mpeg2enc, and mplex (all of which are GPL or LGPL). The company tried to hide this by wrapping them all in DLLs. There are other violations in other packages as well. Based on my testing of other software, it seems that use of GPL software in commercial Windows applications is on the rise. My question is how much are GPL violations in the Windows world being pursued? Does the FSF or EFF follow up on these if the platform is not GPL? How aware is the community of this trend?" This new method of detecting GPL violations could help here.
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.

new use of old trick (Score:5, Informative)
Re:new use of old trick (Score:5, Insightful)
Re:new use of old trick (Score:5, Interesting)
Amen to that. This is an old story, but I think it bears repeating. A friend of mine and I got "caught" turning in identical code for an assignment. I mean, identical. Same structures, variables, types, layout - everything. However, we wrote our programs separately and never saw each other's until our teacher asked about it.
It sounds improbable, but consider that:
We had a teacher who trusted us and we were both good students with good test grades, so it was dismissed as a humorous coincidence. I'm glad a human was willing to listen to our explanation and not just go along with the findings of an automated tester.
Re:new use of old trick (Score:4, Interesting)
Of course! ;-)
I think there was a bit of that, too: (pointing at me) "why did you do this?" "Because of this requirement in the last paragraph." (Pointing at friend) "and why didn't you use this approach?" "That wouldn't have worked because of this part here."
Re: (Score:3, Interesting)
Re: (Score:2)
Re: (Score:3, Informative)
Re: (Score:3, Insightful)
No, really (Score:3, Informative)
Re: (Score:3, Interesting)
Re: (Score:2, Insightful)
Re: (Score:2, Insightful)
Re:No, really (Score:4, Insightful)
The Free Software movement, however, believes that code which protects the user's freedoms to use, modify and distribute it is intrinsically superior, and that people who wish to write code that does not respect these freedoms should not be aided by being able to use the work of those who do.
As such, an Open Source advocate would not mind, because the closed copy would quickly become inferior. A Free Software advocate would object, because their work would be being used for (in their view) unethical purposes (denying end users their freedoms).
Re: (Score:3, Interesting)
Re: (Score:3, Insightful)
The LEAST of my concern in releasing ANY open source is some childish popularity contest.
The only valid reason for me has
Re:No, really (Score:4, Insightful)
As you can imagine I really don't like the GPL or the FSF or Richard Stallman or any of his friends too much. While I recognize their contributions I think that they've fallen into the trap of trying to force everyone to convert to what has become a quasi-religion where the Inquisition is more important than celebrating mass.
Re: (Score:3, Funny)
I've never heard it called that before.
Re: (Score:2)
Re: (Score:2)
Re: (Score:3, Interesting)
Re:No, really (Score:5, Insightful)
You know, I'm absolutely tired of the BSD trolls that claim that the BSD license is "freer", not because I have a beef with the BSD, simply because your definition of "freedom" is ludicrous.
There are no absolute freedoms. Freedom to infringe on others' rights or freedoms gives more freedom to yourself, but limits it for other members of society. So long as there are things that cannot be owned or achieved communally without side effects to others, freedoms have a limit: the actions that you cannot do so that others can do them.
The GPL definition of freedom is that a piece of software and its derivatives must always, under all conditions, be free. Yes, it is a restriction on the developer who would wish to close up his source and use a GPLed piece of code, but it is an additional freedom for all the users who now have access to this source, which would have otherwise been denied.
Analogy time: the King is free to treat his peasants as dogs if he wishes and if he has sufficient power to repress any opinions the peasants would have about that. The peasants, however, are limited by the freedoms the king has. Therefore the balance of freedoms for a more equal society would be that the king's freedoms be limited in order to allow the peasants to live their lives.
So as you said, the GPL is also a social instrument, but it is no less free than the BSD; it simply distributes freedoms in a different manner. If you have a problem with that, use whichever license you wish to use. But don't go around accusing the GPL of limiting freedoms when it gives others freedoms that the BSD could never guarantee.
Re:No, really (Score:4, Interesting)
If by that you mean "you have a different definition of what freedom is, therefore I don't like you" then sure, I'm a "BSD troll" or whatever.
GPL -> Distribution restrictions.
BSD -> No restrictions.
No restrictions -> More freedom.
More freedom -> Possible unsavory side effects that people choose to live with
Isn't logic great?
BSD has a similar one, except that it doesn't place restrictions on how that happens. No one can make BSD-licensed software "non free", it will always be available to everyone. The only difference is that it might not benefit from coerced third party improvements, but that's what you sign up for.
The Kool-Aid is strong with this one.
BSD licenses guarantee absolutely nothing. Here's the code, do whatever the heck you want with it. The perceived benefits to using the GPL are nice, but please don't insult people's intelligence by claiming they result in more freedom. A restriction to ensure X or Y is still that - a restriction. The distribution restrictions on the GPL are designed to further Stallman's social causes (some of which I actually agree with). If you feel that's fine, then by all means use the GPL. That's your choice.
Re:No, really (Score:4, Insightful)
GPL -> Code will always be open and derivatives will stay that way
BSD -> Code can be closed off and new improvements to it can remain closed off forever.
Always open code -> More freedom
Sometimes open code -> Permanent loss of freedom with regards to that code.
Indeed, logic is great.
I never said that you can't sign up for that if so you wish, but code is always used within contexts, and when used in the context of proprietary software, any improvements on the code will be lost, any bug fixes will be lost, any added functionality will be lost.
Sure, some people will build upon it, but losing the obligation of putting the improvements back into the codebase means that it will eventually stagnate, and that the improvements that could have been used for the good of everyone who contributed can be denied at will. Look at FreeBSD with OS X: Apple got the foundation of their OS for free, and after that they simply closed up the rest at will. Perhaps the Apple folks got to improve their memory management, or add some new DRM techniques. Whatever they've done, the FreeBSD devs will never get to see it.
If they don't mind as users and developers to see their work used to create a proprietary, vendor-locked platform then it's their prerogative; as a user and dev I prefer to make sure that my code is an established base of constant improvement. With the GPL they're empowered and free to do that; with BSD new parties are empowered to do whatever and completely ignore original creators aside from the required attributions.
Notice that I'm not saying the BSD license is more free; it is equally free, but it shifts freedom to new developers and vendors to be, IMO, lazy bastards and profit for nothing, while the GPL shifts it to original developers, contributors and users to get reciprocal treatment from others. You're free to think that the former is more important; I believe the latter brings greater benefits to everyone in the long term.
No one is coercing anyone here. If you had read and understood the GPL, and it looks like you haven't, you'd know that the conditions apply only to those who want to redistribute software. If you want to keep your patches to yourself you can do that and it's your right, but if you're going to be using others' code to sell it or gain from it you have to abide by the creator's conditions. Going back to my point about freedom, perhaps as a distributor you have less leeway regarding your changes, but your users have just gained the guarantee that they'll always be able to see and change the code. The BSD could not have done that.
You hit the nail on the head. Th
Re: (Score:3, Insightful)
FSF and Stallman have endorsed permissive licenses (Score:3, Informative)
Re: (Score:3, Insightful)
Re: (Score:3, Insightful)
Bes
A couple of things.... (Score:4, Interesting)
What is the false positive rate for this method? What if two programs just happen to do the same thing and the authors happened to choose similar ways to do it. Would this method conclude that one originated with the other? It's not a copyright violation because neither is a derivative work of the other.
Also, it occurs to me that this method would probably not be as useful as expected for detecting GPL violations. I would think it would only be effective for checking where you have source code available, or at the very least enough symbol table information to make comparisons, which you are not likely to have if somebody is violating the GPL because that implies no source code anyway (and almost certainly no symbol table information for the binary).
Re: (Score:2)
Re:A couple of things.... (Score:4, Insightful)
Also worth considering is what a compiler optimiser might do -- they can be quite good at rearranging code in different ways depending on whether they are optimising for speed or code size, and what the target is. That's probably another reason why this might work better with Java, which only has a rather rudimentary JIT optimiser.
If this tool can help identify some infringing code, that's well and good, but I wouldn't rely on it, wouldn't think it would add much if any legal weight, and neither would I think it could replace a thousand eyes.
Anyhow, the real problem, as I see it, with identifying open source code pilfered and added to a closed source project is that you generally aren't allowed to reverse engineer the code itself to see what it actually does. So even if you're Very Damn Sure that a piece of commercial software illegally uses open source and sells it as its own closed source, you're not allowed to investigate and come up with evidence. You'll have to file a suit and get a judge to order the code examined, and with only a good hunch to go on, and no way to document a financial loss, and probably not having too deep pockets yourself, that's rather unlikely to go anywhere.
Which is why I think it's important that we support institutions like FSF, which can occasionally fight the battle on behalf of the little guy.
Regards,
--
*Art
Re: (Score:2)
Re: (Score:3, Informative)
Re: (Score:2)
Clean room could replicate signature. (Score:4, Insightful)
This is not to say that the technique wouldn't be useful for hunting down GPL violations. But a positive is not definitive by itself.
Meanwhile code obfuscation (even automatically generated obfuscation) could easily modify at least the timing, if not the order, of such calls.
Nevertheless this is a powerful tool: A hunk of GPL code that hasn't had its flow obfuscated systematically (even code that HAS been obfuscated but not systematically) will have large swaths of code that trip the detector. And it doesn't require reverse engineering until after the alarm goes off.
Good job, guys.
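The worry about obfuscation changing the order of calls can be made concrete with a toy comparison. This is a sketch under assumed data, not the paper's measure: an order-sensitive similarity over adjacent call pairs drops when two independent calls are swapped, while a plain count of which calls were made does not change at all. The traces and function names are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

def pairs(trace: List[str]) -> Set[Tuple[str, str]]:
    """Set of adjacent call pairs; sensitive to the order of calls."""
    return set(zip(trace, trace[1:]))

def jaccard(a: Set[Tuple[str, str]], b: Set[Tuple[str, str]]) -> float:
    """Size of the intersection over size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

original = ["A.open", "B.init", "A.read", "B.run", "A.close"]
# Same calls, but an obfuscator has swapped the first two independent calls.
reordered = ["B.init", "A.open", "A.read", "B.run", "A.close"]

ordered_similarity = jaccard(pairs(original), pairs(reordered))
same_call_counts = Counter(original) == Counter(reordered)
```

One swap already removes half of the original's adjacent pairs, which is why systematic reordering is the interesting attack, and why scattered, non-systematic obfuscation would still leave long untouched runs to match.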
Re: (Score:2)
(Yes I know that the article says it can't. But that refers to the usual sort, which is directed at hiding the s
Coming soon... (Score:5, Funny)
Sweet Mother of All Revolutions (Score:5, Funny)
Torch?
Map of Corporate Castle locations?
FSF Lawyers programmed to be speed dialed in emergencies?
Desire to burn the non-believers?
Okay, I'm ready! What IRC Channel are we meeting in?
Other languages (Score:4, Interesting)
Java has a very large standard library that is always dynamically linked, and hence can easily be instrumented as the technique requires. C allows static linking, which would make such hooking much more difficult. Additionally, Java executes in a very standard environment due to the Virtual Machine, whereas other languages may have varying ABIs, type sizes, and other properties that could add significant noise to the birthmark.
That said, system calls are always hookable and reasonably standard, so maybe this technique could be applied successfully there for malware detection or similar?
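The hooking described above can be sketched in miniature: put a proxy in front of a dynamically resolved library so that every call through it is logged. Python's `math` module stands in for the runtime library here, and `CallRecorder` is a hypothetical name for this sketch; it says nothing about how the paper actually instruments the JVM.

```python
import math
from typing import Any, List

class CallRecorder:
    """Proxy that forwards attribute access to a module and logs each call made."""

    def __init__(self, target: Any, log: List[str]) -> None:
        self._target = target
        self._log = log

    def __getattr__(self, name: str) -> Any:
        # Only reached for names not found on the proxy itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr  # constants such as math.pi pass through unlogged

        def wrapper(*args: Any, **kwargs: Any) -> Any:
            self._log.append(f"{self._target.__name__}.{name}")
            return attr(*args, **kwargs)

        return wrapper

trace: List[str] = []
m = CallRecorder(math, trace)

# The "application" only sees the proxy, so its library calls are observable.
hyp = m.sqrt(m.pow(3, 2) + m.pow(4, 2))
```

The proxy works because the library is looked up at run time. A statically linked C program offers no such seam, which is the difficulty the comment points at; system calls are the boundary that remains.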
Heh.. (Score:2)
I see the community is still working as it always has.
Re: (Score:2)
It may be news to you but non-commercial licenses are AFAIK universally considered non-free (whereas you seem to imply the two are mutually exclusive). And when has anyone ever had any problem with people going to lengths (whatever that means) to prove li
Very Cool (Score:2, Insightful)
Just great (Score:2, Troll)
Next, the Open Source Business Software Alliance and raids by the Secret Service...
When is the last time we read anything about open source that wasn't about lic
Re: (Score:2)
I won't even bother addressing y
Wine? (Score:2)
It's not theft (Score:2)
I realise this is going off on a tangent, but I'm concerned about the use of the word theft. Usually I'm one of the first people to jump up and down when I hear the RIAA or MPAA accuse people of stealing, and I've notice
Read comments and think MS patent claims on linux. (Score:2)
The story is presented with a stage light focused on linux but then the house lights come up and show linux in jail along with most of the audience.
This is just one paper for one Automated Software Engineering (ASE) conference.
But if you
Re: (Score:2)
I'm really touched.
Re: (Score:3, Funny)
Re:And who the fuck might I ask is to spend the ti (Score:2)
Re: (Score:2)