Anonymous No More: Your Coding Style Can Give You Away 220

Posted by samzenpus on Wednesday January 28, 2015 @04:50PM from the leaving-your-mark dept.

itwbennett writes Researchers from Drexel University, the University of Maryland, the University of Goettingen, and Princeton have developed a "code stylometry" that uses natural language processing and machine learning to determine the authors of source code based on coding style. To test how well their code stylometry works, the researchers gathered publicly available data from Google's Code Jam, an annual programming competition that attracts a wide range of programmers, from students to professionals to hobbyists. Looking at data from 250 coders over multiple years, averaging 630 lines of code per author, their code stylometry achieved 95% accuracy in identifying the author of anonymous code (PDF). Using a dataset with fewer programmers (30) but more lines of code per person (1,900), the identification accuracy rate reached 97%.

This discussion has been archived. No new comments can be posted.

Can they do it with corporate code? (Score:5, Interesting)

by msobkow ( 48369 ) writes: on Wednesday January 28, 2015 @04:53PM (#48927173) Homepage Journal

Can they do it with corporate code where there are naming and style standards in abundance, and code reviews to ensure those guidelines are followed?

- Re:Can they do it with corporate code? (Score:5, Funny)
  
  by Marginal Coward ( 3557951 ) writes: on Wednesday January 28, 2015 @04:57PM (#48927225)
  
  It seems like using the applicable features of the corporate version control system would be a lot easier - and possibly even better than 95% accurate.
  
  - - Re: Can they do it with corporate code? (Score:2, Funny)
      
      by Anonymous Coward writes:
      
      Drats! I was.sure that.everyone else wrote.stuff.like "if(user == 'dumbfuck"){exit 666};
- Re: (Score:3)
  
  by Penguinisto ( 415985 ) writes:
  
  That's what "git blame" is for...
  /me ducks and runs like hell...
- Re: (Score:2)
  
  by TitusC3v5 ( 608284 ) writes:
  
  It's not just limited by corporate code. Good luck doing this on pep8 Python.
- Re: (Score:2)
  
  by dark.nebulae ( 3950923 ) writes:
  
  I've always found that even with style guidelines in place, developers will still leave their fingerprints all over it.
  Some devs will be verbose in their comments, some less. Some devs will embrace IoC where others shun it. Some devs will create a single method with all code in it, some will refactor the heck out of it with many methods. Heck, devs can't even agree sometimes on what should be public, protected, and private (and rarely will style guidelines dictate this kind of thing).
- Re:Can they do it with corporate code? (Score:5, Interesting)
  
  by jellomizer ( 103300 ) writes: on Wednesday January 28, 2015 @06:09PM (#48927841)
  
  Perhaps not as well. If people are following the coding standards for the organization, then the code for the most part looks far more similar.
  When I am working with a development team, I tend to adjust my unique style to better match what everyone else is doing, even if it means using coding methods that I would normally disagree with.
  If the code tends to use a bunch of GOTOs instead of procedures or classes, I will use those GOTOs, not for my benefit, but for the people who will maintain my code later on, so they won't have to change their mindset and debugging strategies to see what the program is doing when they make future corrections.
  I will go full Object Oriented if the group of people that I am working with do their coding full OO.
  My personal style would be more procedural than OO, not due to lack of knowledge or not realizing OO's advantages and disadvantages. But if I am to code on my own, I code in the way that my mind handles the requirements, and in the way I feel will be easier for me to change and fix in the future.
  I think this method is best for ID based on personal code, vs. group corporate code, where a lot of your particular style is hidden.
  
  - Re: (Score:2)
    
    by rtb61 ( 674572 ) writes:
    
    Just curious, how are larger companies going with algorithm libraries and variable naming rules to ensure maximum re usability of code (variables named by function rather than named by application). Any change, is most of it done from scratch, any fancy algorithm data bases with search functions based upon algorithm descriptors and software engineering. Also things like software language translators or the same algorithms stored in different languages. Any shift away from writing code to more assembling al
- Re: (Score:2)
  
  by AK Marc ( 707885 ) writes:
  
  Even if they build up a database of 100% of written code, how can they identify me if I only copy and paste code from others?
- Re: (Score:2)
  
  by wolrahnaes ( 632574 ) writes:
  
  Similarly I was thinking this would probably be defeated by a "minifier", obfuscator, or anything along those lines. There are dozens to choose from for most languages and it would be trivial for anyone attempting to remain anonymous to use them on their releases.
  If you want the code to remain usable, there are tools to enforce a standard style instead, in which case just set it up with rules based on a popular project if your language of choice doesn't have a specific style. At that point you're down to
  - Re: (Score:2)
    
    by Mr Z ( 6791 ) writes:
    
    Did you read the part in the article where they're actually doing the matching based on the ASTs (abstract syntax trees), and so are able to identify authors even after the code goes through an obfuscator? Relevant quotes:
    Their real innovation, though, was in developing what they call “abstract syntax trees” which are similar to parse tree for sentences, and are derived from language-specific syntax and keywords. These trees capture a syntactic feature set which, the authors wrote, “was c
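To illustrate why syntax-tree features can survive renaming and reformatting, here is a minimal sketch. It uses Python's standard `ast` module as a stand-in; the researchers' actual feature set and tooling are not described in this thread, and the `ast_bigrams` helper and the three sample functions are hypothetical names made up for the example.

```python
import ast
from collections import Counter


def ast_bigrams(source: str) -> Counter:
    """Count (parent node type, child node type) pairs in a parsed module."""
    tree = ast.parse(source)
    counts = Counter()
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            counts[(type(parent).__name__, type(child).__name__)] += 1
    return counts


# Same logic, different layout, identifier names and comments.
version_a = """
def total(xs):
    s = 0
    for x in xs:  # walk the list
        s += x
    return s
"""

version_b = """
def add_up( values ):
    acc=0
    for v in values:
        acc+=v
    return acc
"""

# The same computation, structured differently.
version_c = """
def total(xs):
    return sum(x for x in xs)
"""

# Layout and naming do not change the tree shape, so these profiles are equal.
print(ast_bigrams(version_a) == ast_bigrams(version_b))  # True
# A different structural choice produces a different profile.
print(ast_bigrams(version_a) == ast_bigrams(version_c))  # False
```

A real classifier would feed counts like these, alongside many other features, into a machine-learning model; the sketch only shows that the tree shape ignores the surface details a formatter or renamer changes.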
- Re: (Score:2)
  
  by Gorobei ( 127755 ) writes:
  
  Can they do it with corporate code where there are naming and style standards in abundance, and code reviews to ensure those guidelines are followed?
  I was starting to wonder about that, then realized we at $BIGCORP are already generating ASTs from your input buffer, unifying those trees with a bunch of patterns, and telling your editor to flag questionable constructs. You type "if not foo in x" and 50ms later you get a proposed improved snippet. It's pretty rare to see quirky style in our codebase.
- - Re: (Score:2)
    
    by MouseTheLuckyDog ( 2752443 ) writes:
    
    They are talking about the corporate code as a baseline to compare to the anonymous code.
  - Re: (Score:2)
    
    by ShanghaiBill ( 739463 ) writes:
    
    If it doesn't, and you need this sort of analysis to determine who wrote a section of code, you're doing something wrong.
    With pair programming, you may have two programmers sharing a keyboard, and alternating writing chunks of code.
    I can usually look at a section of code, and reliably know which of my coworkers wrote it, even when they follow the style guidelines. Do they use an if-else chain, or a switch statement? Do they use #define's or prefer enums? Bitfields, or masks? Often I can tell who wrote it just by looking at the comments. Some people are neurotic about grammar and using complete sentences. Others prefer mi
    - Re: (Score:2)
      
      by Dashiva Dan ( 1786136 ) writes:
      
      I can tell who wrote it just by looking at the comments
      Yeah, my first thought on this was "how accurate would it be if you a) stripped out comments, and b) ran through a code formatter (many code editors auto-formatting to a standard on the fly)"
      I think including comments is basically cheating, as they're super distinguishable. You can tell what code I've worked on cause I consistently type "teh", spell words like "colour" with my local spelling, etc. But recognising just the actual code itself, that's more impressive.
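The "strip comments, then auto-format" experiment is easy to try on a small scale. The sketch below is only an illustration in Python (it needs Python 3.9 or later for `ast.unparse`); the `normalize` helper and the sample function are hypothetical, and this is not how the study prepared its data.

```python
import ast


def normalize(source: str) -> str:
    """Parse and re-emit source; comments and the original layout are discarded."""
    return ast.unparse(ast.parse(source))


original = """
def area(w,h):   # teh colour of the bikeshed
    return ( w*h )
"""

# Prints the function with standard spacing and without the comment.
print(normalize(original))
```

Identifier names and the overall structure of the code pass through unchanged, so this removes only the layout and comment signals.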
  - - Re:Can they do it with corporate code? (Score:4, Insightful)
      
      by war4peace ( 1628283 ) writes: on Wednesday January 28, 2015 @07:10PM (#48928243)
      
      *raising hands slowly* Is there a problem, Coding Officer?
      
    - Re: (Score:2)
      
      by rubycodez ( 864176 ) writes:
      
      "legal" of course meaning adhering to rules written and ratified by a group of power and money grubbing politicians in the pockets of large corporations.
    - Re: (Score:3)
      
      by bhcompy ( 1877290 ) writes:
      
      Why is it illegal?
- - Re:Can they do it with corporate code? (Score:5, Informative)
    
    by grimmjeeper ( 2301232 ) writes: on Wednesday January 28, 2015 @05:43PM (#48927645) Homepage
    
    You obviously haven't had to work in an environment where code has to be certified. I can tell you from first-hand experience that coding in an RTCA DO-178B environment or similar demands strict adherence to some very pedantic coding requirements. You'll find this type of development in avionics systems (both civilian and military) as well as other industries like medical electronics where code safety is literally life-and-death.
    Outside of that type of environment, I do agree with you. You'd be lucky if even half of the developers have seen a company coding standard. You'd be hard pressed to find any developers who really adhere to it even when they know the document exists. But in those small niche markets, you'd be surprised at how strictly they adhere to arbitrary coding standards (whether they really impact code quality or safety or not).
    
    - Re: (Score:2)
      
      by s.petry ( 762400 ) writes:
      
      It's not just these types of environments that are strict. Well established companies have the same practices, because the only way to have controlled growth is to adhere to a set of standards. Sure, standards change over time but not quickly. For posterity, controlled does not imply restricted.
    - Re: (Score:3)
      
      by RabidReindeer ( 2625839 ) writes:
      
      A sonnet has strict rules, too.
      But I'd wager that someone could tell one of Shakespeare's from one of yours.
    - - Re: (Score:2)
        
        by grimmjeeper ( 2301232 ) writes:
        
        RC doesn't pay me at all. I haven't worked there for over 15 years now.
Up next, automatic intelligence rating... (Score:5, Funny)

by TWX ( 665546 ) writes: on Wednesday January 28, 2015 @04:58PM (#48927233)

...based on the quality of that code...

- Re:Up next, automatic intelligence rating... (Score:5, Funny)
  
  by halivar ( 535827 ) writes: <bfelger@gGINSBERGmail.com minus poet> on Wednesday January 28, 2015 @05:26PM (#48927515)
  
  goto blah;
  ^^ Idiot.
  // If you don't know why this is here, don't fuck with it.
  goto blah;
  ^^ Code guru.
  
  - Re:Up next, automatic intelligence rating... (Score:5, Insightful)
    
    by lgw ( 121541 ) writes: on Wednesday January 28, 2015 @05:53PM (#48927721) Journal
    
    For lack of mod points let me just say: beautiful!
    It's like this in any engineering discipline:
    * The apprentice doesn't do things by the book, for he thinks himself clever
    * The journeyman does everything by the book, for he has learned the world of pain the book prevents
    * The master goes beyond the book, for he understands why every rule is there and no longer needs the rules
    Or put another way - the apprentice thinks he knows everything, the journeyman knows how little he knows, the master knows everything in the field, and still knows how little he knows.
    
    - Re: (Score:2)
      
      by halivar ( 535827 ) writes:
      
      It's like jazz. You have to know know rules before you can break them.
      - Re: (Score:2)
        
        by halivar ( 535827 ) writes:
        
        And, I accidentally repeated repeated a word.
    - - Re: (Score:3)
        
        by russotto ( 537200 ) writes:
        
        The guru knows the novice knows more than the corporate enterprise architect, but won't let on lest the novice get a more-swelled head.
  - Re: (Score:2)
    
    by c ( 8461 ) writes:
    
    try { ... throw BlahException("blah"); } catch(Exception& blah) { ... }
    ^^ Idiot.
  - Re: (Score:2)
    
    by ihtoit ( 3393327 ) writes:
    
    if I were the programmer (I'm not, not since primary school when I programmed the TURTLE to draw stuff on large sheets of cartridge paper) I'd be dropping //remarks in everywhere. Back to when I did TURTLE programming, I got berated for wasting time on comments but when it came down to 1000+ lines of code, it was nice to know which draw routines drew what part of the image. My TURTLE St. Paul's Cathedral was 7,700+ lines of code, probably 3/4 of that was comments. If it were stripped of comments it'd probab
  - Re: (Score:2)
    
    by gstoddart ( 321705 ) writes:
    
    // exception was found
    // beyond here be dragons, run
    // make your escape now
    goto blah;
    ^^ code master
- Re: (Score:2)
  
  by ranton ( 36917 ) writes:
  
  This doesn't seem so far fetched. I'm not sure the field of natural language processing is that far away from being able to create metrics which would determine the skill of developer by looking at their code. It could then be used by employers during the hiring process and during reviews.
  While that may sound like a nightmare scenario (and it very well could be), a more intelligent software system may even be able to show why it thinks the code is bad, and give an interviewer or reviewer the chance to ask w
Let's analyze the cyberspying code. (Score:2)

by SeaFox ( 739806 ) writes:

Using this technique, can they tell us if the NSA did write the Regin Malware [slashdot.org] now?
- Re: (Score:2)
  
  by blackomegax ( 807080 ) writes:
  
  I want to see it run Regin against sections of code in gnu/linux/systemd and see if the same NSA shills wrote any of it.
What about Bitcoin? (Score:5, Funny)

by Anonymous Coward writes: on Wednesday January 28, 2015 @05:00PM (#48927255)

Can we use this to find Satoshi?

No Kidding (Score:5, Insightful)

by invid ( 163714 ) writes: on Wednesday January 28, 2015 @05:09PM (#48927367)

I can usually tell who wrote the code in the office by whether or not they put a space after their ifs: if(i == 0) vs if (i == 0); where they put their brackets, whether or not they replace their tabs with spaces, how they deal with bools: if (!var) vs if (var == false) and several other telling signs. There are so many combinations of variations no two programmers in the office (about 12 of us) have the same style.
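The habits listed here are simple enough to count mechanically. As a rough sketch, assuming C-like source held in a string, the snippet below tallies a few of them with regular expressions; `layout_features` and both samples are hypothetical, and a real stylometry system would use far more features than these.

```python
import re


def layout_features(source: str) -> dict:
    """Count a few whitespace and idiom habits in C-like source text."""
    lines = source.splitlines()
    return {
        "if_with_space": len(re.findall(r"\bif \(", source)),
        "if_without_space": len(re.findall(r"\bif\(", source)),
        "brace_on_same_line": sum(
            1 for ln in lines if ln.rstrip().endswith("{") and ln.strip() != "{"
        ),
        "brace_on_own_line": sum(1 for ln in lines if ln.strip() == "{"),
        "tab_indented_lines": sum(1 for ln in lines if ln.startswith("\t")),
        "space_indented_lines": sum(1 for ln in lines if ln.startswith(" ")),
        "negation_test": len(re.findall(r"\(\s*!\s*\w+\s*\)", source)),
        "compare_to_false": len(re.findall(r"==\s*false\b", source)),
    }


# Two made-up coders writing the same two checks.
sample_a = "if(i == 0) {\n\tdone = true;\n}\nif(!ready) {\n\treturn;\n}\n"
sample_b = (
    "if (i == 0)\n{\n    done = true;\n}\n"
    "if (ready == false)\n{\n    return;\n}\n"
)

for name, sample in (("a", sample_a), ("b", sample_b)):
    print(name, layout_features(sample))
```

Each habit alone says little, but the combination of counts differs quickly from one person to the next, which is the point being made about a 12-person office.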

- Re: (Score:2)
  
  by leonardluen ( 211265 ) writes:
  
  i could do the same. not only that but i could often also tell who had originally trained that person because part of the trainer's style often leaked into their style.
  i work at a university and we hire 100 level CS students. so we generally assumed they knew nothing and trained them from scratch.
- Re: (Score:2)
  
  by ThatsDrDangerToYou ( 3480047 ) writes:
  
  Yeah, about that... I start twitching whenever my boss types: MyFunction (arg1, arg2) and so on. Who puts a space after the function name before the '('? People who must die, of course.
  OK, calming down now.. 1.. 2.. 3.. 4.. 5..
  No, I'm OK, really.
  I had an old boss who was a code style nazi. He was an asshole. And actually, my current boss is very cool, even if he codes like that.
  - Re: (Score:2)
    
    by AK Marc ( 707885 ) writes:
    
    If the whitespace is meaningless, it should be eliminated (carriage returns excepted). However, I can understand people who add in meaningless whitespace, as some times a + b is easier to read than a+b, even if they are interpreted the same.
    - Re: (Score:2)
      
      by CannonballHead ( 842625 ) writes:
      
      So, you don't indent code? Or if you do, at what point is the indent meaningless (how many spaces/tabs) ... ? No spaces after semicolons? Or before/after braces? Or ...
      Readability should count as meaningful. It helps. And the compiler strips it out anyways, right, so ultimately it doesn't matter, just like comments, except in helping understand the code.
      I may be misunderstanding something completely in what you said... but I don't get why you would say it should be removed. Maybe in javascript for net
      - Re: (Score:2)
        
        by AK Marc ( 707885 ) writes:
        
        Indent isn't meaningless. But there's no reason to double-space an indent. It carries a reading meaning, related to nesting of code.
        
        Code "feels" smaller when it's compact. Also, having a single spacing method uniform across everyone makes for easier cut-and paste sharing. Having one person space things differently than another will result in decreased readability.
- Re: (Score:2)
  
  by Marginal Coward ( 3557951 ) writes:
  
  I once worked on a project that had a handful of developers, where each developer was in charge of the code for one of the software subsystems of the project. We didn't have much of a coding standard there - only about one page - but we ended up with a consensus coding style in the project that everybody could live with. Even so, you could always tell who wrote what by the personality shown around the edges of the coding style of a given module, function, or even over just a few lines.
- Re: (Score:2)
  
  by PRMan ( 959735 ) writes:
  
  And in Visual Studio, I hit Ctrl+K Ctrl+D all the time, which puts my code into "Standard" Microsoft format. If everyone did this, I imagine the analyzer would drop to 50% or lower.
  - Re: (Score:3)
    
    by ihtoit ( 3393327 ) writes:
    
    coding to book (sans comments) will kill the process of identifying authors stone dead, I think. If everybody's "Hello World!" was identical, how do you tell the difference?
- Re: (Score:2)
  
  by wasteoid ( 1897370 ) writes:
  
  if (false == var) prevents accidentally assigning false to var if you forget to use double equals
- - Re: (Score:2)
    
    by invid ( 163714 ) writes:
    
    Actually they have recently introduced style cop, which enforces some things, but it ignores a number of discernible quirks.
  - Re: (Score:3)
    
    by disambiguated ( 1147551 ) writes:
    
    Style guidelines should be about avoiding pitfalls of the language, using appropriate idioms, and not making life miserable for maintainers, not about where you put spaces and braces.
  - - - Re: (Score:2)
        
        by R3d M3rcury ( 871886 ) writes:
        
        Actually, the one I hate is:
        if ($variable == false) { doSomethingInteresting($variable); }
        and one of my co-workers does:
        if ($variable == false) { doSomethingInteresting($variable); }
        Of course, my code is beautiful and everyone else's is terse and ugly and everyone should write code the same way that I do. Try suggesting that to a group of programmers and see how far it gets you. Generally, it's not worth the argument--you w
        
        Re: (Score:2)
        
        by phantomfive ( 622387 ) writes:
        
        As the thread suggests, one advantage to different coding styles is that you can generally tell who wrote what and, if there seems to be a bug, you can track them down and tell them to fix it in that ugly mess. In our office, we have the rule that if you go around changing code style, you now own that code and are responsible for it. About the only issue we've run into is that people's styles evolve over time. So the guy right out of school may have a certain style that changes as he is exposed to more styles.
        git/cvs/svn/mercurial blame can tell you who wrote whatever code. Please tell me you are using some kind of source repository.......
    - - Re: (Score:2)
        
        by disambiguated ( 1147551 ) writes:
        
        Use a diff tool that can ignore formatting changes. I'm a fan of Beyond Compare [scootersoftware.com], but there are plenty of others.
That explains it (Score:2)

by Tablizer ( 95088 ) writes:

I suppose all those "// damn U bill gates!" comments gave me away
Welcome to the party (Score:3)

by meerling ( 1487879 ) writes: on Wednesday January 28, 2015 @05:16PM (#48927413)

When I was a kid in the 80s we figured out we could identify who wrote a particular piece of software by looking at its code. Those individualistic and identifiable features we used in the argument over programming being an art or a science when we wanted to support the art side.

- Re:Welcome to the party (Score:5, Insightful)
  
  by Virtucon ( 127420 ) writes: on Wednesday January 28, 2015 @05:19PM (#48927449)
  
  It's all about style. Writing software is very creative and it needs to have the author's fingerprints on it somewhere. If corporations don't like that they can suck the source code into a parser and spit out perfectly mundane crap that loses the intonation and the thoughts the original developer had for it.
  
John Varley Press Enter (Score:4, Informative)

by Crashmarik ( 635988 ) writes: on Wednesday January 28, 2015 @05:18PM (#48927435)

1985 Hugo Winner
Really, the fact that coding style is recognizable was so well known it made it into pop culture 30 years ago.
Also, on the smaller sample size the program might just be recognizing the parts of the style that come from the corporate standards. It would be interesting to see if it could recognize code from people who all work at the same company.

- Vernor Vinge probably beat him to it (Score:2)
  
  by Crashmarik ( 635988 ) writes:
  
  But I can't recall an instance.
  - Re: (Score:2)
    
    by AJWM ( 19027 ) writes:
    
    Vinge is considered one of the fathers of cyberpunk because of his "True Names", which did precede Varley's chilling (and Hugo-winning) "Press Enter[]" (1981 vs 1985).
    On the other hand, Varley's much earlier (1976) "Overdrawn at the Memory Bank" was also one of the seminal works of the field.
    Been a while since I've read it, but the warlocks (hackers) in "True Names" would never have let their identity (true name) be determined from their coding styles.
Source of Future Data (Score:2)

by Ronin Developer ( 67677 ) writes:

I guess we can expect that source code repositories will be scanned and processed. And, for code written by multiple authors, the modified code (from commits) will be scanned and indexed as well.
But, I bet they will never figure out who writes the malware recently attributed to the three letter agencies. They should, however, be able to figure out which agency writes the stuff if they get a copy of the source code or maybe even from decompiling the binary.
Additionally, if written from .NET, the CLR code c
- Re: (Score:2)
  
  by Shados ( 741919 ) writes:
  
  Back in the days of .NET 1~2, decompiling via Reflector or whatever other tool got you back pretty good stuff. Today, there's a LOT more sugar, from LINQ to async/await and everything in between. If you go back to the original language, good decompilers sometimes infer what the original sugar was from the output following certain conventions and patterns...but moving that to another language will give you unreadable garbage.
  Reading F# in C# , this>but,worse>
  - Re: (Score:2)
    
    by Shados ( 741919 ) writes:
    
    Bah, formatter messed things up. The last line was me joking about the crazy nested generic chains that F# types end up looking like in a language that doesn't support the same syntax sugar.
The key to this system being used is, ...... (Score:2)

by Selur ( 2745445 ) writes:

"The key to this system being used is, of course, first obtaining the code stylometries for a wide range of developers. The authors didn't address how, say, a database of programmers’ styles would be compiled. Also, to identify the author of a piece of code would require access to the source code, and not just executables, though the authors mention there is some evidence that style is preserved in binaries."
-> so once you post to github and similar 'they' can link every code you ever write to you,...
Re: (Score:2)

by account_deleted ( 4530225 ) writes:

Comment removed based on user account deletion
- Re: (Score:2)
  
  by ihtoit ( 3393327 ) writes:
  
  are the podcasts/videocasts out for that yet?
Bad Coders Can't Be Identified (Score:4, Interesting)

by TrollstonButterbeans ( 2914995 ) writes: on Wednesday January 28, 2015 @05:25PM (#48927505)

If your coding is terrible and very newbie like, they can't single you out since your code is similar to the ocean of other terrible coders.

So if you are a paranoid freak, the best way to ensure your safety and keep the government off your back is to write terrible code.

- Re: (Score:2)
  
  by ThatsDrDangerToYou ( 3480047 ) writes:
  
  Ah, my work here is done!
Oblig XKCD (Score:3)

by Krazy Kanuck ( 1612777 ) writes: on Wednesday January 28, 2015 @05:30PM (#48927545)

Not that many of us actually use comments.... http://xkcd.com/1421/ [xkcd.com]

Most programming isn't new code (Score:4, Insightful)

by jgotts ( 2785 ) writes: <jgotts@noSpAm.gmail.com> on Wednesday January 28, 2015 @05:31PM (#48927555)

Most programming isn't writing new code. Most programming is working on someone else's crap you inherited. Invariably, you're going to be using that person's style or else the result will look like garbage.
There is also the problem that most non-trivial code is worked on by multiple people at the same time.
Writing some code from scratch as an assignment is a very artificial exercise nowadays, unless you're in a classroom setting. Therefore, you're going to get a signature from a programmer doing atypical work.

What complete and utter bullshit. (Score:3)

by MouseTheLuckyDog ( 2752443 ) writes: on Wednesday January 28, 2015 @05:32PM (#48927559)

95% of 250 coders. That means that out of a million programmers they will misidentify 200000.
I suspect that there are too few variances in style to make any coder's style unique. For example, whether to use braces on a one-line statement after an if in C.
With a few programmers it's likely to work, but when the possible source of programmers is the world...
Not to mention emacs, Visual Studio and such enforcing some indentation standards and programming languages enforcing others.

- Re: (Score:2)
  
  by Rinikusu ( 28164 ) writes:
  
  Okay, I just woke up from a nap, but could you show your math there? Maybe I'm missing something because I come up with.. 50k, not 200k...
- Re: (Score:2)
  
  by Ksevio ( 865461 ) writes:
  
  I find the statistics dubious as well - they also dropped the dataset to nearly 1/10 while roughly doubling the code input and the results were 2% better, so it's possible if we follow the trend it will reach the 20% you seem to quote.
- Re: (Score:2)
  
  by Kjella ( 173770 ) writes:
  
  What complete and utter bullshit.
  95% of 250 coders. That means that out of a million programmers they will misidentify 200000.
  You know it's not a contest to come up with the worst bullshit. If you're left with one person 95% of the time when you have 249 possible wrong answers, it's like being left with 4000 people when you have 999999 wrong answers. If all those are too close to tell apart you'll misidentify >99.9%.
  Imagine for example that you wanted to find people by height and weight, as measured to nearest cm and kilo. It might work decently on a small group, but if you scale it up to a million people there'll be a lot of d
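The competing figures in this subthread come from different ways of scaling the 95% result, and the arithmetic is short enough to spell out. The first two readings restate numbers given by posters above. The third is purely an assumption of this sketch (each wrong candidate independently has the same small chance of beating the true author) and is not something the study or any poster claims.

```python
accuracy = 0.95
study_pool = 250
world_pool = 1_000_000

# Reading 1: the error rate stays at 5% no matter how large the pool is.
naive_misses = round((1 - accuracy) * world_pool)

# Reading 2: the method narrows the field by a roughly constant factor of 250,
# so a million candidates shrink to a shortlist rather than a single name.
shortlist = world_pool // study_pool

# Reading 3 (assumption of this sketch): every wrong candidate independently
# has the same chance p of out-scoring the true author, with p fitted so that
# 249 wrong candidates give a 95% chance of no false match.
p = 1 - accuracy ** (1 / (study_pool - 1))
expected_false_matches = p * (world_pool - 1)
chance_still_unique = (1 - p) ** (world_pool - 1)

print(naive_misses)   # 50000
print(shortlist)      # 4000
print(expected_false_matches)
print(chance_still_unique)
```

Under the third reading the expected number of false matches comes out at roughly 206 and the chance of a unique hit is effectively zero, which agrees with the point that accuracy measured on 250 coders says little about a pool of a million.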
- Re: (Score:2)
  
  by steelfood ( 895457 ) writes:
  
  It's 50,000.
  Or for the study, the 12 people who code exclusively in assembly.
So you could use this tool to make your code anon. (Score:4, Interesting)

by Maxo-Texas ( 864189 ) writes: on Wednesday January 28, 2015 @05:47PM (#48927675)

Write a version of pretty-printer that rerenders your code into a different style.
Have a lexicon of mipelled words for each "personality".
Another lexicon of variable names.
a vs inta vs int_a vs x.
Refactoring and unfactoring for subroutines.
Run the comments through google translate and back to english.
ukrainian
japanese
chinese
Synonym and antonym substitution in the comments.
The mind dances at the possibilities to mess with this algorithm.

- Re: (Score:2)
  
  by toonces33 ( 841696 ) writes:
  
  I can just imagine how unreadable such code would end up being, as any comments would look like they were written by some sort of AI tool.
- Re: (Score:2)
  
  by physicsphairy ( 720718 ) writes:
  
  "Hey, you notice some odd grammar, word choice, and spelling variance in this code?"
  "Oh yeah, must be Maxo-Texas. That's his anonymization software."
- Re: (Score:2)
  
  by steelfood ( 895457 ) writes:
  
  If you did this every time, you'd be identified as the guy who runs his code through Google Translate prior to release.
  Non-normal behavior is the easiest to single out. In order to avoid detection, you basically have to become noise. And if you're the only one, then even that is a pattern.
  Sure, you could run some things through Google Translate and leave some things alone, but that'd be the equivalent of having two online personas.
Hah. I write everything in Fortran.. (Score:3)

by toonces33 ( 841696 ) writes: on Wednesday January 28, 2015 @05:58PM (#48927755)

and then use F2C to convert it to C code before I check in.. Try analyzing that!

- Re: (Score:2)
  
  by rubycodez ( 864176 ) writes:
  
  That's one way to make your ForTran run slower
Obfuscator? Or just translate A-B-A? (Score:2)

by RandCraw ( 1047302 ) writes:

Of course you could anonymize source code using an obfuscator.
But maybe the simpler way is to compile Java to bytecode, then decompile it back to Java. I suspect that's as effective as most obfuscators.
Code beautifier (Score:2)

by mrflash818 ( 226638 ) writes:

Perhaps something like Artistic Style might help.
http://astyle.sourceforge.net/ [sourceforge.net]
Easy Solution (Score:2)

by marciot ( 598356 ) writes:

Someone just needs to write a tool that takes source code and translates it into an obfuscated form that only the CPU can understand. Is anyone working on this type of privacy tool?
Pointless, but no doubt true (Score:3)

by Kittenman ( 971447 ) writes: on Wednesday January 28, 2015 @07:25PM (#48928349)

Wouldn't any programmer worth their salt identify themselves in the comments, or (if not) be logged as the last guy in that code on such-and-such a date, while working on such-and-such a patch number? (E.g. 'kittenman was here, 1/Jan/15, fixing Steve's crap').

But I hope my code is easily recognizable. I'm proud of it. It may not be the smartest, slickest, quickest there is, but it's mine. And it works.

- Re: (Score:2)
  
  by Shados ( 741919 ) writes:
  
  People still use these stupid 90s style comments with authors and dates and shit? Really?
  Just use the source control system for that.
will they show the method? (Score:2)

by ihtoit ( 3393327 ) writes:

I doubt it. Therefore, this is about as reliable as graphology (handwriting analysis).
If you take two programmers who code to book standard, how do you tell the difference between them using the same strict problem?
Here's a great idea... (Score:2)

by Lodragandraoidh ( 639696 ) writes:

You can have/use this idea for free:
Before a system will build said code, have the build system verify the code not only by the public key/code hash, but as a secondary method - the code fingerprint of the author in question.
This turns a creepy idea into something worthwhile.
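A minimal sketch of what such a two-step build check might look like, under loud assumptions: the four style features below are toy placeholders, not the features the researchers used, and `style_vector`, `verify`, and the 0.9 threshold are all hypothetical. Signature checking with a public key is left out; only the hash comparison is shown.

```python
import hashlib
import math


def sha256_hex(source: str) -> str:
    """Hex digest of the source text (the primary check)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def style_vector(source: str) -> list[float]:
    """Toy style features; a real fingerprint would use far richer ones."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    n = max(len(lines), 1)
    return [
        sum(ln.startswith("\t") for ln in lines) / n,        # share of tab-indented lines
        sum(len(ln) for ln in lines) / n / 80.0,             # mean line length, scaled by 80 columns
        sum("_" in ln for ln in lines) / n,                  # share of lines containing underscores
        sum(ln.rstrip().endswith("{") for ln in lines) / n,  # share of lines ending with an opening brace
    ]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(source: str, expected_hash: str, author_profile: list[float],
           threshold: float = 0.9) -> bool:
    """Accept only if the hash matches AND the style resembles the author's profile."""
    if sha256_hex(source) != expected_hash:
        return False
    return cosine(style_vector(source), author_profile) >= threshold
```

In practice the author profile would be built from earlier commits by the same person, and the threshold would need tuning to avoid rejecting legitimate changes.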
- Re: (Score:3)
  
  by TWX ( 665546 ) writes:
  
  Heh. If it's effective in a clusterfuck of copy/paste, then it should be really effective when the bulk of the code is original...
  
  Sounds like the solution is to use an entirely different language than the bulk of one's work is in, if one wants to anonymously write malicious or otherwise legally complicated code.
  - Re: (Score:2)
    
    by Penguinisto ( 415985 ) writes:
    
    That kind of depends on the stylesheets, pre-compiler style enforcement routines, and the fact that a shit-ton of corporate code is often improved incrementally by multiple authors.
    'course, there's still the comments that you could use, but who does that?
- Re: (Score:2, Funny)
  
  by Anonymous Coward writes:
  
  Why would they even bother with an algorithm to process your ramblings? Every time I see you post, I instantly think "oh here's this jerk again".
  - Re: (Score:2)
    
    by Mordok-DestroyerOfWo ( 1000167 ) writes:
    
    I hate following your rambling, Anonymous Coward. Sometimes you get extremely schizophrenic and contradict yourself!
- Re:Demonstrates the need... (Score:5, Insightful)
  
  by Anonymous Coward writes: on Wednesday January 28, 2015 @05:09PM (#48927373)
  
  This is why people need to follow style guides, so that all source code is styled the same.
  There's a damn good chance 95% of coders are not criminals, nor would they care if someone identified their code.
  That said, this will become a legal nightmare when this kind of profiling can be used to frame another coder.
  And with the laws wanting to treat any "hacker" as a potential terrorist these days, the consequences of even being accused can be rather severe to deal with.
  
  - Re:Demonstrates the need... (Score:5, Insightful)
    
    by Impy the Impiuos Imp ( 442658 ) writes: on Wednesday January 28, 2015 @05:22PM (#48927481) Journal
    
    You want scary? The same can be applied to general text on the Internet, tying posters on different sites together, including anonymous (not your real name avatar) to a site with your real name.
    Which the NSA probably has churning away on its databases. Which probably does little more than add confirmation of said links from watching and recording all traffic to any and all of a billion IP addresses.
    And I, for one, welcome our new panopticon overlords who won't abuse it, not one of their thousand agents, because they're supposed to check a got-a-warrant box on a piece of paper before choosing to abuse it.
    
- Re: (Score:2)
  
  by grimmjeeper ( 2301232 ) writes:
  
  This is why people need to follow style guides, so that all source code is styled the same.
  Why does all code need to be styled the same?
  I can see a need in a safety critical environment like avionics or medical devices that needs strict adherence to rules to ensure that the code has been written correctly and with as few bugs as possible. But what difference does it make outside of that kind of environment? I mean, so what if there's a thousand different coding standards in the Chrome source? What difference does it really make?
  - harder to read if there is no consistency (Score:2)
    
    by Chirs ( 87576 ) writes:
    
    Generally speaking each project has a coding style that most code in the project adheres to, for the simple reason that it's easier to maintain when the code all looks more-or-less similar.
    If one area uses lowercase with underscores, and the other area uses CamelCase, and one area typedefs the heck out of everything while the other is explicit, then for someone coming in and trying to understand the code it makes it harder than necessary to figure out what's going on.
    So if you look at the linux kernel, or g
    - Re: (Score:2)
      
      by ChunderDownunder ( 709234 ) writes:
      
      Coding standard adoption can provoke holy wars but at the end of the day, you're a team. Though idiosyncratic decisions irk me, such as prefixing instance variables with underscore. Any decent editor will make such a distinction between scope via colours.
      Pretty printing tools and style checkers present in any decent editor will enforce coding standards with minimal fuss.
- Re: (Score:2)
  
  by EvilIdler ( 21087 ) writes:
  
  I wonder how this works for Go, where style is stricter, and people tend to use a formatting tool. Only the comments and naming schemes left to identify by, I guess.
- Re: (Score:2)
  
  by harperska ( 1376103 ) writes:
  
  Even when following a coding style guide 100%, there is still generally enough leeway to allow for plenty of personal style. There's the words you use to name things, use of whitespace and grouping of statements, basically everything about a piece of source code that's lost if you compile and then decompile a program. Just like the prose from two different authors is distinct from one another, even if they go through the same copy editor to fit a publisher's style guide. And if your corporate style guide req
- Re: (Score:3, Funny)
  
  by Tablizer ( 95088 ) writes:
  
  ... a patchwork of open-source freebies.
  So, what's it like to work for FaceBook?
- Re: (Score:2)
  
  by lgw ( 121541 ) writes:
  
  Newfags can't triforce
  Slashdot supports too few entities to do this right, and forget about UTF8. But you can get sorta close.
  *
  * *
  Unless someone can do better?
- Re: (Score:2)
  
  by __aaclcg7560 ( 824291 ) writes:
  
  I had a Java instructor who informed the class that he talked to two students in private because their code was nearly identical except for one small detail: one used the x variable, the other used the y variable. The program was so simple that he couldn't flag the students for cheating.
  - Re: (Score:2)
    
    by ChunderDownunder ( 709234 ) writes:
    
    I once marked CS homework and uncovered cheating for an 'individual' assignment.
    A group of students had debug comments in their code - the giveaway? spelling mistakes.
- Re: (Score:2)
  
  by ihtoit ( 3393327 ) writes:
  
  there's a wiki site (can't remember the name) that takes great joy in posting accusations without attribution or evidence, and when called on them the Admins sit there and claim that the person who posted the slander is now the same person trying to get a retraction based on some sort of magic ring with a seekrit style decoder. Even when called out to post the evidence they claim to hold, they just dive straight in to claiming knowledge they can't possibly have for various reasons not least of which said cl
