Programming

400,000 GitHub Repositories, 1 Billion Files, 14TB of Code: Spaces or Tabs? (medium.com) 391

Here's a debate that refuses to die: given a choice, would you rather use spaces or tabs? An episode of Silicon Valley last season had a bit on this. Now we have more data to analyze people's behavior. A Google developer has looked into 400,000 GitHub repositories -- 1 billion files, 14 terabytes of code -- and found that programmers with an interest in specific languages do seem to prefer either tabs or spaces. Spoiler alert: spaces win, like, all the time.

  • Tabspaces? (Score:5, Insightful)

    by PCM2 ( 4486 ) on Thursday September 01, 2016 @10:33AM (#52808065) Homepage

    Yeah, OK, I get it. Spaces make it easier to cut and paste your code into whatever and have it look the same. But does anybody seriously type all those spaces? You don't just set the Tab key to expand to spaces?

    • Re:Tabspaces? (Score:4, Insightful)

      by Anonymous Coward on Thursday September 01, 2016 @10:36AM (#52808093)

      Any IDE can be configured to use spaces instead of tabs when you indent.

      • by Z00L00K ( 682162 )

        And then you can use an auto-formatter in the editor that cleans up stuff for you and replaces any tabs for indentation with spaces.

        A tab is usually 8 spaces, but the editor may be configured otherwise. For small programs the 8 space tab is good enough, but as soon as stuff grows it has a tendency to cause the need for a very wide screen and refactoring help to break down things.

        So I imagine that if you look at the size of the program you will see that in small programs the tab is more common than spaces fo

    • by godrik ( 1287354 )

      But does anybody seriously type all those spaces? You don't just set the Tab key to expand to spaces?

      or most likely, the editor enforces formatting with whatever parameters you configure it with. That's emacs default setting in C for instance: pressing tab indents this line consistently with the one above.

    • I happen to use Sublime today, Nedit previously, and both replace tab characters with spaces. I'd be willing to bet that other applications do the same thing. Obviously this skews the result because the author may use tab but tabs get converted without knowledge. Additionally, many places strip tabs from files on commits. Again, this skews the result. TFA didn't bother to mention a pre commit hook as that would change the fakerovercy (sorry, it may be funny for a TV show but pretty stupid to argue outs

      • by Qzukk ( 229616 )

        The argument was never about what button you push on your keyboard, it has always been about how fucked up your shit gets when someone else opens it in a different editor.

        • I'm pretty sure, judging by the linked video, that it's about pressing a button 8 times, or pressing a button 1 time, and the reduced file size that results from using TAB.
    • I imagine that most people have their editor set to do smart spacing. The editor is smart enough to properly line up the next line based on the content of the previous line. So, it's not really onerous to use spaces and has the added benefit of being unambiguous.

      Amusing anecdote: Some very old Unix software is fanatical about using tabs. Not because of style considerations, because it's old enough that the extra disk space consumed by using spaces was unacceptable when the software was first written.

      • by Viol8 ( 599362 )

        " The editor is smart enough to properly line up the next line based on the content of the previous line."

        That is one of the most annoying things an editor can do IMO (and if someone has put it as a default in the global .vimrc they need to die a slow painful death). If my next line is in an outer block it means i have to delete the damn indentation which is a lot more labour intensive than putting some in in the first place!

        Also with tabs I can change the indentation width to suit my needs, with spaces I c

        • Re:Tabspaces? (Score:4, Insightful)

          by johnw ( 3725 ) on Thursday September 01, 2016 @11:05AM (#52808399)

          That is one of the most annoying things an editor can do IMO (and if someone has put it as a default in the global .vimrc they need to die a slow painful death). If my next line is in an outer block it means i have to delete the damn indentation which is a lot more labour intensive than putting some in in the first place!

          Your average editor which does auto-indentation like this generally has enough smarts to realise it needs to go back a level when you finish a block. You keep typing and your desired and configured indentation just happens.

          Even if it didn't (and why would you use an editor which couldn't manage it?) it would still be less work to reduce by one level of indent than to insert N-1 levels.

    • Re:Tabspaces? (Score:4, Insightful)

      by Austerity Empowers ( 669817 ) on Thursday September 01, 2016 @10:45AM (#52808217)

      But does anybody seriously type all those spaces? You don't just set the Tab key to expand to spaces?

      Not unless you are using notepad. Everything from vim to atom.io will let you choose hard tabs or spaces, almost all of them know to use hard tabs in makefiles. All of them can auto-indent too with either hard tabs or spaces.

      I worship the religion of spaces, but the religion of spaces still derives from the pantheon of indentation, we all use the tab key but will absolutely crucify anyone from the other religion.

    • What is this tab-expansion black magic that you're talking about?

      You know, this mystery thing that Silicon Valley apparently knows nothing about.

      Otherwise known as "the feature which makes their entire analysis meaningless".

    • I don't care how you get the spaces in there, my editor does "smart spacing" for me, just as long as the saved file has spaces, not tabs.

      One calorie Tab soda would be a good logo for tabs in files: 1 as in: only good for use in a single editor.

  • by QuietLagoon ( 813062 ) on Thursday September 01, 2016 @10:34AM (#52808077)
    ... except for the times I use spaces.
    • by AmiMoJo ( 196126 ) on Thursday September 01, 2016 @11:04AM (#52808389) Homepage Journal

      It's all these Java and HTML heathens skewing the results. If you look at C files, the majority use the correct tabs for indentation and optionally spaces for alignment later.

      If god wanted us to use spaces for indentation he wouldn't have made keyboards with tab keys. Also, they reduce wear on your space bar and fingertips.

      TABS!

      • Tabs definitely.

        Oh except for the GNU indentation style which is eye-clawingly hideous and is a mockery of both spaces and tabs.

  • by cellocgw ( 617879 ) <cellocgw&gmail,com> on Thursday September 01, 2016 @10:36AM (#52808091) Journal

    You may "use" tabs, but plenty of editors are set to translate the tab into N spaces.

    Worse, plenty of editors import text documents and **change** the tabs to N spaces whether you wanted to or not.

    Usually this results in a totally garbled python script.

  • by ribuck ( 943217 ) on Thursday September 01, 2016 @10:38AM (#52808115)

    The article conveniently ignores Python, a 100% tabbed language.

    • by Anonymous Coward on Thursday September 01, 2016 @10:40AM (#52808141)

      Wrong: https://www.python.org/dev/peps/pep-0008/#tabs-or-spaces

    • Uh, it does include python, and PEP8 (which I, as a tabber, ignore) recommends using spaces, not tabs.

    • Definitely wrong. In our (medium sized) project all of our Python is space indented just like our C++ (consistency is key).

      Of course... we all use editors that do the correct spacing by pressing the TAB key... but make no mistake _spaces_ are used. We actually enforce this with automated testing that runs on every pull request... any TAB characters will immediately cause your PR to fail tests...
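A check like the one described above could look roughly like this. It is a minimal sketch, not the poster's actual test: the function names `find_tab_lines` and `check_tree` and the list of file suffixes are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def find_tab_lines(path):
    """Return the 1-based numbers of lines in `path` that contain a TAB."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return [n for n, line in enumerate(text.splitlines(), start=1) if "\t" in line]


def check_tree(root, suffixes=(".py", ".cpp", ".h")):
    """Map each offending file under `root` to its tabbed line numbers.

    The suffix list is a hypothetical default; a real project would pick its own.
    """
    failures = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            lines = find_tab_lines(path)
            if lines:
                failures[str(path)] = lines
    return failures


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    Path(demo_dir, "ok.py").write_text("def f():\n    return 1\n", encoding="utf-8")
    Path(demo_dir, "bad.py").write_text("def g():\n\treturn 2\n", encoding="utf-8")
    for name, lines in check_tree(demo_dir).items():
        print(f"{name}: TAB found on line(s) {lines}")
```

In a CI job, a non-empty result from `check_tree` would be turned into a failing exit status.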

    • Python isn't a "tabbed language" it merely relies on levels of indentation. Consistent type/count of whitespace in front of code is used to differentiate namespaces. It could be 'n' spaces or 'n' tabs, just prepare for pain if you mix them within a file.
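The "prepare for pain" part can be seen directly in Python 3, which refuses to compile a block whose indentation mixes tabs and spaces inconsistently. A small sketch (the helper name `indentation_problem` is made up for this example):

```python
def indentation_problem(source):
    """Return the name of the exception raised when compiling `source`, or None."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:  # TabError and IndentationError are subclasses
        return type(exc).__name__
    return None


spaces_only = "if True:\n    x = 1\n    y = 2\n"
tabs_only = "if True:\n\tx = 1\n\ty = 2\n"
# Same block, but one line uses a tab and the next uses eight spaces.
mixed = "if True:\n\tx = 1\n        y = 2\n"

print(indentation_problem(spaces_only))  # None
print(indentation_problem(tabs_only))    # None
print(indentation_problem(mixed))        # TabError
```

Either style compiles on its own; only the mixture is rejected.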
    • There is a 'py' line...
    • by fnj ( 64210 )

      The article conveniently ignores Python, a 100% tabbed language.

      Bzzzzzzt. Absolutely wrong. Google "PEP8". "Spaces are the preferred indentation method."

    • by cdrudge ( 68377 )

      It also conveniently ignores (older) COBOL compilers that required spaces. You and your modern languages that get to use tabs...

  • by LordLucless ( 582312 ) on Thursday September 01, 2016 @10:40AM (#52808145)

    Popularity is a poor measure of quality. Otherwise McDonald's would be Michelin starred.

  • by Viol8 ( 599362 ) on Thursday September 01, 2016 @10:42AM (#52808181) Homepage

    The whole point of a tabstop is you can change its width so a file can be indented to the preference of the person reading/editing it (set ts=[width] in vi). With spaces you're stuck with the (often poor) indentation the author chose who is essentially saying "Fuck you, you're going to read the code indented the way I want, not the way you want"

    • by johnw ( 3725 ) on Thursday September 01, 2016 @10:49AM (#52808245)

      It's a nice idea in theory, but it's never worked - nor is it the point of tabs.
      If all indents were always solely at the beginning of a line, and always an exact multiple of whatever N you've chosen then it might have a chance, but they aren't and so it just breaks.

      Don't mess with the size of a tab character - you'll just cause pain.

      • by serviscope_minor ( 664417 ) on Thursday September 01, 2016 @11:40AM (#52808651) Journal

        Ick no!

        Use only tabs to indent to the beginning of the indent level. Use spaces for all other alignment, including if you want to go a little further in than the indent level for some reason.

        Then, it all works perfectly and in 99.9999% of cases, someone can change the width of the tab character to whatever they like and it'll look right.

        Then silly people can use tab=8, weirdos can use tab=2 sick fucks can use tab=3 and the rest of us can view it as the gods decreed with tab=4. Feel free to permute that as per your preference except for the tab=3 clause.
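The claim is easy to check with `str.expandtabs`. The sketch below uses a made-up two-line statement: one continuation line follows the rule above (a tab for the block level, spaces for alignment), the other uses tabs for the alignment as well.

```python
def column_of(line, token, tab_width):
    """Column at which `token` starts once tabs are expanded to `tab_width`."""
    return line.expandtabs(tab_width).index(token)


first = "\tresult = compute(first_arg,"
# One tab for the block level, then 17 spaces to line up under first_arg.
aligned_with_spaces = "\t" + " " * 17 + "second_arg)"
# Three tabs and one space: lines up only when a tab is 8 columns wide.
aligned_with_tabs = "\t\t\t second_arg)"

for width in (2, 3, 4, 8):
    target = column_of(first, "first_arg", width)
    print(
        width,
        column_of(aligned_with_spaces, "second_arg", width) == target,
        column_of(aligned_with_tabs, "second_arg", width) == target,
    )
```

The tab-then-spaces line stays aligned at every width tried; the tabs-for-alignment line is only aligned at width 8.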

      • by MobyDisk ( 75490 )

        While I agree, there are two remaining issues: Cases where it still doesn't work, and tool support.

        Tool support:
        Do you know of an editor that replaces spaces with tabs only at the beginning of the line? That would make this problem moot. But every editor I've ever seen either puts tabs everywhere it can, or never puts them in. None seem to use them intelligently.

        Special indent cases:
        tab-tabvoid LongFunctionNameWithManyArgumentsWithLongNames(int argumentNumber1,
        tab-tab-spaces-morespaces-std::hash_map>
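On the tool-support question, the mechanical part (touching only the whitespace at the start of a line) is short to write. The sketch below is a deliberately naive illustration with a hypothetical `tabify_leading` helper: it treats every full group of `indent_width` leading spaces as one indent level, so it cannot tell block indentation from alignment on a continuation line. That distinction needs knowledge of the language's syntax, which is the hard part the comment is pointing at.

```python
def tabify_leading(line, indent_width=4):
    """Turn whole indent levels of leading spaces into tabs.

    Spaces after the first non-space character are left untouched, and any
    leading spaces that do not fill a whole level are kept as spaces.
    """
    stripped = line.lstrip(" ")
    leading = len(line) - len(stripped)
    levels, remainder = divmod(leading, indent_width)
    return "\t" * levels + " " * remainder + stripped


print(repr(tabify_leading("        x = 1  # note")))
print(repr(tabify_leading("      y = 2")))
print(repr(tabify_leading("a    b")))
```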

    • Re: (Score:3, Informative)

      by cfalcon ( 779563 )

      > a file can be indented to the preference of the person reading/editing it

      No, fuck that. I shouldn't have to mess with my editor to get sane spacing. Code is text, not binary. Get the binary out of the code. What if my editor isn't able to do that, and what if I need to use a text mode or command line? Fuck all that hassle.

      And to flip it: if you really need to see it with some OTHER number of spaces than are actually there, I bet there's a text transform you could apply to the spaced file too. W

    • The issue is that a lot of code has traditionally been 80 characters wide. That way anyone who looks at it in a standard size terminal will see the exact thing that the author saw when he wrote the code. Tabs are ambiguous in that, in a traditional 80 character wide display, you may have to guess what tab size the author used and changing that tab size may have unintended consequences.

  • C is more tabby and C++ is more spacy. Different IDEs I presume.
  • by rockmuelle ( 575982 ) on Thursday September 01, 2016 @10:43AM (#52808197)

    In theory, tabs are the right solution. In practice, spaces are the right solution.

    -Chris

    • by TFlan91 ( 2615727 ) on Thursday September 01, 2016 @11:07AM (#52808425)

      I disagree.

      I disagree with the whole debate entirely...

      Tabs for indentation, spaces for alignment AFTER indentation... Tabs so that people can choose whatever width they want, but after that width (meant for indentation of blocks) use spaces to align whatever you want...

      It really isn't that hard and it pleases everyone.

      • by Kjella ( 173770 )

        In theory, tabs are the right solution. In practice, spaces are the right solution.

        I disagree. I disagree with the whole debate entirely... Tabs for indentation, spaces for alignment AFTER indentation... Tabs so that people can choose whatever width they want, but after that width (meant for indentation of blocks) use spaces to align whatever you want... It really isn't that hard and it pleases everyone.

        1. Congratulations, you win the "theoretically correctest" award
        2. I'd be happy if functions and variables had sane names
        3. Except those who have to write it, they're not pleased

  • A long, long time ago, in a data centre far, far away...

    Back in the day, a multi-user system might have had a single 4.8M hard disk, shared between the operating system and all its users. It made sense to use tabs instead of lots of spaces for indents, because each tab saved you 8 spaces - a pretty good compression ratio, and a worthwhile saving in disk usage.

    Then came a period of chaos, where people started muddling up their desired indent with the size of a tab. Decent editors always let you separate th

  • Only because I use Python.
  • I use Emacs with a function bound to the Return key that performs a reindent-then-newline-and-indent, and the re-indent function indents with tabs and spaces as appropriate -- meaning 10 spaces would be indented using 1 tab and 2 spaces.

    (defun smart-newline ()
      "(reindent-then-newline-and-indent) if in a mode listed in smart-newline-modes. Otherwise just (newline)."
      (interactive)
      (if (memq major-mode smart-newline-modes)
          (reindent-then-newline-and-indent)
        (newline)))

    • Re:Both. (Score:5, Insightful)

      by WallyL ( 4154209 ) on Thursday September 01, 2016 @11:02AM (#52808369)

      Ew. That's an unusual combination of tabs and spaces.
       
      ...and also because of Emacs.

    • meaning 10 spaces would be indented using 1 tab and 2 spaces.

      That is truly the worst of both worlds. If you're going to use tabs, then tabs should be used for indenting (i.e. the block-level of the code) and spaces for formatting after the appropriate indent has been achieved, and the two should not mix.

      Doing it your way will improperly display in any other editor or viewer, including Emacs, that has a different-width tab -- which is pretty much all of them, since the 4-space tab-width is so popular.
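To make the display problem concrete, here is a short sketch comparing a "1 tab and 2 spaces" indent with a plain 10-space indent at several tab widths. The sample line content is arbitrary.

```python
def visible_indent(line, tab_width):
    """Number of leading blank columns once tabs are expanded."""
    expanded = line.expandtabs(tab_width)
    return len(expanded) - len(expanded.lstrip(" "))


mixed_line = "\t  return value"             # one tab plus two spaces
spaces_line = " " * 10 + "return value"     # ten literal spaces

for width in (8, 4, 2):
    print(width, visible_indent(mixed_line, width), visible_indent(spaces_line, width))
# 8 10 10
# 4 6 10
# 2 4 10
```

The two lines agree only when the viewer's tab width is 8.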

      • Not so.

        As long as the indenting method is consistent, there's generally no problem regardless of the displayed tab-width -- which people should just fucking leave at 8 spaces and don't use so many of them. (People that indent using only tabs should simply be killed.) In addition, I've been doing this since 1985 and neither I or my co-workers have *ever* had any problem with my code. To the contrary, code that has been edited (usually) in vi, indented using just tabs, but with a smaller tab-width is much m

      • That is truly the worst of both worlds. If you're going to use tabs, then tabs should be used for indenting (i.e. the block-level of the code) and spaces for formatting after the appropriate indent has been achieved, and the two should not mix.

        Also, to follow up, Emacs has language-specific mode styles and knows how to indent code properly. It just uses a mix of tabs and spaces to achieve that indenting. All of this is configurable, but I always use the default style settings. Many of the posts for this article mention editors doing smart indenting rather than people just using tabs/spaces willy-nilly.

  • Even when the user requests tabs, some IDEs save an indent as spaces when its visible width is less than, say, 8 columns, then convert it back to tabs if necessary when loading the file. This is because most users set tabs at 3 or 4 spaces. This way users have their tabs when working in the IDE and can still 'cat' the file and see the same layout. The method of classifying a file as 'tab' or 'space' by counting the indents in each category is therefore (probably) inaccurate.
  • by Maltheus ( 248271 ) on Thursday September 01, 2016 @11:00AM (#52808345)

    I think there'd be less of a debate if tabs were 4 chars in width, out of the box. Not sure they'd even bother with a tabstop setting had that been the case. Granted, you're always gonna have your freaks, like this one guy I worked for who insisted on 3 spaces for indentation, but they would have gotten nowhere had the default tabstop been reasonable from the start.

    That being said, spaces for work code, tabs for personal code.

  • i was really expecting the argument to go,

    "well, look at how much damn code is amassing through the collective efforts of all of us dumbasses.

    "well considering how enormous the collective code base is getting,

    "NOW how do you feel about using three times as much character data in your whitespace, HUH?!"

    but instead it was just an enormous poll.

    (i'm disappointed because it seemed like it would be an interesting debate.)

  • I write shell and will copy code from my scripts to paste & run. Leading spaces are no problem. Tabs will invoke bash completion. No tabs! No capes! Remember Thunderhead? Sucked into a vortex!
  • And by this, I mean spaces are inferior, just like Betamax actually was.

    Tabs are, of course, more logical because it lets the person viewing/editing the code decide how it looks. You can set your tab width to 2 chars for super tight code, 3 for a bit more obvious indentation, or a more standard look with 4. You can even go nuts and go with 8 spaces.

    A tab semantically means "indent this one level". Two tabs = 2 indentations, etc... What the _fuck_ does 8 spaces mean? Is that one indentation or two? Oh, tha

    • by Yunzil ( 181064 )

      As a Spacer, you Tabbers will be the first against the wall when the revolution comes.

    • by swilver ( 617741 )

      Except those annoying tabbers don't know how to tab properly. They use them everywhere, in comments, at the end of a line, etc. See how well that lines up when you change the tab size...

      Only tab for indentation at the start of a line. Never mix tabs/spaces for indentation. Then it might work.

      Since most people don't have the time to care about this, teams decide to use spaces so it looks uniform everywhere. It's much easier to explain how to space properly than how to tab properly.

    • You also can't accidentally highlight half an indent when using tabs!

      I prefer 3-space tabs myself, but this is pretty unheard of in the space-indent world.

  • honestly i'd prefer if lines indented past 3 spaces were indented using some delimiter and integer.

    i hate backspacing over numerous tabs and the more indented your code gets the more you're working with a falling apart mountain of tabs. sure, code manager, blah blah, what ever. the only reason we're talking about this is because enough of us are manually maintaining our code through traditional key bindings.

    i prefer spaces only because i don't want to store tabs and deal with different editors treating tabs

  • Folks: You have no choice. In the Linux kernel, you have to use tabs and the tabs have to be set to eight spaces.
  • Expanded tabs are a waste of space. The reasonable solution is using tabs for leading indent, and space otherwise. Tabs provide proper semantic meaning for indent, allowing editors to respect user preference.

    Fundamentally though, the question shouldn't even need to be considered. Code is necessarily structured and should have a canonical format, enforced by a tool like gofmt [golang.org], which makes mechanical transformations of code much simpler. The amount of time wasted on formatting, both for writing and readin

  • I know it's not popular, so it would never make the list. But disappointed nonetheless that Whitespace [dur.ac.uk] didn't get scrutinised. That data would have meant something!
  • One billion files?! That's why adults use statistics. Relatively small random sample would have given the same result.
  • Tabs with PHP, HTML and other web programming languages. Not using them at least triples your filesize.
    I remember replacing space with tabs and reducing outbound web documents from 90 kb to 30 kb with one simple search and replace.
    Pretty much settling the argument at least in this field.

    If I catch you using spaces in your web output, I will hurt you. Bad. And you will deserve it.

    If you're using spaces in a compiled language - go right ahead. The compiler is a must and fixes it all.

    Just as bad or maybe even

  • by Richard Kirk ( 535523 ) on Thursday September 01, 2016 @11:53AM (#52808751)
    (snigger)
  • by zieroh ( 307208 ) on Thursday September 01, 2016 @12:00PM (#52808809)

    This was an analysis of files on github, yes? And by definition, those files are managed by git, yes?

    And thus, the files were created by all the young punks that rushed to git, because reasons, yes?

    In light of those conditions, yeah. I can see why there's a prevalence of spaces. The analysis only considers the work output of (mostly) young idiots. I'll bet there's a prevalence on github of non-plumb braces, too.

  • by Anonymous Coward on Thursday September 01, 2016 @12:47PM (#52809241)

    The regex used to determine tab vs. space is r'[ \t]', which matches the first space or tab anywhere in a line, not whether the first character in a line is a space or tab. Thus, for example, a using statement in C# (using System;) would count towards the spaces total where it shouldn't count towards either (unless preceded by spaces or tabs for some reason).

    Fix the regex (r'^[ \t]') and rerun it, i doubt it will allow tabs to win (tabs rule spaces drool), but it would give more accurate results for the charts.
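Taking the description above at face value, the difference between the two patterns can be reproduced with Python's `re` module. This is a stand-in for illustration only: the original analysis did not necessarily run in Python, and the `classify` helper is made up for this example.

```python
import re

unanchored = re.compile(r"[ \t]")   # first space or tab anywhere in the line
anchored = re.compile(r"^[ \t]")    # space or tab only as the first character


def classify(line, pattern):
    """Return 'tab', 'space', or None depending on what `pattern` finds first."""
    match = pattern.search(line)
    if match is None:
        return None
    return "tab" if match.group() == "\t" else "space"


for line in ("using System;", "\tint x = 1;", "    return 0;"):
    print(repr(line), classify(line, unanchored), classify(line, anchored))
```

The unindented `using System;` line is counted as "space" by the unanchored pattern and is not counted at all by the anchored one; the two genuinely indented lines are classified the same way by both.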

  • by e r ( 2847683 ) on Thursday September 01, 2016 @12:58PM (#52809335)
    1. .json files shouldn't be counted because they're frequently generated by tools rather than by humans (or even human tools...)

    2. Making an argumentum ad populum [wikipedia.org] (i.e. an argument for or against spaces) is really just embarrassing yourself.

    3. Arguing that everyone should use spaces for indentation just because that's how the rest of the project is formatted is nothing more than an appeal to tradition [wikipedia.org]-- furthermore we have tools these days which can automatically transform the whole project to use tabs so it's not like it would amount to re-writing every single file by hand to make the change.

    4. About the flame war... Here's a blog post that sums it up nicely. [verou.me] Spaces for indentation are objectively inferior, provide no improvement over tabs, and are more difficult to work with. If your reason for using them is just to force everyone else to do things (i.e. read and write code) your way then screw you: you're literally one of the reasons the world sucks.
