Taco Bell Programming 394
theodp writes "Think outside the box? Nah, think outside the bun. Ted Dziuba argues there's a programming lesson to be learned from observing how Taco Bell manages to pull down $1.9 billion by mixing-and-matching roughly eight ingredients: 'The more I write code and design systems, the more I understand that many times, you can achieve the desired functionality simply with clever reconfigurations of the basic Unix tool set. After all, functionality is an asset, but code is a liability. This is the opposite of a trend of nonsense called DevOps, where system administrators start writing unit tests and other things to help the developers warm up to them — Taco Bell Programming is about developers knowing enough about Ops (and Unix in general) so that they don't overthink things, and arrive at simple, scalable solutions.'"
8 keywords? (Score:2, Funny)
Re:8 keywords? (Score:5, Insightful)
exactly.
Those 8 keywords are + - < > [ ] . ,
Re: (Score:2)
For EXTREME challenge limit yourself to 8 bits!
Re: (Score:3, Interesting)
I think programming on an old machine should be required for any sort of programming course. It would teach people to conserve resources and think about how the machine works.
He who cannot program in 64K cannot program in more.
Re: (Score:2)
exactly.
Those 8 keywords are + - < > [ ] . ,
Pfft... Real programmers only need NAND gates.
Re:8 keywords? (Score:4, Funny)
You had 1s?
Luxury! When I was a lad we had to program everything using only zeros!
Re:8 keywords? (Score:5, Funny)
Quiet, you.
In my day, we had no "zero". We used Roman numerals.
Re:8 keywords? (Score:5, Insightful)
Ook! Ook?
Re: (Score:2)
I don't get that either, but the summary said 8 ingredients. Which made me wonder if all of Taco Bell's food is made from 8 basic ingredients. That seems to be what it is saying... right?
Either way, now I am confused and hungry.
8? I thought it was 3 ... (Score:3, Funny)
Re: (Score:3, Insightful)
I thought there were 6 - up, down, strange, charm, top and bottom?
Re: (Score:3, Funny)
Re:8 keywords? (Score:4, Informative)
Re: (Score:3, Informative)
So if I limit myself to 8 keywords my code has less defects and is more maintainable?
... fewer defects. Never mind.
Re: (Score:3, Insightful)
I saw this same behavior in Internet explorer a few days ago. Someone complained that "Firefox isn't working", because an ASP page had a malformed link in it. IE was "smart" e
Re: (Score:3, Insightful)
"When you work in a monkeyhouse you're more used to having shit thrown at you".
My order (Score:4, Funny)
Can I get a server logging system, hold the email notifications? Can I get extra rotating log files with that?
Re: (Score:2)
which language is best? (Score:5, Insightful)
I wasn't entirely convinced, but he did have the resume. Seems Mr Dziuba is from the same school of thought. I read the full introduction to the DevOps page and I'm still not entirely sure what it's about. We should work together and deliver on time, or something like that.
Re:which language is best? (Score:5, Insightful)
The DevOps thing is yet another crock of shit on par with 'managing programmers is like herding cats' and web2.0
Re:which language is best? (Score:5, Funny)
The DevOps thing is yet another crock of shit on par with 'managing programmers is like herding cats' and web2.0
I volunteer at a cat rescue. Herding cats is much easier than dealing with programmers.
Re: (Score:3, Funny)
Meh, as long as you have a hunk of meat and a fishing pole, both tasks are the same.
Re: (Score:3, Funny)
The problem being when the "piece of meat" in one case sues for sexual harassment. =p
Re:which language is best? (Score:5, Insightful)
Wget for crawling tens of millions of web pages using a 10 line script? He doesn't understand crawling at scale.
There's a lot more to it than just following links. For example, lots of servers will block you if you start ripping them in full, so you need a system in place to crawl sites over many days or weeks, a few pages at a time. You also want to distribute the load over several IP addresses, and you need logic to handle things like auto-generated pages, tar pits, temporarily down sites, etc. And of course you want to coordinate all that while simultaneously extracting the list of URLs that you'll hand over to the crawlers next.
His other example is also bullshit. Tens of millions of webpages are not that much for a single PC; it hardly justifies using MapReduce, especially if you're only going to process pages independently with zero communication between processes.
MapReduce is all about cutting the dataset into chunks, then alternating between 1) an (independent) processing phase on each chunk, and 2) a communication phase where the partial results are combined. And where this really pays off is when you have so much data that you need a distributed filesystem.
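To make that concrete, the classic word-count job has exactly that shape, and on a single machine it is literally a pipeline (a sketch with a hypothetical input.txt; the "shuffle" is just sort):

# map: emit one word per line; shuffle: sort brings equal keys together;
# reduce: uniq -c combines each group into a count
tr -s '[:space:]' '\n' < input.txt | sort | uniq -c | sort -rn

MapReduce earns its keep when input.txt no longer fits on one machine and that sort has to happen across a cluster.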
Re: (Score:3, Informative)
Wget is made for crawling at scale.
wget --random-wait
The way I do this with wget is to use wget to genera
Re:which language is best? (Score:5, Insightful)
>>I asked him which programming language was his favorite, expecting it to be something like Lisp or Forth, but he said, "shell script."
Shell script is awesome for a large number of tasks. It can't do everything (otherwise we'd just teach scripting and be done with a CS degree in a quarter), but there are a lot of times when someone thinks they're going to have to write a long program involving a lot of text parsing, and you just go, "Well, just cut out everything except the field you want, pipe it through sort|uniq, and then run an xargs on the result." You get done in an hour (including writing, argument checking, and debugging) what another person might spend a week doing in C (which is spectacularly unsuited for such tasks anyway).
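The shape of the thing, with stand-ins (/etc/passwd and groups here are just hypothetical placeholders for whatever file and per-item command you actually care about):

# keep only the field you want, dedupe, then fan out over the results
cut -d: -f1 /etc/passwd | sort | uniq | xargs -n1 groups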
Re: (Score:3, Insightful)
(otherwise we'd just teach scripting and be done with a CS degree in a quarter)
because the programming language has so much to do with CS?
Re:which language is best? (Score:5, Insightful)
Re:which language is best? (Score:4, Interesting)
Most "real" CS people have been playing around with writing code since a young age. I'd written motion prediction code for a robot, an Axis and Allies simulator, a full AI suite, and a bunch of other stuff before I started college, but I think the university classes really polished my skills. Finite math taught me how to think about structuring loops so they always run correctly, my Theory class let me think about FSMs, CFGs, and Turing machines in a more logical manner, my programming languages and compilers classes really made me understand what was really happening when I hit cc (and also helped explain some of the bizarre compiler errors I'd seen over the years when my own compiler did the same thing), and most importantly, the UCSD CS TAs were absolute Nazis about proper coding technique. Not arbitrarily so, but if you've ever seen some code that made you want to punch someone, that's the sort of thing they knock 25% of your grade off for. Honestly, it really helped.
You're right, though - Computer Science is a very weird mishmash of different stuff all jumbled together.
>>And even given the complete failure to actually learn anything that could be called science in their computer science degree 95% of the graduating class hasn't written more than 10K lines of code in their entire life.
Mmm, just looking at my class assignments (that I saved) across 16 classes (quarters, not semesters), I wc at 20k lines of code. This doesn't count stuff that I wrote for fun, for work, or stuff that I deleted because it doesn't matter any more. The actual number should be several times that, that I wrote for school.
IMO, if you're not writing software as a CS student, you're doing something wrong.
Re: (Score:3, Insightful)
Actually, I have a masters in CS. I was trying to make a lexical pun of sorts saying it's not about "programming languages" but about "programming", which, in my mind, is more about the problem solving and design than the actual implementation of a particular program. Once you learn how to program - how to solve problems and design a solution - implementing it in any particular language is just a matter of getting the syntax right.
The programming language itself is a tool. Any particular problem can have
Re: (Score:3, Interesting)
what another person might spend a week doing in C (which is spectacularly unsuited for such tasks anyway).
A skilled C programmer also needs less than 1 hour for something like that. The standard C library has a lot of text processing functions (like sscanf()), plus it has a qsort(). Ever wonder why the C I/O library is suitable for managing database files? All the field functions in fscanf()/fprintf() etc. are suitable for database management.
Also, C is still one of the prime choice languages for writing compilers, which do a lot of text processing.
Re:which language is best? (Score:5, Insightful)
>>A skilled C programmer also needs less than 1 hour for something like that.
Hmm, well if you want to time yourself, here's a common enough task that I automate with shell scripts. I just timed myself. Including logging in, doing a detour into man and a 'locate access_log' to find the file, it took a bit less than 4 minutes.
tail -n 100 /var/log/apache2/access_log | cut -f1 -d" " | sort | uniq
Grabs the end of the access_log and shows you the last few ip addresses that have connected to your site. I do something like this occasionally. Optionally pipe it into xargs host to do DNS lookups on them, if that's how you prefer to roll.
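The DNS-lookup variant, for the record (one host invocation per address):

tail -n 100 /var/log/apache2/access_log | cut -f1 -d" " | sort | uniq | xargs -n1 host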
I'm honestly curious how long it will take you to do it in C, with/without the DNS lookup. Post source if you don't mind.
Re:which language is best? (Score:5, Funny)
tail -n 100 /var/log/apache2/access_log | cut -f1 -d" " | sort | uniq
...
I'm honestly curious how long it will take you to do it in C, with/without the DNS lookup. Post source if you don't mind.
Not long at all...
system("tail -n 100 /var/log/apache2/access_log | cut -f1 -d' ' | sort | uniq");
Re: (Score:3, Insightful)
IIRC, uniq only collapses adjacent lines that are identical. Hence the sort|uniq.
Maybe there's a flag or something that will change that behavior? It'd probably have to do something on the order of sort anyway, though.
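For what it's worth, there's no such flag on uniq, but sort will do both jobs at once:

tail -n 100 /var/log/apache2/access_log | cut -f1 -d" " | sort -u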
Re: (Score:3, Interesting)
And for real age, it's something that's been known since Unix went into wide-scale usage in the 1970s. The original Bourne shell with the toolset of the time, while obviously limited in some respects, was pretty damned powerful. Pop in some of the newer updates like bash and you have a helluva environment.
More crap from Ted Dziuba. (Score:3, Insightful)
Good grief, I think this is yet another useless article from the Ted Dziuba/Jeff Atwood/Joel Spolsky crowd. They spew out article after article after article with, in my opinion, bullshit "insights" that don't hold any water in the real world. Yet they've developed such a large online following, mainly of "web designers", "JavaScript programmers" and "NoSQL DBAs", that it tricks a lot of people in the industry into thinking what they say actually has some usefulness, when it usually doesn't.
Yeah, it's great when we can write a few shell or Perl scripts to perform simple tasks, but sometimes that's just not sufficient. Sometimes we do have to write our own code. While UNIX offers a very practical and powerful environment, we shouldn't waste our time trying to contort its utilities to fit all sorts of problems, especially when it'll be quicker, easier and significantly more maintainable to roll some tools by hand.
Let me tell you a story (Score:5, Interesting)
Once, about 20 years ago, I worked for a company whose line of business generated a VERY large amount of data which for legal reasons had to be carefully reduced, archived, etc. There were various clusters of VMS machines which captured data from different processes to disk, from where it was processed and shipped around. There were also some of the 'new fangled' Unix machines that needed to integrate into this process. The main trick was constantly managing disk space. Any single disk in the place would probably have 2-10x its capacity worth of data moving on and off it in any given day. It was thus VITAL to constantly monitor disk usage in pretty much real time.
On VMS the sysops had developed a system to manage all this data which weighed in at 20-30k lines of code. This stuff generated reports, went through different drives and figured out what was going in where, compared it to data from earlier runs, created deltas, etc. It was a fairly slick system, but really all it did was iterate through directories, total up file sizes, and write stuff to a couple report files, and send an email if a disk was filling up too fast.
So one day my boss asks me to write basically the same program for the Unix cluster. I had a reputation as the guy who could figure out weird stuff, and I'd even played around a bit with Unix systems before. So I whipped out the printed man pages and started reading. Now, I figured I'd have to write a whole bunch of code; after all, I was duplicating an application that had like 30k lines of code in it, not gigantic but substantial. Pretty soon, though, I learned that every command line app in Unix could feed into the others with a pipe or a temp file, and that those apps produced ALL the data that I wanted, in pretty much the format that I needed. All I really had to do was glue it together properly. Then (thank God it starts with A) I found awk, and then sed. 3 days after that I had 2 awk scripts, a shell script that ran a few things through sed, a cron job, and a few other bits. It was maybe 100 lines of code, total. It did MORE than the old app. It was easy to maintain and customize. It saved a LOT of time and money.
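For flavor, the core of such a watchdog fits in a screenful. A sketch with hypothetical paths and thresholds (nothing like the original scripts, but the same moving parts):

#!/bin/sh
# compare current disk usage against the last run; mail a report on fast growth
df -kP | awk 'NR > 1 { print $6, $3 }' | sort > /tmp/usage.now
if [ -f /tmp/usage.prev ]; then
    # join on mount point; flag anything that grew by more than ~500 MB
    join /tmp/usage.prev /tmp/usage.now \
        | awk '$3 - $2 > 512000 { printf "%s grew %d MB\n", $1, ($3 - $2) / 1024 }' \
        > /tmp/usage.delta
    [ -s /tmp/usage.delta ] && mail -s "disk filling up fast" ops@example.com < /tmp/usage.delta
fi
mv /tmp/usage.now /tmp/usage.prev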
There's PLENTY to recommend the KISS principle in software design. Not every problem can be solved with a bit of shell coding of course, but it is always worth remembering that those tools are tried and true and can be deployed quickly and cheaply. Often they beat the pants off fancier approaches.
One other thing to remember from that project. My boss was the one that wrote the 30k LoC monstrosity. The week after I showed her the new Unix version, I got downsized out the door. People HATE it when you show them up...
Re: (Score:3, Insightful)
As a side note, I have a hard time with the concept that it took the VMS guys 30000 lines of code to do what could be done with a handful of regular expressions. They were either really bad at it, or it had grown for years and nobody had the guts to purge the dead code.
Re:Let me tell you a story (Score:4, Informative)
Sure, awk is a programming language. It is also a command line tool. A bit more flexible than most, but you can't really draw a line between something that is a programming language and something that is a 'built in tool'.
I really have no idea WHY their code was so large. It was all written in FORTRAN and VMS is honestly a nightmarishly complicated operating environment. A lot of it is probably related to the fact that Unix has a much simpler and more orthogonal environment. Of course this is also WHY Unix killed VMS dead long ago. Simplicity is a virtue. This is why Windows still hasn't entrenched itself forever in the server room. It lacks the simple elegance of 'everything is a byte stream' and 'small flexible programs that simply process a stream'. Those are powerful concepts upon which can be built a lot of really complex stuff in a small amount of code.
Re:Let me tell you a story (Score:4, Informative)
Well, Linux IS Unix, just without the trademark, but I didn't really come here to correct your misconception on that.
What I wanted to highlight was the reality behind your statements "we have fifty times as many Windows servers as the other two combined" and "The building where I work has a ratio of about 1 production Windows server for every four employees. If you count non-production servers, we have more Windows servers than people."
This is most certainly not because Windows is so much better or more popular than the other platforms at your place of work. Any experienced sysadmin who is not a Microsoft apologist will confirm that for any typical datacenter server function, it's necessary to have more instances of Windows to get the same capacity, reliability and uptime as few instances of other server operating systems. It's just the nature of the Microsoft stack that effective load-sharing and failover are a necessity in capacity planning. Anyone who argues that a single instance of Windows is equal to a single instance of AIX or Linux has simply never been part of real world datacenter administration.
In short, your employer may have a lot more Windows servers than anything else, but that certainly doesn't mean Windows is better or more popular -- it just demonstrates how the TCO of Windows is terrible.
Re: (Score:3, Insightful)
This kind of story makes me laugh when I see/hear anecdotes that have management talking about metrics like LoC.
Re:Let me tell you a story (Score:4, Insightful)
LOCs is roughly as meaningless as valuing a document by its word count. You could spend tons on research on something summed up in a few pages, or get an endless word diarrhea of mindless babble spewed out at 300 WPM. But people need to measure progress. Yes, I've seen how it gets when nobody measures progress and everyone pretends the last 10% of code will suddenly turn a turd into a gem, if so expect the people with survival skills to disappear some 80% into the project. Another disastrous variation is to leave it entirely up to the subjective opinion of the manager, which in any sizable company means your career depends on your favor with the PHB and his lying skills compared to the other PHBs.
Saying it's bad is like shooting fish in a barrel. Coming up with a good system of objectively measuring code design and quality that works in a large organization is ridiculously hard, particularly since everybody tries to wiggle out of the definitions and game whatever you measure: if you made avoiding LoC a metric, the lines would be compacted to the point of obfuscation, with hideous cross-calling to save lines. You want people to hit a sane level of structuring and code reuse, neither LoC bloat nor 4k compos.
Re:Let me tell you a story (Score:4, Interesting)
How, exactly, are they brittle? I've heard this term used a number of times, but never actually seen a prediction of brittleness be an accurate predictor of any amount of bugs, maintenance issues, or really any negative outcome. As far as I can tell, it's just a weasel word to be used when you don't like something for aesthetic reasons or understand it fully.
So, prove me wrong. Explain exactly what's bad about using code that's been more heavily used and tested in production systems than just about anything else for more than 20 years.
Re: (Score:3, Informative)
How, exactly, are they brittle?
The principal brittleness of shell scripts is their assumption that filenames do not contain odd characters like spaces. Most other languages don't do auto-splitting of every argument and so won't break when some user insists on creating a directory called "Documents and Settings"...
(You can write armored shell scripts that cope just fine with this - I've done that quite a bit over the years - but a lot of people don't.)
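The armoring mostly comes down to two habits: quote every expansion, and use NUL delimiters when filenames travel through a pipe. A sketch (assuming $dir was set earlier):

# NUL-delimited names survive spaces, quotes, even embedded newlines
find "$dir" -name '*.bak' -print0 | xargs -0 rm -f --
# quoted expansions keep "Documents and Settings" in one piece
for f in "$dir"/*; do
    [ -e "$f" ] || continue    # skip the literal glob when nothing matches
    printf '%s\n' "$f"
done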
Re:Let me tell you a story (Score:4, Insightful)
Re: (Score:3, Insightful)
Why would you assume he's talking about pipe I/O? If you're talking portability and dependencies, then yes, I'd say something that uses a bunch of smaller tools might have more brittleness than something that is entirely contained in code controlled by the maintainer. It's really not that far a stretch to say that you upgrade a machine, a newer version of some utility changes some output your script depends on, and boom, your process comes to a halt. That's what brittle means in this situation.
Re:Let me tell you a story (Score:5, Insightful)
And how is that any different from a Python library being updated and changing your program? Completely pointless argument.
Re:Let me tell you a story (Score:4, Interesting)
something that uses a bunch of smaller tools might have more brittleness than something that is entirely contained in code controlled by the maintainer
Not necessarily. The unix tools are very well specified by comparison with most libraries used in nearly any language you care to name (they're in the POSIX spec) so there's a substantial amount that you can rely on, and rely on long-term. They can be composed poorly, of course, but bad programmers can write bad programs in anything so it's (close to) a null argument.
Brittleness in shell scripts typically refers to assumptions of particular filesystem layouts or that nobody will be silly enough to put odd characters in filenames (if only that were true!) but piped IO is very stable and well tested.
Re:Let me tell you a story (Score:5, Insightful)
This seems to be an odd criticism.
It's like calling perl/python/C subroutines "brittle" because if you change the arguments or return values of any of them, all hell can break loose.
'Brittle' to me means that ridiculous assumptions don't hold true *often*, leading to breakage, like this gem I found the other day in a developer's installation script:
envfile=`ls -1tr /tmp/*.tar | tail -1`
cp ${envfile} /apps/prod
tar -xvf /apps/prod/${envfile}
In other words - the envfile is "the single most recent tar file in the /tmp directory," with no checks or verification to ensure that it was the right one before you blasted it on into production.
That's what I'd call 'brittle' programming, anyway - likely to break, and break spectacularly because you haven't thought through your requirements clearly, or bothered to verify that inputs are reasonable and sane.
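A less brittle take on the same job would refuse to guess (hypothetical, but the checks are the point):

# verify there is a candidate, and that it's actually a tarball, before touching prod
envfile=$(ls -1tr /tmp/*.tar 2>/dev/null | tail -1)
[ -n "$envfile" ] || { echo "no tar file found in /tmp" >&2; exit 1; }
tar -tf "$envfile" > /dev/null || { echo "$envfile is not a valid tar" >&2; exit 1; }
cp "$envfile" /apps/prod && ( cd /apps/prod && tar -xvf "$(basename "$envfile")" )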
Re: (Score:3, Funny)
It's like you've purposefully made an entire post full of weasel words, and even sentences! "Metaphorically if you try to bend them at all they shatter rather spectacularly, they are brittle." Well done, sir.
Re: (Score:3, Insightful)
One, that very rarely happens with the common unix utilities.
Two, ever heard of change management, testing and QA? If you're the kind of idiot who flings patches and updates willy-nilly onto a production box then sorry, but you deserve to have the sky fall on your head.
Re: (Score:3, Insightful)
Fixed that for you. In other words your argument can be applied to any programming anywhere.
Software is not food (Score:2)
You can easily have a little more or less salt, sugar or flour in your food. However, software is not so forgiving. Change one character and you screw up badly. Lets face it, software is hard to write and it is even harder to write good software.
Re-use is a good thing, and scripting many common problems instead of coding them in [insert low-level language] is also good. But this should be common sense for any /good/ programmer. Good tools make bad programmers look slightly less bad, but they fuck up anyway. G
Re:Software is not food (Score:5, Funny)
"Compilers are like boyfriends, you miss a period and they go crazy on you."
Re: (Score:2)
Change one character and you screw up badly. Lets face it,
(I'd like to respond with something clever, but I'm afraid I'll negate my entire argument a'la Muphry.)
Re:Software is not food (Score:5, Insightful)
You can easily have a little more or less salt, sugar or flour in your food. However, software is not so forgiving. Change one character and you screw up badly..
Just try substituting a tsp with a tbsp of salt in your favorite recipe and then tell me food is forgiving.
Re: (Score:3, Interesting)
Had a friend confuse bulbs of garlic with cloves of garlic. Niiice.
Re:Software is not food (Score:4, Funny)
Had a friend confuse bulbs of garlic with cloves of garlic.
My uncle made that mistake once. It resulted in everyone asking him for the recipe (true story).
I write all my code in brainfuck (Score:2)
...you insensitive clod!
8 commands. period. no more, no less. Super maintainable, cross platform and...
bah, who am I kidding?
What a relief! (Score:2)
When I saw the title I thought it was a book review of a new O'Reilly release of that name.
I limit myself to 2 bits (Score:3, Funny)
I limit myself to two bits. A 0 and a 1.
Why would I need 8?
Re: (Score:3, Funny)
I limit myself to two ones, very carefully timed.
Code reuse, junk food example? (Score:2, Interesting)
Seriously, what's going on with the articles here? "My code is like a Taco"? Is that flying because of CmdrTaco's username?
Nothing new here:
1) Code reuse. Woopdeedoo. The whole industry has invested heavily in many paradigms for reusing code: The reusable library, module reuse, object reuse etc.
2) Stringing Unix commands together is news? Did I just take a DeLorean back to 1955? (Well, that's a slight exaggeration. Unix has only been around since the 70s.)
Finally, who wants to compare their code reuse to a c
Re:Code reuse, junk food example? (Score:5, Interesting)
I've found the best reuse comes from simple modules, not from complex ones that try to do everything. The one that tries to do everything will still be missing the one feature you need. It's easier to add the features you need to the simple one because it's, well, simpler. With the fancier one you have to work around all the features you don't need to add those that you do need, creating more reading time and more mistakes.
Re:Code reuse, junk food example? (Score:5, Insightful)
I've found the best reuse comes from simple modules, not from complex ones that try to do everything. The one that tries to do everything will still be missing the one feature you need. It's easier to add the features you need to the simple one because it's, well, simpler. With the fancier one you have to work around all the features you don't need to add those that you do need, creating more reading time and more mistakes.
Agreed. With most complex frameworks there is also the additional overhead of having to do things in a particular way. If you try to do it differently, or need to add a feature the original framework wasn't designed for, you often find yourself fighting it rather than working with it. At that point you should ditch the framework, but often it's not your decision to make, and the cost of redoing things once the framework is removed makes it impractical.
Re: (Score:3, Insightful)
Re: (Score:2)
About twenty years ago, I was dating someone who was working on what she called the "Taco Bell theory of fashion", which was that you have a smallish number of items of clothing which all go together.
I think it's just that they're a particularly impressive example, familiar to a lot of people, of an extremely broad variety of foods made from a very small number of ingredients. ... And yes, stringing commands together is, empirically, news to many people, because I keep finding people who can't do it.
Once again, The Onion shows us the way (Score:4, Funny)
From over a decade ago: Taco Bell's Five Ingredients Combined In Totally New Way [theonion.com]
I think of that every time Taco Bell adds a "new" item to their menu.
From TFA (Score:4, Interesting)
I made most of a SOAP server using static files and Apache's mod_rewrite. I could have done the whole thing Taco Bell style if I had only manned up and broken out sed, but I pussied out and wrote some Python.
It seems that only software he knows counts as "Taco Bell ingredients". I'd trust Axis (or any other SOAP library) much more than sed to parse a web service request. Heck, if you discount code that you don't directly maintain, SOAP requires very little code other than the functionality of the service itself. I had a boss like this once. He would let you do anything as long as you used tools he was familiar with, but if you brought in a tool that he didn't know, you had to jump through a thousand extra testing hoops. He stopped doing actual work and got into management in the early 90's, so he pretty much didn't know any modern tool. He once made me do a full regression test on a 50KLOC application to get approval to add an index to a Microsoft SQL Server table.
Re: (Score:3, Interesting)
However, any time you change the API--even to make a change that no client should notice--you have to regenerate the glue code from the WSDL and recompile all your client programs. Which is why these days, I build REST-based web services.
Simplicity (Score:5, Insightful)
The complexity people seem to delight in putting into things always amazes me. I was recently working at a major bank (they didn't like me eventually as I'm bad at authority structures). Anyway the area I was working on involved opening bank accounts from the web site. Complicated, right? The new account holder has to choose the type of account they want (of about 7), enter their details (name, address, etc), and press go. Data gets passed about, the mainframe makes the account, and we return the new account number.
Gosh.
So why, oh tell me why, did they use the following list of technologies (usually all on the same jsp page) [I may have missed some]
HTML
CSS
JSP (with pure java on the page)
Javascript (modifying the page)
JQuery
XML
XSLT
JDBC with Hibernate
JDBC without Hibernate
Custom Tag library
Spring (including AOP)
J2EE EJBs
JMS
Awesome. All this on each of the countless pages, each custom designed and built. Staggering. In fact, the site needed about 30 pages, many of them minor variations of each other. The whole thing could have been built using simple metadata. It would have run faster, been easier to debug and test (the existing system was a nightmare), and been easily changeable to suit the new business requirements that poured in.
So instead of using one efficient, smart programmer for a while, then limited support after that, they had a team of (cheap) very nervous programmers, furiously coding away, terrified lest they break something. And yes, there were layers and layers of software, each overriding the other as the new programmer didn't understand the original system, so added their own. Palimpsest, anyone?
And yet, despite my offers to rebuild the whole thing this way (including demos), management loved it. Staggering.
But I still like to keep things simple. And yes, my name is Simon. And yes, I do want a new job.
Re:Simplicity (Score:5, Insightful)
Complexity creates bugs
Bugs create employment
Re: (Score:2)
"A bucket of scrap a day keeps the overtime on its way"
--
BMO
Re:Simplicity (Score:4, Interesting)
Sounds like your coworkers are busily filling out their resumes with all the latest fad software tools. Like you, I despise such thinking, and it's why I pass on any job opportunity where 'web apps' and 'java' are used in the same description.
Re: (Score:2)
Out of curiosity, what would you use to write a web app in?
Re: (Score:2)
Really, I don't see anything wrong with using Java to write web apps. The problem is when all the 30 different libraries, frameworks, extensions, etc. get thrown in. I steer clear of anything that even mentions Hibernate, Spring (esp. AOP), and any mix of more than about 4 different technologies.
Re:Simplicity (Score:4, Interesting)
“Debugging is twice as hard as writing code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” - Brian Kernighan
Re:Simplicity (Score:4, Insightful)
Re: (Score:2)
Hello, devil's advocate here... I totally agree with the sentiment that keeping things simpler is preferable, and that there are problems created by programmers who either don't care, are trying to preserve their job security (or pad their resumes with buzzwords), don't know better, or don't take the time to think out the design/maintainability of what they are doing.
On a recent project to provide real-time, asynchronously updating, data-driven, interactive graphs and gauges on a modern web application, I h
Reference ? (Score:2, Funny)
Re: (Score:3, Insightful)
Re: (Score:2)
If you had paid attention in shell class and Taco Bell -- you would know that the Taco Bell ingredients are great for quickly passing through your pipeline.
Just - try to pipe it through tail instead of head.
Re: (Score:3, Funny)
That's why one of my friends calls the place Taco Bowel. It's much more descriptive than the commonly-heard Taco Hell.
Re: (Score:2)
We are one step closer to idiocracy [imdb.com]! I for one welcome my new "AOL Time Warner Taco Bell US Government Long Distance" overlords.
Mmmm.. foamy lattes.
-6d
Re: (Score:3, Funny)
That post is worthless without pics!
Unexpected (Score:4, Interesting)
Unexpected comparison of trained coders / developers, many with certifications and degrees, to untrained sub-GED Taco Bell employees... well... frankly, knuckle-draggers.
Also, I don't care if your code is minimal and profitable, if it gives me a sore stomach as Taco Bell does, I'm opting for something more complex and just... better. Better for me, better for everyone.
I get the appeal of promoting minimalistic coding styles with food concepts, and it's a refreshing change from the raggedy car analogies... but come on. Taco Bell? Really??
Re: (Score:2)
Unexpected comparison of trained coders / developers, many with certifications and degrees, to untrained sub-GED Taco Bell employees... well... frankly, knuckle-draggers.
Oh my, aren't we thin-skinned today?
I think you missed the point. The equivalent of the "blank-slate" Taco Bell employee is the blank-slate computer that only executes instructions given to it. The persons who get compared to good developers are the Taco Bell recipe writers, who managed to deliver instructions that yield quick, cheap, consistent and idiot-proof solutions. Many coders with degrees can't say as much.
It's easy to overthink even in the simplest cases (Score:3, Insightful)
I once had a pair of command line tools that both printed lists of words (usernames, actually, one per row), and I wanted to find out how many unique ones there were. Obviously, the right-hand side of the pipeline was going to be something along the lines of " | sort -u | wc -l", but then I got utterly stuck on the left-hand side. How can I combine the STDOUTs of two processes? Do I really need to resort to using temporary files? Is there really no tool to do the logical opposite of the "tee" command?
You are probably thinking: "Oh, you silly person, that's so trivial, you must be very incompetent", but in case you aren't, you might want to spend a minute trying to figure it out before reading on. I even asked a colleague for help before realizing that the reason I could not find a tool for the task was quite an obvious one: such a tool does not exist. Or actually it kinda does, but only in an implied sense: what I was hoping to achieve could be done by the humble semicolon and a pair of parens. I only had to put the two commands in parens to run them in a subshell, put a semicolon in between, so one will run after the other is finished, and I was done. I guess it was just that the logical leap from "This task is so simple, there must be a tool for this" to "just run the commands one after another" was too big for my feeble mind to accomplish.
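In other words, with command1 and command2 standing in for the two tools, the whole thing collapses to:

# run both producers in a subshell; their combined output feeds one pipeline
(command1; command2) | sort -u | wc -l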
So I guess the moral of the story is, even if you want to use just one simple tool, you may be overthinking it :-)
Re:It's easy to overthink even in the simplest cas (Score:4, Informative)
Psst,
" | sort | uniq -c "
Will sort and then count repetitive lines and output count, line. You can pipe the result back through sort -n if you want a frequency sort or sort -k 2 for item sorting.
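Spelled out, with a hypothetical upstream command:

command | sort | uniq -c | sort -rn    # most frequent first
command | sort | uniq -c | sort -k2    # sorted by the item itself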
Re: (Score:3, Informative)
Psst,
" | sort | uniq -c "
Will sort and then count repetitive lines and output count, line. You can pipe the result back through sort -n if you want a frequency sort or sort -k 2 for item sorting.
The problem was not figuring out how to count the unique items. It's the part before the pipe that was difficult. The poster needed to combine the results of two different commands and then compute the unique items. The solution would have to be, logically, "command1 + command2 | sort | uniq -c".
Unless you can find a way to pass the output from command1 through command2, you will lose command1's data. The solution he/she found was elegant: (command1):(command2) | someKindOfSort. My syntax is probably
What is Devops? (Score:4, Insightful)
I read the linked Devops article and know even less about it than before I read the article. It's full of management buzzwords, and I'm sure a CIO would love it, but what does it mean?
How does Devops help?
The Devops movement is built around a group of people who believe that the application of a combination of appropriate technology and attitude can revolutionize the world of software development and delivery.
Beyond this multi-disciplinary approach, the Devops movement is attempting to encourage the development of communication skills, understanding of the domain in which the software is being written, and, crucially, a sensitivity and passion for the underlying business, and for ensuring it succeeds.
oh yeah, that clears it up. All it takes is a passion for the underlying business and it's sure to succeed!
The cause of bloatware. (Score:2)
The very top chefs and cooks will use 5-8 ingredients at the most to make dishes, they understand the importance of si
Re: (Score:2)
> The very top chefs and cooks will use 5-8 ingredients at the most to make dishes
Curry, rice, chicken, oil, salt.
That's five, and boring!
Robustness (Score:2)
Shell scripting is fine for stuff that *only you* are going to use. It's just not robust enough for use in anything important, that more than one person might actually use. For example, handling paths with spaces is pretty damn hard - loads of scripts can't handle them.
Jim Gaffigan's experience: (Score:4, Funny)
Weak error handling (Score:5, Informative)
A big problem with shell programming is that the error information coming back is so limited. You get back a numeric status code, if you're lucky, or maybe a "broken pipe" signal. It's difficult to handle errors gracefully. This is a killer in production applications.
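Recent GNU wget does at least document distinct exit codes, so a script can branch on the failure class, but a number is still all you get ($url hypothetical):

wget -q "$url"
case $? in
    0) ;;                                              # success
    4) echo "network failure" >&2 ;;                   # per the wget manual's exit-status list
    8) echo "server issued an error (e.g. 404)" >&2 ;;
    *) echo "some other failure" >&2 ;;
esac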
Here's an example. The original article talks about reading a million pages with "wget". I doubt the author of the article has actually done that. Our sitetruth.com system does in fact read a million web pages or so a month. Blindly getting them with "wget" won't work. All of the following situations come up routinely:
The site's HTML contains a redirect, which needs to be followed. "wget" won't notice a redirect at the HTML level.
The site's "robots.txt" file says we shouldn't read the file from a bulk process. "wget" does not obey "robots.txt".
The site is really, really slow. Some sites will take half an hour to feed out a page. Maybe they're overloaded. Maybe their denial of service detection software has tripped and is metering out bytes very slowly in defense. You don't want this to hold up the entire operation. Last week, for some reason, "orbitz.com" did that.
The site doesn't return data at all. Some British university sites have a network implementation which, if asked for a HTTPS connection, does part of the SSL connection handshake and then just stops, leaving the TCP connection open but sending nothing. This requires a special timeout.
The site doesn't like too many simultaneous connections from the same IP address. We limit our system to three simultaneous connections to a given site, so as not to overload it.
That's just reading the page text. More things can go wrong in parsing.
Even routine reading of some known data page requires some effort to get it right. We read PhishTank's entire XML list of phishing sites every three hours. Doing this reliably is non-trivial. PhishTank just overwrites their file when they update, rather than replacing it with a new one. (This is one of the design errors of UNIX, as Stallman once pointed out. Yes, there are workarounds they could do.) So we have to read the file twice, a minute apart, and wait until we get two identical copies. Then we have to check for 1) an empty file, 2) a file with proper XML structure but no data records, and 3) an improperly terminated XML file, all of which we've encountered. Then we pump the data into a MySQL database, prepared to roll back the changes if some error is detected.
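In sketch form (hypothetical URL, simplified to the stability check; the real thing has more error handling):

# fetch twice, a minute apart; only accept the file once two copies match
while :; do
    wget -q -O copy1 "$url"; sleep 60; wget -q -O copy2 "$url"
    cmp -s copy1 copy2 && break
done
# then sanity-check: non-empty and well-formed XML (the record-count check is omitted here)
[ -s copy2 ] && xmllint --noout copy2 && mv copy2 phishtank.xml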
The clowns who try to do stuff like this with shell scripts and cron jobs spend a big fraction of their time dealing manually with the failures. If you do it right, it just keeps working. One of my other sites, "downside.com", has been updating itself daily from SEC filings for over a decade now. About once a month, something goes wrong with the nightly update, and it's corrected automatically the next night.
Re:Weak error handling (Score:5, Informative)
The site's HTML contains a redirect, which needs to be followed. "wget" won't notice a redirect at the HTML level.
Actually, it does. But in any case, this is why you parse the HTML after fetching it with wget -- how else can you get things like javascript generated URLs to work?
The site's "robots.txt" file says we shouldn't read the file from a bulk process. "wget" does not obey "robots.txt".
From the wget man page:
Wget can follow links in HTML, XHTML, and CSS pages, to create local versions of remote web sites, fully recreating the directory structure of the original site. This is sometimes referred to as "recursive downloading." While doing that, Wget respects the Robot Exclusion Standard (/robots.txt).
The site is really, really slow. Some sites will take half an hour to feed out a page.
And you still haven't looked at the wget(1) man page, or you'd know about the --read-timeout parameter.
Maybe they're overloaded. Maybe their denial of service detection software has tripped and is metering out bytes very slowly in defense. You don't want this to hold up the entire operation. Last week, for some reason, "orbitz.com" did that.
Not holding up your operation is why you use multiple tools that can run concurrently. A wget of orbitz.com taking forever won't prevent the wget of soggy.com that you scheduled for half an hour later, and neither will stop the parser.
Of course, if you design an all-eggs-in-one-basket solution that depends on sequential operations, you deserve what you get.
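Concretely, something like this (hypothetical layout and timeouts):

# each fetch is its own process with its own timeout; none can block the others
wget --timeout=60 -q -P pages/orbitz http://orbitz.com/ &
wget --timeout=60 -q -P pages/soggy  http://soggy.com/ &
wait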
The site doesn't return data at all. Some British university sites have a network implementation which, if asked for a HTTPS connection, does part of the SSL connection handshake and then just stops, leaving the TCP connection open but sending nothing.
This requires a special timeout.
Yes, the --connect-timeout.
The site doesn't like too many simultaneous connections from the same IP address. We limit our system to three simultaneous connections to a given site, so as not to overload it.
wget limits to a single connection with keep-alive per instance. (If you want more, spawn more wget -nc commands)
Even routine reading of some known data page requires some effort to get it right. We read PhishTank's entire XML list of phishing sites every three hours. Doing this reliably is non-trivial. PhishTank just overwrites their file when they update, rather than replacing it with a new one.
That's no problem as long as you pay attention to the HTTP timestamp.
(This is one of the design errors of UNIX, as Stallman once pointed out. Yes, there are workarounds they could do.) So we have to read the file twice, a minute apart, and wait until we get two identical copies. Then we have to check for 1) an empty file, 2) a file with proper XML structure but no data records, and 3) an improperly terminated XML file, all of which we've encountered.
Oh. My.
I'd do a HEAD as the second request, and check the Last-Modified time stamp.
If the Date in the fetch was later than this, and you got a 2xx return code, all is well, and there's no need to download two copies, blatantly disregarding the "X-Request-Limit-Interval: 259200 Seconds" as you do.
It'd be much faster too. But what do I know...
The clowns who try to do stuff like this with shell scripts and cron jobs spend a big fraction of their time dealing manually with the failures.
The clowns who do stuff like this with the simplest tools that do the job (
Re:Weak error handling (Score:5, Insightful)
It is interesting that wget does not handle errors other than ignoring them and trying to continue. The original poster's first and second point are not addressed. Does that mean the operator has to manually monitor the crons and restart the ones that failed?
The site is really, really slow. Some sites will take half an hour to feed out a page.
And you still haven't looked at the wget(1) man page, or you'd know about the --read-timeout parameter.
Maybe they're overloaded. Maybe their denial of service detection software has tripped and is metering out bytes very slowly in defense. You don't want this to hold up the entire operation. Last week, for some reason, "orbitz.com" did that.
Not holding up your operation is why you use multiple tools that can run concurrently. A wget of orbitz.com taking forever won't prevent the wget of soggy.com that you scheduled for half an hour later, and neither will stop the parser.
Of course, if you design an all-eggs-in-one-basket solution that depends on sequential operations, you deserve what you get.
How do you schedule orbitz.com to go off and then soggy.com to go off later? What if you are handling hundreds of different web sites? Hundreds of crons? How do you retry later on sites that are very slow at the moment? How would you know that wget timed out due to a slow download?
The site doesn't return data at all. Some British university sites have a network implementation which, if asked for a HTTPS connection, does part of the SSL connection handshake and then just stops, leaving the TCP connection open but sending nothing.
This requires a special timeout.
Yes, the --connect-timeout.
The connection has been made, so it is not --connect-timeout, it is --read-timeout. That is the problem: there is no separate timeout for slowly getting data vs. getting no data at all.
The site doesn't like too many simultaneous connections from the same IP address. We limit our system to three simultaneous connections to a given site, so as not to overload it.
wget limits to a single connection with keep-alive per instance. (If you want more, spawn more wget -nc commands)
You missed the point; it is not about more connections, it is about limiting connections. Say I am crawling five different sites using host spanning and they all link to the same site. Since there is no coordination between the wgets, it is possible for all of them to connect to the same site at the same time. What if I have 100 crawlers at the same time?
The original poster is right; using wget ignores errors (timeouts) and does not report them, so there is no way of programmatically figuring out what went wrong and reacting to it.
Things wget does not do: avoid known non-responsive pages, requeue requests that have timed out or log them so they are not tried again, coordinate multiple crawls so they do not hit the same server simultaneously, handle errors itself. There are probably more.
This is a perfect example of the 80/20 rule. The "solution" may cover 80% of the problem but that final 20% will require so much babysitting as to make it unusable. Wget is not an enterprise level web crawler.
Re: (Score:3, Informative)
That was HTML redirects (well likely more specifically javascript redirects), not HTTP redirects.
My code is Taco Bell food at its finest (Score:4, Funny)