Wal-Mart's Data Obsession
g8oz writes "The New York Times covers Wal-Mart's obsession with collecting sales data.
Fun fact: 'Wal-Mart has 460 terabytes of data stored on Teradata mainframes, at
its Bentonville headquarters.
To put that in perspective, the Internet has less than half as much data, according to experts.'
That much information results in some interesting data-mining. Did you know hurricanes increase strawberry Pop Tarts sales 7-fold?"
Yeah (Score:4, Funny)
Re:Yeah (Score:5, Funny)
normally, but I guess they didn't check when I was sharing my pr0n on direct connect.
Re:Yeah (Score:3, Insightful)
Walmart does drop your income (Score:5, Interesting)
Re:Walmart does drop your income (Score:4, Informative)
According to the article: "Not long after that, in January 2001, Vlasic filed for bankruptcy--although the gallon jar of pickles, everyone agrees, wasn't a critical factor" (emphasis added). Nice troll.
I would have thought that the Internet had more. (Score:4, Interesting)
So, if Walmart put up a web interface... (Score:4, Interesting)
I think the expert they got their information from was full of baloney.
Re:So, if Walmart put up a web interface... (Score:5, Insightful)
There is no way this can be true. Even if you ONLY take publicly available WWW pages, it would far exceed their measly estimate.
Re:So, if Walmart put up a web interface... (Score:5, Informative)
1511565 MB, ~1.5 terabytes in PC games being shared.
There were 44977 seeds and 196735 downloaders. After all those torrents listed are downloaded, there will be 241712 peers with all that data on their hard drives, connected to the internet.
I calculated that total and got 338394133 MB, ~338 terabytes.
Re:So, if Walmart put up a web interface... (Score:5, Funny)
We've got one bit set to 1...
I get 2. 2 bits of data on the internet. Hang on, I'll recount to be sure I didn't miss anything. Nope, just two bits...
Re:So, if Walmart put up a web interface... (Score:3, Insightful)
I'll see your terabyte and raise you a googolbyte (Score:3, Funny)
Re:I would have thought that the Internet had more (Score:3, Interesting)
Google has 8E9 web pages and documents indexed. If the average document is 20 kB in length, then we have 160 TB of publicly available data on the internet, not including pictures and filesharing. The latter probably has a great deal of duplicate data anyway.
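The back-of-the-envelope estimate above can be checked in a few lines of Python. This is only a sketch: the page count and average document size are the figures quoted in the comment, decimal units (1 kB = 1000 bytes, 1 TB = 10^12 bytes) are an assumption, and the function name is made up for illustration.

```python
# Figures quoted in the comment above; decimal units are an assumption.
INDEXED_PAGES = 8_000_000_000   # "8E9 web pages and documents"
AVG_DOC_BYTES = 20 * 1000       # "20 kB" per document
BYTES_PER_TB = 10 ** 12         # decimal terabyte


def estimate_total_tb(pages, avg_doc_bytes):
    """Return the total size in decimal terabytes for a given page count."""
    return pages * avg_doc_bytes / BYTES_PER_TB


total_tb = estimate_total_tb(INDEXED_PAGES, AVG_DOC_BYTES)
print(f"Estimated indexed web: {total_tb:.0f} TB")
```

With those inputs the estimate matches the 160 TB figure in the comment. The result is very sensitive to the assumed average document size, which is a guess.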
Re:I would have thought that the Internet had more (Score:5, Informative)
Re:I would have thought that the Internet had more (Score:5, Informative)
Re:I would have thought that the Internet had more (Score:5, Insightful)
That means that the internet has well over a petabyte of information on it. Much of the information is probably the same, but it is on the internet.
Re:I would have thought that the Internet had more (Score:5, Informative)
People who call themselves "experts" but are really just talking out of their asses do. Consider that The Internet Archive [wikipedia.org] alone contains more than a petabyte (1024 terabytes) of data, all of it accessible, and that they are adding on the order of 20 terabytes a day, and you start realizing how much bigger the Web is.
Re:I would have thought that the Internet had more (Score:5, Informative)
Your number is wrong, from their faq:
The Internet Archive Wayback Machine contains approximately 1 petabyte of data and is currently growing at a rate of 20 terabytes per month.
That's 20 terabytes per month, not per day.
Re:I would have thought that the Internet had more (Score:5, Funny)
Even with that number, I wouldn't want to be the Hard Drive specialist...
Interviewer: Would you care to describe your previous job?
-Installing HDs 24/7.
It does have more (Score:5, Informative)
The definition they used for "Internet" was probably "web pages indexed with a search engine", which is definitely not the entire Internet.
But Nobody Should Really Need... (Score:5, Funny)
(did I just say that out loud?)...
Re:I would have thought that the Internet had more (Score:4, Interesting)
I, for one, (Score:5, Funny)
230 terabytes? Please (Score:5, Interesting)
Re:230 terabytes? Please (Score:5, Funny)
Sharing the wealth. (Score:3, Informative)
Re:230 terabytes? Please (Score:5, Funny)
Haha... (Score:5, Funny)
More than the Internet ?! (Score:4, Funny)
economies of scale (Score:5, Insightful)
Seems like they'd need to license map-reduce from Google or something. (That's a distributed data correlation engine. With extremely high fault tolerance, to boot.)
Re:economies of scale (Score:5, Insightful)
Re:economies of scale (Score:5, Funny)
go on vacation for a week or ten..
deal with resulting data.
Re:economies of scale (Score:5, Informative)
Re:economies of scale (Score:5, Informative)
With SQL.
Teradata was built to handle processing very large datasets from day 1. 460 Terabytes distributed across a large number of CPUs and disks working in parallel with a robust SQL implementation isn't really the challenge. The hard part is keeping all those disks spinning when you start pushing MTBF limits, handling the thousands of concurrent users all banging away at the data, and the constant streaming of new data into the system in order to support near real-time DSS.
For those inclined to know more, check here. [ncr.com]
Re:economies of scale (Score:4, Funny)
So hire a monkey to sit in front of the rack. Condition him to hotswap a new hot spare when a red light & alarm goes off. If he replaces the drive before the old RAID hotspare gets rebuilt, he gets a treat; if not, a ZZZZZzzaaaappp! :)
Re:economies of scale (Score:4, Informative)
As the article says, they're using Teradata [ncr.com]. This is not a product that I'd expect the average Slashbot, who thinks "IT" and "internet" are synonymous, to have heard of. Nevertheless, if you work with industrial amounts of data, you will know that Teradata databases can reasonably claim to be to Oracle as Oracle is to MySQL.
Re:economies of scale (Score:4, Funny)
Except it takes 8 Teradata DBAs to manage the 460 TBytes, and 23 Oracle DBAs to manage 1 Gig ;^)
(Not a slam on Oracle DBAs, but on the ridiculous management burden of Oracle)
Re:economies of scale (Score:3, Funny)
Re:economies of scale (Score:5, Informative)
I know a guy who worked for Wal-Mart for ~8 years as some sort of data analyst and architect at the main offices in Bentonville. While he didn't go into too much detail, he told me that a lot of the back-end querying is done, surprisingly, with Perl-DBI on Oracle databases. When I asked why his team didn't use something like flat C, C++ or Java, portability was cited as a principal motivation and that, after a certain point, speed gains were only marginal. He also said when he left ~1.5 years ago, that a small cluster migration to DB2 was being talked about. I have no idea if they license search and query code, but I got the distinct impression that there was a team of software engineers who custom crafted search algorithms for the data.
You gotta love "experts" (Score:5, Funny)
What's the word I'm looking for? Oh yeah - it's bullshit
And in other news... (Score:5, Insightful)
I highly suspect Wal-Mart didn't get into the position it's in of being the largest retailer by being stupid, at least business-wise. This is the sort of project that allows them to stock a 120,000 square-foot big box store from JIT shipments every night, and why every Wal-Mart in a region looks the same. Though I would be interested to read more on the pop-tart to hurricane correlation...
Re:And in other news... (Score:5, Funny)
I think they misspelled "Phish concert".
the real interesting part is... (Score:5, Funny)
Correlation doesn't imply causation!!!!! (Score:5, Insightful)
Correlation doesn't imply causation!!!!!
I mean what if a third factor caused both the hurricanes and strawberry Pop Tart sales to increase 7-fold????
Somebody was going to blurt that bromide out at that statement, so it may as well be me.
Re:Correlation doesn't imply causation!!!!! (Score:5, Insightful)
Beer makes sense also. There are always a hell of a lot of hurricane parties in Florida whenever a hurricane comes 'round.
Re:Correlation doesn't imply causation!!!!! (Score:3, Insightful)
Not that high, consider other contributing factors (Score:4, Informative)
Consider also that people will not be worrying about their diets when they're primarily worried about not being killed by their own rooftops...
Combine a bunch of these factors together, and yes, I can easily believe 7x.
Re:Correlation doesn't imply causation!!!!! (Score:3, Funny)
Re:Correlation doesn't imply causation!!!!! (Score:4, Funny)
If pop tart sales go up, head for high ground?
Daniel
Seen it! (Score:5, Informative)
The gentleman who gave me the tour indicated they have something like 72 weeks (1 year plus 2 weeks) of purchase data on LIVE disk arrays, plus huge archives of the same data on tape. If you buy anything and use your credit, debit, or whatever card, they can figure out your sales history obscenely quickly. Be afraid. Be very afraid.
I also got to see Walmart.com (Sun E15k) and Samsclub.com (A bunch of HP boxes in a smallish frame), they were creepy, in a sense... all those sales going on at once, converging on a spot not a few feet from me.
Re:Seen it! (Score:3, Informative)
Re:Seen it! (Score:5, Insightful)
According to Google:
1 year = 52.177457 weeks
So 72 weeks is 1 year plus 19.822543 weeks.
Re:Seen it! (Score:5, Funny)
1 year = 52.177457 weeks
So 72 weeks is 1 year plus 19.822543 weeks.
No, the grandparent poster was correct - 72 weeks is 1 year plus 2 weeks, if you're using Canadian years.
Re:Seen it! (Score:3, Funny)
Did he happen to mention anything about an attack on Zion?
Re:Seen it! (Score:3, Funny)
No seriously, why didn't you trash their data and free us all?
Be Afraid? Why? (Score:5, Insightful)
What could they do with their data, really, that would hurt anyone? It wouldn't be like "Bob Smith is buying condoms again." It would be more like "there's a condom spike in area code 78750 every Thursday, let's ship more out."
People who are afraid of data aggregation are jumping at shadows. Nobody cares what you in particular are buying. An individual as a data point is useless, unless you're an exemplar or something like that (which would be unusual).
Let's face it, individuals just aren't that interesting. More importantly, from Wal-Mart's point of view, there's no return on looking at individuals.
Comment removed (Score:5, Funny)
Please remind me (Score:4, Funny)
I forgot, are we supposed to hate Wal-Mart?
On one hand they are a large corporate empire, and on the other, they promote cheap Linux computers.
Arg, I'm so confused.
Re:Please remind me (Score:5, Interesting)
Or you can be a privacy-advocate Slashdotter and hate that they want RFID tags in everything.
Or you can be a Republican or Libertarian Slashdotter and admire that Wal-Mart opposes government interference in business (you do NOT tell Wal-Mart how to operate).
Or you can be an apolitical Slashdotter and just agree that, for some products, it's the cheapest place to go.
I'm the socialist Slashdotter. I know it's not much better but if I need something that I know is at a big retailer I make the trip to Zeller's [hbc.com] first. SILE (Solution Involving Least Evil)
Pop Tarts (Score:5, Funny)
Yes I did. God help me!
Heh, lets see if this "predicting" works (Score:5, Interesting)
Then one day, the managers were really excited, as we were going to have a computer order everything for us, from records of sales from before and it would "predict" what we would need. They said the extra stock on top of the aisles would be eliminated. We would be able to concentrate on customer service.
Well, the day came, and for a few months you could tell the computer was fighting with limited data. Some weeks would be ridiculously overstocked on a few items; in others, the leading sellers in the store would have empty shelves. When it finally settled down after a year, it was worse than before the computer.
The top of aisles were jammed to the ceiling with stock, there was never any room to put anything up there, and getting to the bottom for something you needed cost a lot of time. Plus, the backroom was packed with stock. You could hardly move around, and trying to find the last box of something buried underneath these huge piles was a task that killed your morale. During the slow months, one stocker for the whole store was enough for a night, now 3 were common to deal with all the stock.
Re:Heh, lets see if this "predicting" works (Score:5, Interesting)
The Walmart shipping system was very efficient, but it was designed to serve Walmart, not the individual stores. We had an extremely finite space in which to store things, and an extremely finite shoe department, yet the thing shipped us INCREDIBLE amounts of shoes. And you've been to a Walmart, right? They were *EXTREMELY* ugly, horrible shoes.
One night I recall the system sent me *5* pallets of shoes (1-2 is normal), and it took a herculean effort to find *somewhere*, *anywhere* to store them.
And that was the job, every night. Somehow put away the incredible amount of shoes that come. Every night, re-arrange "the stacks", re-arrange "the steel" to fit shoes that nobody wanted, that nobody could stop from coming.
One morning the manager walks up to me and says "Good news, they've decided to keep you full time!" to which I replied "Oh no, don't you dare".
even the mango is tracted (Score:4, Insightful)
Just imagine (Score:3, Insightful)
There's a name for this.. (Score:5, Insightful)
Basically, the more data you have, the more likely you'll find weird coincidental correlations.
I guess these kinds of 'statistical finding' will become more and more prevalent in the future, given that we're living in an age where we're collecting ever-larger amounts of data, and have the resources to process all this data automatically.
It would be a good thing if people were a bit more sceptical of this kind of stuff. Correlation isn't causation.
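The point about coincidental correlations can be illustrated with a small simulation. This is a sketch under stated assumptions, not anything taken from Wal-Mart's systems: the "sales" series are pure random noise, and the function names, the 20-week window, and the product counts are all hypothetical. It compares one random target series against many unrelated random series and reports the strongest correlation found by chance.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def strongest_chance_correlation(n_products, n_weeks=20, seed=0):
    """Largest |correlation| between a random target series and
    n_products unrelated random series (hypothetical weekly sales)."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_weeks)]
    best = 0.0
    for _ in range(n_products):
        series = [rng.gauss(0, 1) for _ in range(n_weeks)]
        best = max(best, abs(pearson(target, series)))
    return best


for n in (10, 100, 5000):
    print(n, "products -> strongest chance correlation:",
          round(strongest_chance_correlation(n), 2))
```

None of the series are related to the target, yet the best match found gets stronger as more candidates are searched. That is why a correlation dug out of a very large dataset needs an independent check before anyone acts on it.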
Re:There's a name for this.. (Score:5, Insightful)
Ermm, RTFA.
You can be skeptical all you want. Someone at Walmart made the call, and they were right.
Big deal (Score:3, Funny)
Call me when they can mathematically prove which flavors are most popular in a hurricane.
Re:There's a name for this.. (Score:5, Insightful)
And, firstly: that's not exactly a proper test.
(Supply does create demand. Why do you think stores like building big pyramids of merchandise, and so on.. Hint: It's not just because it looks pretty.)
Perhaps you should read my comment again and try to get the point. I wasn't necessarily being sceptical about Pop-Tarts. I was being sceptical about the method in general.
Obviously some of the correlations they'll find are real too. That's not what I was referring to.
What I was referring to, was that it's very easy to become blind to the statistics. To fall into the trap of seeing correlations where there are none. The human brain has a remarkable pattern-finding ability. Unfortunately that ability does lead us astray sometimes.
(For instance reading human faces into natural formations, and so on)
Besides this, the Wal-mart people probably aren't very interested in talking about the times their fancy new method failed, are they?
Speaking of food trends, stop buying yeast! (Score:3, Funny)
EVERY TIME A LOAF OF BREAD IS BAKED,
APPROXIMATELY
150,000,000 YEASTS ARE
KILLED
Come to the award-winning 1987 film,
"The Very Small and Quiet Screams"
-- a cinematic electromicrograph of yeasts being baked.
A must for those who care about yeast, and especially for those who don't.
SPONSORED BY
Brown Anaerobe Rights Coalition (BARC)
Student Bakers for Social Responsibility
Coalition for the Elevation of Life (CELL)
Defend all life: "From greatest to least, from human to yeast!"
The Problem? (Score:3, Informative)
That doesn't mean they know what to do with it... (Score:3, Interesting)
This makes me wonder... there must be some ideal point where a certain amount of data collected is worth the most money because you can act on that data. After that point, collecting additional data is increasingly more costly and counterproductive unless you invest in an infrastructure that lets you process more data. How does one figure out that ideal point? Just a thought.
Did you know... (Score:5, Interesting)
Activity of the cards is ACTUALLY monitored for discrepancies in buying habits to find abusive employees who buy things for their friends?
Did you also know Wal-Mart's employee name badges have RFID tags (and have had for many years) that allow Wal-Mart to track where an employee is at any given time?
Another interesting tidbit: did you know that at Wal-Mart's jewelry warehouses they actually WEIGH the amount of metal in your body when you enter and leave? (And I don't mean they ask you to put things in a dish and weigh the dish - they scan YOU)
Another interesting thing, Wal-Mart has a fallout facility in Oklahoma that has a near-real-time backup of each BIT of that 460 terabytes of data?
Wal-Mart could survive a direct nuclear blast and still keep on a truckin'.
And, of course, if you're in a Wal-Mart home office - ISD building - distribution center - et al... and dial 911 - BOOM - you get Wal-Mart's private security? Niiice, hope it's not a real emergency, you first have to explain it to them - then if they deem it necessary THEY will call the REAL 911!
Re:Did you know... (Score:5, Interesting)
1) poor people shop there because it's cheaper than the other stores, because wal*mart gets their stuff all from china and strong-arms their suppliers to give them cheaper and cheaper products.
2) to keep up with walmart's demands, the companies have to outsource more and more to china and other cheap labor countries (or just move there entirely)
3) so more people lose their jobs, become poor and have to shop at wal*mart because 1) it's cheaper than everything else around, and 2) all the other local businesses are now out of business because they can't compete with the special deals wal*mart gets for buying in such huge quantities...
(goto 1)
Re:Did you know... (Score:5, Funny)
You mean like 912?
And I really hope it's not on SQL (Score:3, Informative)
How the hell can they estimate that? Assuming "less than half" means about 45%, that gives us about 207 TB. Let's just round that up to 240.148445 TB to make it a nice, even number.
Google is searching 8,058,044,651 "webpages"* -- who knows what that means. Now, Google isn't searching every single page on the internet, certainly. But also, they can't be searching pages that don't exist. So the 8bn Google pages certainly aren't all the internet. But Google isn't double or triple counting pages. Still, at 240.148445 TB (my rough estimate), we come up with a page size of exactly 32KB per page.**
Is this just counting the text? The code for this page right here (comments.pl) weighs in at about 14KB. Wal-Mart, in no way, has twice as much info as the internet. I would say the "internet" should be measured in at least petabytes. Archive.org itself already has 1PB, and I consider any of that content available to me "on the internet".
* I'm not even counting the Google cache.
** Which means Mr. Gates over-estimated by a factor of 20 when considering how much memory we all needed!
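The arithmetic behind the "exactly 32KB" joke can be reproduced as follows. This is a sketch that assumes binary units (1 TB = 2^40 bytes, 1 KB = 1024 bytes), which is the only reading under which the numbers in the comment line up; the function and constant names are hypothetical.

```python
# Binary units are an assumption; the comment does not say which it uses.
BYTES_PER_TB = 2 ** 40
BYTES_PER_KB = 2 ** 10

WALMART_TB = 460                 # figure quoted in the story
GOOGLE_PAGES = 8_058_044_651     # page count quoted in the comment


def implied_page_size_bytes(total_tb, pages):
    """Average bytes per page if `pages` pages add up to `total_tb`."""
    return total_tb * BYTES_PER_TB / pages


# "Less than half", read as about 45% of Wal-Mart's 460 TB.
print("45% of 460 TB:", WALMART_TB * 0.45, "TB")

size = implied_page_size_bytes(240.148445, GOOGLE_PAGES)
print("Implied page size:", round(size), "bytes =",
      round(size) / BYTES_PER_KB, "KB")
```

Under those assumptions the "rounded up" 240.148445 TB is simply the figure that makes the average come out at 32 KB per page.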
I'm not afraid.. (Score:3, Funny)
460 TB is nothing we have 25X that (Score:4, Interesting)
Re:460 TB is nothing we have 25X that (Score:3, Funny)
Welcome to the United States of WalMart (Score:4, Funny)
From the article;
"You can see the pattern of Wal-Mart's mandates, and as Wal-Mart grows in power, it is getting more dictatorial.....Wal-Mart lives in a world of supply and command, instead of a world of supply and demand."
Hurricanes and Pop-Tarts? Bah... (Score:4, Interesting)
Stuff like: women who buy from catalogs, eat "crunchy" peanut butter, own a cat and drive a minivan are 87% more likely to react positively to prayer in schools as a "motivating issue."
I just made that up, but it's the sort of thing they find out. No tin-foil hats here - corporations and pollsters are shelling out millions of dollars for this stuff.
Here are a few Google search links to get you started:
Acxiom [google.com]
Seisint [google.com]
This is all fine and dandy, but ... (Score:5, Funny)
half as much data until... (Score:3, Informative)
someone realized that the DB servers are actually accessible from the internet and then bam, instant 2x increase in the amount of data on the internet.
incredible! (Score:4, Funny)
It took them 460 terabytes of data to figure out that hurricanes make people buy more non-perishable food than usual?
Wow, data mining is "useful"...
chaos in the mix (Score:3, Interesting)
If you're willing to break the law, you can even do worse harm. But I don't condone that.
Using legal methods to increase the entropy is the best way to fight the marketing databases.
New data measurement type (Score:3, Funny)
"Oh, I have a few frigabytes of data."
"Frigabyte? What's that?"
"Oh, that's a friggin lot of data."
WalMart BS (Score:5, Insightful)
WalMart is trying to make itself look like it is turning its customer data into success, and benefits for its customers. That serves to downplay its reliance on labor exploitation, monopolistic competition when it enters local markets, and political favors that structure labor and market laws to give it a competitive edge. And WalMart might just be believing the IT sales hype that it spends millions of dollars on. But that's no reason we should buy their IT BS as much as we seem to buy their wares.
What they do with computers... (Score:4, Interesting)
The systems have the layout of every Walmart store in them, and the stores respond to orders from the main office to move products around on the shelves. The systems will tell various stores to move products into different places, and analyze the results. If a store is making more money with XYZ sitting near the entrance, then the WOPR tells more stores to move that product into place, but still plays games against shoppers with a few more. It's basically an insanely well-oiled statistical war against the shoppers to squeeze every last penny out of them. I hate to say it, but it doesn't work on me when I go there. But overall, it's creepy, and impressive at the same time.
PS- I had this evil idea. If anyone is into the hacktivism role, embed a voice recorder IC into a telephone set that matches your local WalMart's phones. Get the code to get on the PA system, and set up your "rogue" telephone to bump onto the PA every 5 hours or so. Be sure to include sounds to make it sound like someone is picking up the phone, and hanging it up. It will drive them nuts. Some stores seem to use Lucent sets on the wall (MLX-xxx) which are most likely ISDN on the back. Other stores seem to have analog ports on a Lucent system. Just remember to give me props. Feel free to announce all shoppers a winner of a contest where they get everything they can stuff into a cart for free. Or remind them about the $700,000 in taxes the minimum wage making people cost the community at every WalMart.
Comment removed (Score:5, Interesting)
640TB ought to be enough for anybody (Score:5, Funny)
Apparently the "experts" overlooked alt.binaries.*
I've actually worked on this data before... (Score:5, Interesting)
IIRC, It seems like one of the strange correlations we found is that the two items most commonly purchased together were beer and baby diapers. Go figure...
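Finding the items most often purchased together is usually done by counting item pairs across baskets. The following is a minimal sketch of that idea only: the baskets are invented sample data, the function name is hypothetical, and nothing here reflects how Wal-Mart or Teradata actually run such queries.

```python
from collections import Counter
from itertools import combinations


def most_common_pairs(baskets, top=1):
    """Count how often each unordered pair of items shares a basket."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so ("beer", "diapers") is always one key.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts.most_common(top)


# Invented sample data, for illustration only.
baskets = [
    ["beer", "diapers", "chips"],
    ["beer", "diapers"],
    ["milk", "bread"],
    ["diapers", "beer", "milk"],
    ["bread", "chips"],
]

print(most_common_pairs(baskets))
```

Raw pair counts favor items that sell well in general, so a real analysis would also compare each count against how often the two items would land in the same basket by chance.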
Re:"Nothing for you to see here. Please move along (Score:4, Informative)
Re:FUCK the New York Times (Score:3, Funny)
Re:2004 = 1984 + 20; (Score:4, Funny)
Re:2004 = 1984 + 20; (Score:3, Interesting)
Listen, if you really are that paranoid, pay in cash. Then there is no way for the evil Wal-mart overlords to find you and force you to buy more pop tarts.
Re:2004 = 1984 + 20; (Score:3, Interesting)
Re:2004 = 1984 + 20; (Score:3, Funny)
Nope, its location. (Score:5, Informative)
And everyone says something about leveraging technology and JIT delivery, etc.
Professor Liu [jhu.edu] says "Nope. Location."
Walmart chose most of their initial locations in cities/regions where there was no other competition. Places where there was no Kmart, no department stores, no malls. And they flourished.
Re:Expert source (Score:3, Funny)
Re:Huh? (Score:5, Interesting)
Understanding your method of assessing the data includes lumping data about vendors, data about shipping, inventory status (alone, a huge category), etc., 1.5 MB "per person" isn't huge. The error is in your model as most of the system contains data about things other than customers.
That said, you would be surprised what
The best thing a consumer can do to counteract this consumer surveillance is to toss junk into the system. Here are a few suggestions:
- borrow your mom's/mother-in-law's card and go on a shopping spree for frozen pizzas, candy corn, condoms and saran wrap.
- apply for new cards all the time. provide creative answers as to your address, occupation (animal disposal officer is one of my favorites - someone must be puzzled how many dead animals there are in my city from all the people with this occupation). BE SURE TO ONLY USE CASH with these cards so they don't get an identification anchor.
- spike the data with sustained purchases of one product for a period of time. this is especially fun at smaller retailers that use inventory management - keep buying them out of one product (preferably low cost and low shelf inventory so it is easier and cheaper to do). keep it up for 90 days. then stop buying it and go to another store.
The more you can junk up purchases (especially on anchored cards like friends, in-laws, etc. that have different buying habits), the less valuable the database is.
Re:Huh? (Score:5, Funny)
[RM101's mind boggles]
Dude, do you seriously have nothing better to do than spend this crazy amount of time feeding junk data into a supermarket computer? Go outside. Breathe the air.
I dunno, maybe you WILL lay on your death bed, not thinking of your wife, or children, but you'll be proud of how many hours you spent contaminating some database.