Searchable C/C++ DB surpasses 275 million lines
Sembiance writes "I've been working on a C/C++ source code search database for the past year. It has recently surpassed 275 million lines of searchable open source C/C++ code. The search engine is C/C++ syntax aware, so you can search for specific elements such as functions, macros, classes, comments, etc. The site is built upon many open source products, including MySQL and Lucene for the database, CodeWorker to parse the code, PHP and Apache for the website, and GeSHi for syntax highlighting. I'm currently looking for suggestions on what sort of 'interesting statistics' I could create from 275+ million lines of open source C/C++ code."
Some statistics to get you started (Score:5, Funny)
The following "interesting statistics" come to mind:
You gotta get the variables searchable. Most critical for that last statistic. Also, I'm too lazy to learn Lucene Query Parser Syntax [apache.org], so the statistics for "Natalie Portman" may include references to "portman."
useful statistic (Score:5, Funny)
Similarity checking (Score:5, Funny)
SCO (Score:2, Funny)
But then again, probably not...
ratio (Score:5, Funny)
Re:useful statistic (Score:5, Funny)
275+ million lines (Score:1, Funny)
Suggestion (Score:5, Funny)
How about a new server?
Re:And then... (Score:3, Funny)
Re:My vote is for... (Score:2, Funny)
interesting stat (Score:3, Funny)
2) compile
3) execute
4) ???????
5) PROFIT!
Woman (Score:2, Funny)
Eh.
Re:Hit Refresh (Score:3, Funny)
stats we'd like to see... (Score:5, Funny)
-# of ( ),{ },\
-time spent debugging/compiling
-total hours spent in production
-gallons of coffee consumed
-hours of daylight seen
-# of relationships destroyed
Need to watch those stats (Score:3, Funny)
Re:useful statistic (Score:5, Funny)
Sounds like you should have written it in C++ instead of a laggard language like PHP.
Re:Size doesn't matter (Score:3, Funny)
or "// FIXME" (Score:5, Funny)
Re:Statistics: (Score:3, Funny)
So the code calls 61,718,232 functions which don't even exist?
But maybe they just meant "Total Number of Function Calls"
Re:My vote is for... (Score:5, Funny)
Re:275+ million lines (Score:3, Funny)
No, no, no.
You do not use lines 1..N on the same lady until it works. It's not like breaking encryption -- you don't get to try all the possible keys.
I have friends who have done this, and they swear it's a percentage game. Choose one line you like, and try it on women 1..N until it does work, or you get tired of getting told to sod off. Apparently, with the right combination of variables, any line can be verified to work under some circumstances.
Truthfully, I don't know how anyone can set out with the knowledge they're going to get told to drop dead 70-100 times/night, but I guess if you can live with that kind of failure rate on an ongoing basis, you'll eventually get the success rate you wanted.
Now go forth, young geek, and attempt to multiply.
Re:ratio (Score:5, Funny)
Search -- foo -> Results 1 - 10 of about 26,600,000 for foo. (0.06 seconds)
Search -- bar -> Results 1 - 10 of about 385,000,000 for bar [definition]. (0.16 seconds)
Search -- foo bar -> Results 1 - 10 of about 7,900,000 for foo bar. (0.12 seconds)
'bar' wins. This intuitively makes sense, as who would want to go to the 'foo' for a drink, or eat an 'energy foo'? Could you imagine a lawyer being 'dis-fooed'?
Re:histogram of C reserved words (Score:5, Funny)
2431 int
1802 goto
Re:Please check for this: comma in brackets in C++ (Score:3, Funny)
Also, C++ programmers are getting really old, and they don't handle change very well.
Re:And then... (Score:2, Funny)
-1, Redundant
This is Slashdot, of course we're all single.
Re:useful statistic: parent: -1 troll (Score:4, Funny)
I know PHP is a great web language and that it probably isn't the cause of the slowdown. Heck, even Yahoo! uses it these days.
I was attempting (unsuccessfully, it seems) to make fun of the purists who insist that robust web applications must run on something compiled in order to reach acceptable performance under high load.