Choose a Better Train With Web Scraping (hackaday.com)
szczys writes: Tired of his trains being constantly late, Eric Evenchick headed to the Via Rail (Canada's communter train service) website to find which trains had a better on-time rate. Unfortunately they only offer three days' worth of data through the dropdown selections, but a bit of investigating showed the GET requests were open for about the last six months. Evenchick built a web scraper with Python, along with a web interface that queries the resulting SQL database. The harvested data shows system-wide delays that average more than twelve minutes (mostly due to commercial rail having the right-of-way). The good that comes of this? You can now choose your train based on smallest likelihood of delay.
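The approach in the summary (request one day at a time over roughly six months, store the results in a SQL table, then query it) can be sketched roughly as follows. This is only an illustration, not Evenchick's code: the endpoint `BASE_URL`, the `train` and `date` parameter names, and the `arrivals` table layout are all hypothetical, and the sketch makes no network requests.

```python
import sqlite3
from datetime import date, timedelta
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names; the real URLs are not given in the summary.
BASE_URL = "https://example.invalid/train-status"


def build_urls(train, end, days_back=180):
    """Return one status URL per day, starting at `end` and going back `days_back` days."""
    urls = []
    for offset in range(days_back):
        day = end - timedelta(days=offset)
        query = urlencode({"train": train, "date": day.isoformat()})
        urls.append(f"{BASE_URL}?{query}")
    return urls


def make_db():
    """Create an in-memory SQLite database with a hypothetical arrivals table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE arrivals (train TEXT, day TEXT, station TEXT, delay_min INTEGER)"
    )
    return conn


def store(conn, rows):
    """Insert (train, day, station, delay_min) tuples scraped from the pages."""
    conn.executemany("INSERT INTO arrivals VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def average_delay(conn):
    """Return (train, average delay in minutes) pairs, least delayed first."""
    cur = conn.execute(
        "SELECT train, AVG(delay_min) FROM arrivals "
        "GROUP BY train ORDER BY AVG(delay_min)"
    )
    return cur.fetchall()
```

Fetching and parsing each URL is left out on purpose, since the page format is not described here; the rows passed to `store` would come from that step.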
Canada's communter train service (Score:4, Informative)
>> Canada's communter train service
But do they have anything for commuters?
Re: (Score:1)
A communter is a commuter going to a commune. So, yes, they do have something for that specific subset of all commuters.
See Via Rail limiting the GET requests in... (Score:4, Funny)
See Via Rail limiting the GET requests in 3... 2... 1... :)
Well, OK, there's the weekend ahead, perhaps Monday?
In any case it does look like commuter rail is a 2nd class citizen in Canada.
Re: (Score:3)
FTFY
Re: (Score:2)
Or fixing their database to delete rows older than 3 days.
Then again, sometimes the right thing does happen - the company involved makes the data available and makes everyone happy. I mean, if the train is delayed because of other rail traffic, then maybe if the government comes asking about on-time rates being so poor, they can show them the data.
Re: (Score:2)
There are many ways to work around that, e.g. crawlera.com (disclaimer: working there)
Violating ToS? (Score:3, Insightful)
Check the site's terms of service; scraping site contents may be in violation of the ToS.
I wrote a similar app about 15 years ago to scrape the Edmonton Transit System's route schedules (conveniently posted in generally well-structured HTML at the time) so I could build a relational system and try to sort out predictive routes / times. When I found out that what I was doing was in violation of their ToS, I stopped my scraping service immediately (before getting called on it).
Re: (Score:1)
I'm waiting to see a court tell someone they can't use software except a federally approved browser to retrieve data from a web URL.
Re: (Score:1)
Yes, my point is there will always be tools like that. They are no different conceptually from a browser, unless you want to start enforcing mandatory adherence to HTML rendering specifications in which case ALL the major browser companies are going to be terrified.
Now, the last few years have definitely taught me our legal system never backs down from a challenge to make horrible decisions, so I am sure eventually this could be legally problematic, but for now, the point is, as long as you are using the s
Re: Violating ToS? (Score:1)
Learn how proxies work you milquetoast pussy.
Re: (Score:1)
If we're now in a world where one can be bound by terms one never agreed to, then my terms of service to Rail Canada read as follows:
"By returning data to my browser's HTTP request, you hereby agree that you owe me one million dollars. If you do not agree with these terms, you may not return data to my computer."
What's that? They will add me to a block list? Sorry, too late already. The debt is already incurred, when they first agreed to my terms by returning said data.
Re: (Score:2)
Using there site is an agreement to the ToS. Are you dense or really that stupid?
Re: (Score:2)
And yes I typo'd their. Bite my ass.
Good luck with that (Score:2)
Clearly the website is based on a loophole, which can/will be closed at any time. Given the litigious nature of most corporations (and in this case, possibly a government agency), I wouldn't be surprised if the author gets a cease & desist and/or a lawsuit coming his way.
Other than that, this is pretty awesome and a hacker-worthy effort.
It's nice to see something good for a change (Score:2)
It's not often that sloppy security on commercial sites works in favor of their customers.
Neither commuter nor "communter" (Score:5, Informative)
VIA Rail is NOT a commuter train service. It offers "intercity passenger rail services", not commuter service, which Wikipedia defines better than I can: "Commuter rail, also called suburban rail, is a passenger rail transport service that primarily operates between a city centre, and the middle to outer suburbs...". Again, not what VIA Rail primarily does.
Examples of agencies which offer commuter rail service in Canada include Greater Toronto's GO Transit trains and Montreal's AMT. These do, indeed, offer service between communities forming part of a greater metropolitan area and said area's city centre. At least in Montreal, the AMT has some exclusive tracks and agreements on shared tracks which prioritize commuter trains over other scheduled trains at rush hour.
Nah (Score:3)
I'd rather choose my train based on where it's headed.
Being on time at the wrong destination is kinda useless.
Guy Writes Script (Score:2, Insightful)
So a guy wrote a script. Good for him, I guess, but why is this on /.?
Re: (Score:2)
By Slashdot's standards today that makes you a programming wizard.
Re: (Score:2)
The guy who wrote the script is probably checking all the different services from where he is to where he wants to go to figure out which time of the day he should travel (and on which service) to have the greatest chance of avoiding delays.
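For what it's worth, that kind of comparison could look something like the sketch below: rank services by the fraction of runs that arrived within a few minutes of schedule. The record format, the service names, and the 5-minute threshold are assumptions for illustration, not anything taken from the actual tool.

```python
from collections import defaultdict


def on_time_rates(records, threshold_min=5):
    """Rank services by the share of runs delayed at most `threshold_min` minutes.

    `records` is an iterable of (service, delay_in_minutes) pairs.
    Returns a list of (service, rate) pairs, most reliable first.
    """
    counts = defaultdict(lambda: [0, 0])  # service -> [on-time runs, total runs]
    for service, delay in records:
        counts[service][1] += 1
        if delay <= threshold_min:
            counts[service][0] += 1
    rates = {s: ok / total for s, (ok, total) in counts.items()}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


# Made-up sample data, purely to show the call.
sample = [("early", 0), ("early", 3), ("early", 20), ("late", 10), ("late", 2)]
ranking = on_time_rates(sample)
```

A rate like this says more about your odds on a given day than an average delay does, since one very late run can dominate an average.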
Some train companies have an API for that... (Score:4, Informative)
See the National Rail Enquiries APIs. Loads of information on train timetables, delays, maintenance schedules, and almost all for free.
http://www.programmableweb.com/api/national-rail-enquiries
Philadelphia (Score:1)
Not for long (Score:2)
"The good that comes of this? You can now choose your train based on smallest likelihood of delay."
Do it quickly, because, like always in these cases, the guy will be sued for data theft in 3, 2, 1, ...
wrong solution (Score:1)
Get up earlier instead of staying up too late writing silly scripts.
Re: (Score:2)
How does getting up earlier deal with issues of the train being late? If the 7:00 AM train consistently comes at 7:10AM, waking up 10 minutes earlier does nothing.
Nor does it help if the train usually comes in at 7:00AM, but sometimes comes in at 7:10AM.
Neither does waking up early help if the train (let's say it departs at 7:00AM and arrives at 8:00AM) consistently comes at 7:00AM and routinely get