Python Government Stats Transportation

Choose a Better Train With Web Scraping (hackaday.com) 50

szczys writes: Tired of his trains being constantly late, Eric Evenchick headed to the website of Via Rail (Canada's communter train service) to find which trains had a better on-time rate. Unfortunately, the site only offers three days' worth of data through its dropdown selections, but a bit of investigating showed that the underlying GET requests were open for about the last six months. Evenchick built a web scraper in Python, along with a web interface that queries the resulting SQL database. The harvested data shows system-wide delays that average more than twelve minutes (mostly due to commercial rail having the right-of-way). The good that comes of this? You can now choose your train based on smallest likelihood of delay.
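To make the idea concrete, here is a minimal sketch of the "collect per-day delays, store them in SQL, rank the trains" approach the summary describes. It is not Evenchick's code and makes no network requests: the `arrivals` table, its columns, the helper names and the sample delays are all hypothetical, and a real scraper would fill the rows from whatever the site returns for each date.

```python
import sqlite3
from datetime import date, timedelta


def date_range(end, days):
    """Yield the `days` calendar dates ending at `end`, oldest first."""
    for offset in range(days - 1, -1, -1):
        yield end - timedelta(days=offset)


def build_db(rows):
    """Load (train, day, delay_min) rows into an in-memory SQLite table.

    The schema is an assumption made for this sketch only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE arrivals (train TEXT, day TEXT, delay_min INTEGER)"
    )
    conn.executemany("INSERT INTO arrivals VALUES (?, ?, ?)", rows)
    return conn


def rank_trains(conn):
    """Return (train, average delay in minutes), least-delayed train first."""
    return conn.execute(
        "SELECT train, AVG(delay_min) AS avg_delay "
        "FROM arrivals GROUP BY train ORDER BY avg_delay, train"
    ).fetchall()


if __name__ == "__main__":
    # Made-up observations: two trains over three days.
    days = [d.isoformat() for d in date_range(date(2015, 12, 4), 3)]
    sample = [("A", day, delay) for day, delay in zip(days, (5, 10, 15))]
    sample += [("B", day, delay) for day, delay in zip(days, (0, 2, 4))]
    for train, avg_delay in rank_trains(build_db(sample)):
        print(train, avg_delay)
```

With a longer date range the same query gives a per-train average delay, which is the figure you would use to pick the train least likely to run late.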
This discussion has been archived. No new comments can be posted.

  • by xxxJonBoyxxx ( 565205 ) on Friday December 04, 2015 @05:04PM (#51059429)

    >> Canada's communter train service

    But do they have anything for commuters?

    • A communter is a commuter going to a commune. So, yes, they do have something for that specific subset of all commuters.

  • by Ecuador ( 740021 ) on Friday December 04, 2015 @05:06PM (#51059441) Homepage

    See Via Rail limiting the GET requests in 3... 2... 1...
    Well, OK, there's the weekend ahead, perhaps Monday? :)
    In any case it does look like commuter rail is a 2nd class citizen in Canada.

    • by cdrudge ( 68377 )

      See Via Rail initiating lawsuit against Eric Evenchick in 3... 2... 1...

      FTFY

    • by tlhIngan ( 30335 )

      See Via Rail limiting the GET requests in 3... 2... 1...

      Or fixing their database to delete rows older than 3 days.

      Then again, sometimes the right thing does happen - the company involved makes the data available and makes everyone happy. I mean, if the train is delayed because of other rail traffic, then maybe if the government comes asking about on-time rates being so poor, they can show them the data.

    • by Kergan ( 780543 )

      There are many ways to work around that, e.g. crawlera.com (disclaimer: working there)

  • Violating ToS? (Score:3, Insightful)

    by Anonymous Coward on Friday December 04, 2015 @05:07PM (#51059453)

    Check the site's terms of service; scraping site contents may be in violation of the ToS.

    I wrote a similar app about 15 years ago to scrape the Edmonton Transit System's route schedules (conveniently posted in generally well-structured HTML at the time) so I could build a relational system and try to sort out predictive routes / times. When I found out that what I was doing was in violation of their ToS, I stopped my scraping service immediately (before getting called on it).

    • by Anonymous Coward

      I'm waiting to see a court tell someone they can't use software except a federally approved browser to retrieve data from a web URL.

    • by Anonymous Coward

      Learn how proxies work you milquetoast pussy.

    • by Anonymous Coward

      If we're now in a world where one can be bound by terms one never agreed to, then my terms of service to Rail Canada read as follows:

      "By returning data to my browser's HTTP request, you hereby agree that you owe me one million dollars. If you do not agree with these terms, you may not return data to my computer."

      What's that? They will add me to a block list? Sorry, too late already. The debt is already incurred, when they first agreed to my terms by returning said data.

  • Clearly the website is based on a loophole, which can/will be closed at any time. Given the litigious nature of most corporations (and in this case, possibly a government agency), I wouldn't be surprised if the author gets a cease & desist and/or lawsuit coming his way.

    Other than that, this is pretty awesome and a hacker-worthy effort.

  • It's not often that sloppy security on a commercial site works in favor of its customers.

  • VIA Rail is NOT a commuter train service. It offers "intercity passenger rail services", not commuter service, which Wikipedia defines better than I can: "Commuter rail, also called suburban rail, is a passenger rail transport service that primarily operates between a city centre, and the middle to outer suburbs...". Again, not what VIA Rail primarily does.

    Examples of agencies which offer commuter rail service in Canada include Greater Toronto's GO Transit trains and Montreal's AMT. These do, indeed, offer service between communities forming part of a greater metropolitan area and said area's city centre. At least in Montreal, the AMT has some exclusive tracks and agreements on shared tracks which prioritize commuter trains over other scheduled trains at rush hour.

  • by zAPPzAPP ( 1207370 ) on Friday December 04, 2015 @05:24PM (#51059575)

    I'd rather choose my train based on where it's headed.

    Being on time at the wrong destination is kinda useless.

  • Guy Writes Script (Score:2, Insightful)

    by Anonymous Coward

    So a guy wrote a script. Good for him, I guess, but why is this on /.?

  • by Anonymous Coward on Friday December 04, 2015 @05:39PM (#51059683)

    See the National Rail Enquiries APIs. Loads of information on train timetables, delays, maintenance schedules, and almost all for free.
    http://www.programmableweb.com/api/national-rail-enquiries

  • For Philadelphia in the US, please see TrainView and http://phor.net/apps/septa/ This includes live data and an 8-year history of train on-time performance, plus analysis of lateness.
  • "The good that comes of this? You can now choose your train based on smallest likelihood of delay."

    Do it quickly, because, like always in these cases, the guy will be sued for data theft in 3, 2, 1, ...

  • Tired of his trains being constantly late,

    Get up earlier instead of staying up too late writing silly scripts.

    • by tlhIngan ( 30335 )

      Tired of his trains being constantly late,

      Get up earlier instead of staying up too late writing silly scripts.

      How does getting up earlier deal with issues of the train being late? If the 7:00 AM train consistently comes at 7:10AM, waking up 10 minutes earlier does nothing.

      Nor does it help if the train usually comes in at 7:00AM, but sometimes comes in at 7:10AM.

      Neither does waking up early help if the train (let's say it departs at 7:00AM and arrives at 8:00AM) consistently comes at 7:00AM and routinely get
