
MIT Offers Picture-Centric Programming To the Masses With Sikuli 154

Posted by timothy
from the mind's-eye dept.
coondoggie writes "Computer users with rudimentary skills will be able to program via screen shots rather than lines of code with a new graphical scripting language called Sikuli that was devised at the Massachusetts Institute of Technology. With a basic understanding of Python, people can write programs that incorporate screen shots of graphical user interface (GUI) elements to automate computer work. One example given by the authors of a paper about Sikuli is a script that notifies a person when his bus is rounding the corner so he can leave in time to catch it." Here's a video demo of the technology, and a paper explaining the concept (PDF).


  • FrontPage? (Score:3, Interesting)

    by Itninja (937614) on Thursday January 21, 2010 @05:25PM (#30851710) Homepage
    Sounds like the Microsoft FrontPage of coding software. Why do with text what you can do with pictures? And we all know FrontPage went on to become the de facto standard for web development... that had to be fixed by a real web developer later.

    But on the upside, dedicated FTEs for "reinstalling corrupted FrontPage extensions" did skyrocket during the FrontPage era.
    • Better (Score:3, Interesting)

      by pavon (30274)

      Actually I think this is more interesting than either FrontPage or LabView, because it allows you to script GUI apps that were not designed to be scriptable. Even for apps that are scriptable, it provides an increase in user efficiency as you don't have to learn the API commands to do things that you already know how to do in the GUI.

      How useful it is will depend on how well the image pattern matching deals with corner cases. Consider you need to click on a text field, however there are many identically loo

      • by BitZtream (692029)

        I can think of at least 3 ways of doing that (scripting GUI apps that aren't scriptable) that have been around for years already.

        • Your post would be better if you named the three you're thinking of. I wonder if they're the same ones I'm thinking of..

    • Re:FrontPage? (Score:5, Informative)

      by gad_zuki! (70830) on Thursday January 21, 2010 @05:53PM (#30852272)

      >And we all know FrontPage went on to become the de facto standard for web development... that had to be fixed by a real web developer later.

      Do you want to democratize technology or just have it controlled by elites? Non-techies want to do things like scripting and web design without paying a professional, the same way they want to fix things around the house or fix the car. When it comes to small or easy jobs, a non-expert can do just fine. Why should we piss on the DIY'ers because they don't have a Master's degree in CS? Frankly, a lot of computer stuff is pretty easy and paying someone is ridiculous.

      While I'm certainly no fan of FrontPage, I feel that it wasn't much worse than Mozilla Composer or other WYSIWYG HTML composers.

      • Re: (Score:2, Insightful)

        by mustafap (452510)

        >Do you want to democratize technology or just have it controlled by elites?

        Neither. I'd like to see people who wish to program, learn how to.

        • Re: (Score:3, Insightful)

          Yeah, that's real easy for a programmer to say. Ever used a brownie mix? I'll bet a pastry chef would say, "I'd like to see people who wish to bake brownies actually learn how to bake brownies properly." Tools like Sikuli are the programming equivalent to brownie mix. It's easy gratification. (... or at least easier than learning to capture part of the screen and then do fuzzy image pattern matching on it.) If I were a very casual, light duty programmer, this would be pretty helpful sometimes.
          • Actually brownies (or a lot of food) isn't that hard to cook if you follow the instructions and people would probably be healthier if they would actually learn how to cook rather than rely on chemically pre-made foods. It's no surprise some of the fattest nations are full of people who can't cook.
          • by mustafap (452510)

            >Ever used a brownie mix

            Yes, but I didn't consider myself a chef afterwards

      • by Yvan256 (722131)

        The problem with FrontPage wasn't the users, it was the code that it produced.

      • by Xiaran (836924)
        Elite or competent? I'm all for people tinkering with software in their spare time; the problem is that people who aren't qualified start thinking *everything* in software development is as simple as the tiny little things they are doing. Then we end up with Visual Basic (the birth of Visual Basic came with the motto "it's so easy you no longer need programmers... managers can write the code"... that worked out well).
        • Re:FrontPage? (Score:5, Insightful)

          by BobMcD (601576) on Thursday January 21, 2010 @07:09PM (#30853586)

          Then we end up with Visual Basic (the birth of Visual Basic came with the motto "it's so easy you no longer need programmers... managers can write the code"... that worked out well).

          From a business point of view, it actually did. People used VB, and particularly VB macros in Office, to do things that resulted in a lot of dollars flowing through a lot of organizations. Yes, it did eventually need to be changed out, but in its time, for its purpose, you can't really fault it. It truly did work.

          • Yes, if you ignore the fact that VB macros cause loads of security headaches and compatibility issues.

            IMO, VB wasn't actually that bad in the hands of people who made a real effort to learn how to program, but in the hands of most it was a freaking nightmare and caused loads of problems.

            DIY programming is fine for personal use but it should never be used for businesses.
            • by BobMcD (601576)

              Security issue, yes. Loads of problems, yes.

              Doesn't obviate what I said. In the moment lots of dollars changed hands because of it. That, among all else, is what a business would use to determine 'success'.

      • by idontgno (624372)

        Why should we piss on the DIY'ers because they don't have a Master's degree in CS? Frankly, a lot of computer stuff is pretty easy and paying someone is ridiculous.

        Thousands of cars on cinderblocks and dozens of houses with flooded basements are testimony that sometimes, paying someone is the only thing that isn't ridiculous. There's DIY, and there's "OMG you are SO in over your head." Anyone whose software development abilities are so stunted that the "advancement" outlined in TFA would help them is absol

      • There is a minimum level of skill and talent required to do anything. The only thing that happens when you make something "so simple anyone can do it" is a minefield of crap software. Instructing a computer to do something requires the ability to think abstractly, and to organize and plan with orders of magnitude more sophistication than "Do I want eggs or pancakes for breakfast?". Arguing that the 'elites' are pushing down the 'DIYers' is disingenuous. A real DIYer will overcome the learning curve of what

      • Do you want to democratize technology or just have it controlled by elites?

        If I have to be in charge of cleaning up their mess then I would prefer that they leave development to the professionals. I think that is what the parent is getting at. We professionals are tired of rescuing dabblers who get in over their heads because their "easy to use" tools are just powerful enough to get them into trouble, but not powerful enough to get them out. If people agree to be responsible for their own results, good or bad, then I say let them do as they wish. Unfortunately, it never seems to w

      • HTML and CSS are pretty easy to learn, at least well enough to produce the shit people produce with FrontPage.

        Sikuli is good, though it would appear it takes control of your computer, so really it's pretty useless aside from doing repetitive batch jobs. You can probably get around that by learning more Python. The demo didn't interest me, but I'm sure loads of people will love it for automating tasks.

        FrontPage on the other hand was awful and produced loads of awful sites that unfortunately affect more peop
  • Potential (Score:2, Insightful)

    by zero0ne (1309517)

    Especially for Testing your GUI.

    This seems like AutoIT but with image recognition (instead of having to input mouse coordinates).

    • by Jonah Hex (651948)

      Watching the YouTube demo, I immediately thought of how basic this is compared to AutoIT's functions, and even the quick record function is faster to "program" with than this screenshot function.

      It says it can tolerate some changes, but what if there is a completely different visual theme installed? What if a drop down is not on the same item it was when you made the script? AutoIT can take care of this by reading the underlying GUI code to allow for these kind of things. As someone who has been automating

      • by TheLink (130905)
        I wonder how Sikuli copes with "click page down till you find the icon you need to actually click on".

        How about if the stuff you click on might look rather different each time? e.g. the IP address might not be 0.0.0.0 but something else the second or third time around.

        And what if the stuff you need to click on can only be identified by text or an icon that you don't click on - you actually click on the stuff to the right (or left or whatever) of it. This one isn't a biggie - it shouldn't be too difficult to
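The "page down till you find the icon" case is a search-then-scroll loop. Below is a minimal sketch of it. `find` and `page_down` are hypothetical stand-ins for whatever the automation tool provides, not Sikuli's API. `FakeScreen` simulates a scrolling window so the loop can be run without a GUI.

```python
from typing import Callable, Optional, Tuple

Hit = Tuple[int, int]


def find_with_paging(find: Callable[[], Optional[Hit]],
                     page_down: Callable[[], None],
                     max_pages: int = 20) -> Optional[Hit]:
    """Call find(); if nothing matches, page down and retry, up to max_pages times."""
    for _ in range(max_pages + 1):
        hit = find()
        if hit is not None:
            return hit
        page_down()
    return None  # gave up: the icon never appeared


class FakeScreen:
    """Hypothetical scrolling window: each page is a list of icon names."""

    def __init__(self, pages):
        self.pages = pages
        self.current = 0

    def find(self, icon):
        page = self.pages[self.current]
        return (self.current, page.index(icon)) if icon in page else None

    def page_down(self):
        if self.current < len(self.pages) - 1:
            self.current += 1


screen = FakeScreen([["a", "b"], ["c"], ["target", "d"]])
print(find_with_paging(lambda: screen.find("target"), screen.page_down))
```

The `max_pages` cap is a design choice. Without it, a script whose icon has been renamed or re-themed would page down forever instead of failing visibly.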
    • by BitZtream (692029)

      The MS test crap in the latest versions of Visual Studio does it as well, and it will happily find a button (if it's a standard control) to click on using other data rather than mouse coordinates.

    • by gad_zuki! (70830)

      >This seems like AutoIT but with image recognition (instead of having to input mouse coordinates).

      Right, it's AutoHotKey/AutoIT with a nicer OCR library. Perhaps this will light a fire under the AutoHotKey devs and get them to add smarter screen reading and browser integration.

    • Re: (Score:1, Interesting)

      by Anonymous Coward

      Eggplant [testplant.com] says hi.

      As a professional test automator, I'd like to point out that automation by image recognition is the method of last resort. The #1 concern in GUI automation is maintainability, and image recognition is the least maintainable method of automation there is short of recording mouse coordinates and keypresses. If you change your theme, if the developer rearranges the controls, if any text is changed, the script is broken. The idea of using image recognition for web page automation is right out.

    • Re: (Score:2, Interesting)

      by jdimatteo (1726948)

      I am currently working on automated GUI tests for an application, and Sikuli looks pretty great -- even when compared to enterprise level automated GUI testing tools costing in the order of thousands of dollars per user licence.

      Some of the comments below on maintainability problems seem pretty superficial. For example, to ease maintainability you could build a framework abstracting GUI component images from regression test scripts. For example, you could assign a screenshot as a variable and then refer
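One way to read the "screenshot as a variable" suggestion is a single registry that maps logical element names to image files. Test scripts then say "login_button" and never hard-code a file. The sketch below shows this under that assumption. `ImageRegistry` and the file names are hypothetical, not part of Sikuli.

```python
from pathlib import Path


class ImageRegistry:
    """Hypothetical helper: one place that maps logical GUI element names to screenshots."""

    def __init__(self, image_dir):
        self.image_dir = Path(image_dir)
        self._files = {}

    def register(self, name, filename):
        self._files[name] = filename

    def path(self, name):
        if name not in self._files:
            raise KeyError(f"no screenshot registered for {name!r}")
        return self.image_dir / self._files[name]


# Test scripts refer to "login_button"; only this table knows the file name.
images = ImageRegistry("screenshots")
images.register("login_button", "login_button_v2.png")
print(images.path("login_button"))
```

When the UI changes, you re-capture one screenshot and update one `register` call, rather than editing every regression script that clicks that button.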

      • by ComaVN (325750)

        I just tried it out for an hour or so with our web application, and it seems to be doing its job. One thing that it didn't manage to do is click somewhere relative to the matched image. It always seems to click in the middle of the image, which is annoying when you want it to click one checkbox out of many based on its preceding label.

        Perhaps it's possible to use some kind of nesting, so you could try to find the image of the checkbox inside a previous match that includes the label, but I didn't find out
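Clicking a checkbox next to its label is just geometry once the label has been matched: you offset from the match's bounding box instead of clicking its center. Here is a minimal sketch. The `Match` type is hypothetical, and I haven't checked whether Sikuli itself exposes an equivalent offset option.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """Hypothetical bounding box of a matched image on screen, in pixels."""
    x: int
    y: int
    w: int
    h: int

    def center(self):
        return (self.x + self.w // 2, self.y + self.h // 2)

    def target_offset(self, dx, dy):
        """Point dx, dy pixels away from the match's center."""
        cx, cy = self.center()
        return (cx + dx, cy + dy)

    def right_edge_point(self, gap):
        """Point `gap` pixels to the right of the match, vertically centered."""
        return (self.x + self.w + gap, self.y + self.h // 2)


label = Match(x=100, y=200, w=80, h=20)   # where the label text was found
print(label.center())             # (140, 210): where a plain click would land
print(label.right_edge_point(15)) # (195, 210): the checkbox just right of it
```

Matching the unique label and offsetting to the checkbox avoids the ambiguity of matching one of many identical-looking checkboxes directly.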

    • I was thinking the same thing. Where this would be really handy would be in applications that paint their own windows and don't expose the gui handles for AutoIt to latch on to. Specifically, this would work great for Great Plains or online poker clients. :)

      -ellie

  • MMO macro maker? (Score:5, Interesting)

    by visgoth (613861) on Thursday January 21, 2010 @05:32PM (#30851842)
    This looks like a powerful tool for gold / isk / whatever farming. I'm tempted to resurrect my EVE account and see if I can make an auto-miner script.
    • by BoppreH (1520463)
      Things to take into account:

      - selecting and clicking on see-through buttons (the background will change too much)
      - the program's access to the actual game for seeing, clicking and typing
      - the game's anti-hack detection / counter-measures
      - macro playing lag (see video)

      But it seems very promising nevertheless.
      • by Arimus (198136)

        Add in the number of pilots who even if they're anti-pirate operate a KOS policy when it comes to macro miners....

    • Re: (Score:1, Offtopic)

      by burkmat (1016684)
      I don't know how much experience you have in EVE, but generally, if you're AFK you're dead meat. Suiciding miners even in hisec is quite fashionable these days.
      • Re: (Score:2, Offtopic)

        by visgoth (613861)
        I've done a fair bit of mindless semi-afk mining during my time playing eve, and never had much trouble with suicide attackers, can flippers, or other such stuff. I'd imagine that taking the usual minimal precautions like parking in a dead end, low traffic system would work relatively well.

        Depending on how robust sikuli is, it might be possible to make a mission running macro, which could be even safer than blasting rocks (with the right ship setup, and such). Barring that I'd likely use sikuli on a secon
  • by Anonymous Coward on Thursday January 21, 2010 @05:38PM (#30851964)

    "Computer users with rudimentary skills"..... "with a basic understanding of Python"?

    • by BitZtream (692029)

      You're reading a story about MIT on slashdot.

      Two groups that are so utterly disconnected from the real world that neither has any idea why their favorite toy hasn't taken over the world, even though it's the simplest, most efficient, easiest to use, most feature-rich (insert whatever here) on the planet.

      Most of both groups probably think grandma knows assembly as well.

    • by Fred_A (10934) <fred AT fredshome DOT org> on Thursday January 21, 2010 @06:17PM (#30852710) Homepage

      "Computer users with rudimentary skills"..... "with a basic understanding of Python"?

      Computer users with a rudimentary skill who do not have a basic understanding of Python can always build a Python programming AI in Lisp (or at least that's what I gathered from the MIT docs I browsed) and thus save themselves the trouble.

    • by Alex Belits (437) *

      Moar liek BASIC understanding of a python.

    • Re: (Score:3, Informative)

      If a friend wanted to learn just enough programming to do a few light chores, what would you recommend? Python is arguably one of the easiest languages to learn. Randy Pausch used it for Alice [alice.org], which has been successful for teaching middle school girls how to program. So if "computer users with rudimentary skills" means rudimentary programming, then that works for me.
      • by iluvcapra (782887)

        Python is arguably one of the easiest languages to learn.

        I can't wait to explain to my mom the difference between four spaces and one tab, just to name one of Python's endless oddities.

  • by Anne Thwacks (531696) on Thursday January 21, 2010 @05:41PM (#30852030)
    Yeah, let's hear it for a new development model:

    For years I have been asking for a software development tool that allows me to write PHP code by throwing cow-pats at the screen with the Wiimote.

    And my colleagues want a tool that allows dispatching my bugs with the Wii gun attachment they use in "Quantum of Solace".

  • FTFA: "Sikuli -- which means God's eye in the language of the Huichol Indians in Mexico". Mexican Indians love their hallucinogenic Peyote [wikipedia.org]. On the other hand, MIT researchers want the masses to program with the mouse. Well, I know about "correlation is not causation", but MIT sure is an interesting place to be.

  • by Anonymous Coward

    Yeah, this might work until the icons change. I don't see this working too well in practice. I don't know about Mac, but on my Ubuntu system the icons got updated last week, and it happens often enough that keeping these scripts updated would be a serious pain and expense. It isn't like an ordinary user could figure this stuff out either. Despite it being so simple, you're still going to need an IT person to create these scripts. Now you just have dumber IT people. Probably people who COST you more money in p

  • by SmallFurryCreature (593017) on Thursday January 21, 2010 @05:52PM (#30852232) Journal

    From what I've seen, this is a macro program that can use screenshots rather than key/mouse data to automate tasks. So you PROGRAM your PC the same way you PROGRAM a VCR to record a show. It is NOT the same as writing an application.

    But it seems very interesting once you get past this difference. In my experience macros are very handy for testing, but they often have a problem because a tiny misalignment can ruin it all. If this program is smarter because it can recognize where data is supposed to go... well, that would certainly make automated tests a bit easier.

    Interesting stuff. Just don't think you will be writing software with this.

    • by eulernet (1132389)

      Interesting stuff. Just don't think you will be writing software with this.

      For the past few years, programming has become equivalent to placing Lego bricks in the correct order (I'm working with Microsoft .NET and tons of components).

      So I'm not very surprised by the approach, as long as we can find all the possible varieties of pieces.

    • Re: (Score:1, Interesting)

      by Anonymous Coward

      Don't use a tool like this for testing. Start with AutoIt or nunit+white [codeplex.com], and look at commercial tools if those don't do what you need.

    • Re: (Score:1, Interesting)

      by Anonymous Coward

      Exactly! I'd love to see Sikuli's one new trick integrated into an existing, popular macroing system like AutoIt or AutoHotKey.

  • I'm suddenly reminded of horrible apps written in VB97, with no concern for the back end, horrible input kludge, etc.
    • I'm suddenly reminded of horrible apps written in VB97

      You're 93 versions ahead of your time - VB6 was the last version of Visual Basic before .NET.

      Perhaps more to the point, this not only targets a completely different purpose than Visual Basic, but also looks nothing like it whatsoever.

    • That's OK. For most VB apps there wasn't any "back end".

  • Otherwise it's just not complete, IMHO.
    • Re: (Score:3, Interesting)

      by Señor Jojoba (519752)
      Yes, you could use Sikuli to fire up a text editor, individually press the keys to write all the lines of code, launch the compiler/linker/whatever. So it meets your weird definition of completeness. However, I suspect you could not use Sikuli to write a program that writes a Sikuli program to write Sikuli. I could be wrong, though.
  • ... but does anyone know if the program is always that slow?

    I understand that it has to visually find the button and this is computationally expensive, but the 2-3 second lag didn't seem proportionate to the task.

    On a sidenote, the video states that there's no "internal API" dependence, but it clearly has to send "click" and "type" signals. Is that really OS independent or was it just an overstatement?
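The arithmetic below shows why visual search can be expensive. It assumes the most naive approach: slide the template over every screen position at full resolution and compare every pixel. I don't know what Sikuli actually does. Real matchers typically use optimized routines, downscaling, or search regions, and do far less work than this.

```python
def naive_match_cost(screen_w, screen_h, tmpl_w, tmpl_h):
    """Pixel comparisons for an exhaustive sliding-window search.

    Every placement of the template on the screen is tried, and every
    template pixel is compared at each placement.
    """
    positions = (screen_w - tmpl_w + 1) * (screen_h - tmpl_h + 1)
    return positions * tmpl_w * tmpl_h


# A 50x50 button searched for on a 1920x1080 screen:
print(f"{naive_match_cost(1920, 1080, 50, 50):,}")  # 4,822,502,500
```

That is about 1.9 million candidate positions times 2,500 pixels each, or roughly 4.8 billion comparisons per lookup. Restricting the search to a smaller region of the screen shrinks both factors.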
    • by babyrat (314371)

      the video states that there's no "internal API" dependence

      I suspect they were referring to the internal API of the program being controlled, i.e. COM, CORBA, etc...

  • lame (Score:2, Insightful)

    by Charliemopps (1157495)
    This is the same sort of scripting you can do with many already existing languages, AutoHotkey for example. The only new feature would be the ability to copy the screenshot directly into the program, as opposed to taking it outside the program and referencing the file directly. I'd say that this scripting language is actually weaker because of it. As far as using this inside a game... they are already hardened against this sort of thing. For example, next time you're in EVE look at the buttons you use. They
  • It can script GUI actions in much the same way. Granted it's not a very nice environment for more complicated work, but still.

    • by babyrat (314371)

      The last time I tried to use Applescript on windows or linux, it wouldn't even start up.

  • by Señor Jojoba (519752) on Thursday January 21, 2010 @06:40PM (#30853122) Homepage
    Come on, let's cut through the default Slashdot snark. The image capture aspect of Sikuli is brilliant! I don't like the tagline "program anything with Sikuli" because 99% of software should be written in something else. But think of writing test scripts that can use the image matching features. If the software works as advertised, then you could throw together UI test cases way faster than anything else I've seen. System administration tasks should be a good match too. The resulting code would be brittle and hard to maintain, but for quick one-off scripts, sure... I can see it.
    • by rmcd (53236) *

      Couldn't agree with you more. I'm surprised by all the negativity. And it seems to me this is innovative enough to have uses that no one here is thinking about right now.

  • The script may not work if the UI style is different from the one recorded or if the UI language is different from the one recorded. Generally, any option that can change the UI from computer to computer will create a problem for Sikuli.

    • It's even worse than that... Just change your icon or window border theme and watch every Sikuli script break.

      The great thing about all other languages except Sikuli is: When you change your Icon or window border theme the programs still run.

  • by Anonymous Coward
    This time for sure!
  • by presidenteloco (659168) on Thursday January 21, 2010 @07:21PM (#30853784)
    if NOT understand logic then
       loop
          talkTo (self, "Don't program!")
          Look (@ Pretty pictures)
       endloop
    endif
  • I'd be curious to see how they handle the back end, especially as some others pointed out it does make calls that seemingly require some hook into the OS. As for its usefulness, I doubt it will really take off beyond being a decent prototype. It relies on image matching so if you use and change a custom icon set all your scripts would be kinda worthless. Same goes if the programs you are "screenshot scripting" receive a major overhaul in the GUI department. Until it can address those issues, I doubt it wil
  • by tucuxi (1146347) on Thursday January 21, 2010 @08:12PM (#30854416)

    Sikuli is certainly not commercial-grade UI testing software. It was never intended to be, this is academic software written to explore ideas, rather than to polish them to perfection. Also, it is not a "general" programming language. The previous posters that compared it to video-programming are right: not all programs have to target complicated algorithms and data-structures, there is plenty of space for automating "simple stuff".

    As an idea, I find the readability of the code particularly interesting. Sikuli code is about the closest you can come to self-explanatory, step-by-step instructions on how to achieve whatever a particular program does. Add a few comments to the most arcane steps, publish those programs to an online repository, and presto! executable step-by-step tutorials.

    Yes, the developers may have to address the variability of themes on people's desktops. It is certainly possible to do so (for instance, by keeping a list of mappings from any of a set of "supported" themes to a "canonical" theme, which would be used in all examples), but, as far as ideas go, I really think that Sikuli is a very refreshing idea.
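The theme-mapping idea can be sketched as a lookup table. Scripts are written against a canonical theme's screenshots. A table maps (theme, canonical image) to a theme-specific capture, and anything without an override falls back to the canonical image. The table contents and file names below are hypothetical.

```python
# Hypothetical overrides: (theme, canonical screenshot) -> theme-specific screenshot.
THEME_OVERRIDES = {
    ("dark", "ok_button.png"): "ok_button_dark.png",
    ("high-contrast", "ok_button.png"): "ok_button_hc.png",
}


def resolve_image(canonical_name, theme):
    """Pick the screenshot to search for under `theme`.

    Falls back to the canonical (default-theme) image when no override exists,
    so scripts written against the canonical theme keep working unchanged.
    """
    return THEME_OVERRIDES.get((theme, canonical_name), canonical_name)


print(resolve_image("ok_button.png", "dark"))     # ok_button_dark.png
print(resolve_image("cancel.png", "dark"))        # cancel.png (no override)
```

Supporting a new theme then means capturing overrides for the images that actually look different, without touching any published script.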

    • Re: (Score:3, Interesting)

      by tristanreid (182859)

      I totally agree. I watched the youtube video (is WTFYV the equivalent of RTFA?), and I was kind of impressed. Although the demo shows an interaction with a bunch of buttons, the real power is the image recognition. She showed how with one command each you can script two of the fundamental interactions you have with images on the screen: click it, or wait for it to appear. The fuzzy visual recognition algorithms are a huge plus. If you wanted to script something in your room using a web-cam, this i
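"Wait for it to appear" is essentially a polling loop with a timeout. The story's bus-notification example works this way: keep looking for the bus on a map until it shows up, then alert. Here is a minimal sketch. `find` is a hypothetical callable returning a match or `None`; this is not Sikuli's own `wait`.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(find: Callable[[], Optional[T]], timeout: float = 10.0,
             interval: float = 0.5) -> Optional[T]:
    """Poll find() until it returns a match or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        hit = find()
        if hit is not None:
            return hit
        if time.monotonic() >= deadline:
            return None  # timed out without seeing the image
        time.sleep(interval)


# Simulated screen: the image "appears" on the third look.
looks = iter([None, None, (640, 360)])
print(wait_for(lambda: next(looks), timeout=2.0, interval=0.01))
```

`time.monotonic()` is used for the deadline so that a wall-clock adjustment during the wait cannot shorten or extend the timeout.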

  • I'll just open this can of worms up, but the first thing I thought of after seeing the demo was, "Can I push a button on a Flash page?"
    • by phi2one (762028)
      I am wondering the same thing myself; If all it's doing is scraping the screen buffer somehow, I don't see why not.
  • Some accountants seem to think everyone needs to learn accounting in order to function in society. But people have other jobs. Some of us like our dumbed down tools because they fill a need. My tax software lets me do my taxes without learning "proper" accounting. Similarly, I know some people who benefit greatly from a little passing knowledge of high-level scripting languages like VB, JavaScript, or even Python.

    For those kinds of people, Sikuli looks pretty cool because they can do things that would b

  • Wow, they just created the old VB SendKeys command. I was actually doing stuff like this 12-14 years ago with the SendKeys command in VB. In "practical" use back then it sucked, and I am certain that has not changed.

  • I did this exact same thing in AutoIt [autoitscript.com], except that it needs exact matches of images instead of a fuzzy recognizer. (Plus, I also had rule triggers and state vs just a single list of imperative commands)

    The fuzzy match is a nice addition, but this automation concept has been available for years.

    • by mrjb (547783)

      The fuzzy match is a nice addition

      and probably an obligatory one as well. If the screenshot is a (lossy) jpeg, the image recognition simply won't work unless it is at least somewhat fault-tolerant.
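The sketch below shows why tolerance matters for lossy screenshots. It uses one simple scoring rule: one minus the mean absolute pixel difference, normalized to 0-1, accepted above a threshold. The scoring rule and the 0.9 threshold are my assumptions for illustration. I don't know how Sikuli scores its matches.

```python
def similarity(a, b):
    """Score two equally sized grayscale patches (rows of 0-255 ints).

    Returns 1.0 for identical patches, falling toward 0.0 as the mean
    absolute per-pixel difference grows.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("patches must have the same shape")
    diffs = [abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    if not diffs:
        raise ValueError("patches must not be empty")
    return 1.0 - sum(diffs) / (255 * len(diffs))


def is_match(a, b, threshold=0.9):
    """Accept a candidate whose similarity reaches the (assumed) threshold."""
    return similarity(a, b) >= threshold


reference = [[10, 200], [200, 10]]   # pixels from the saved screenshot
on_screen = [[12, 197], [203, 9]]    # same element after lossy compression
print(reference == on_screen)           # False: exact comparison fails
print(is_match(reference, on_screen))   # True: tolerant comparison succeeds
```

The threshold is the trade-off knob. Set it too low and different-looking elements match; set it too high and compression noise or anti-aliasing breaks the script.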

      • What AutoIt does is take a hash of the pixels in a rectangular area. If you interactively capture an area's hash when the screen is in the desired state, then that area can be scanned during the script run to see when/if it matches the desired hash again. The area's location can be relative to a window, control, screen, etc, and the software can scan around various locations in case it moved.

        There's no lossiness in any of the image manipulation, but the same pixels need to show up.
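Here is a rough Python analogue of the exact-hash approach described above. It hashes a rectangle of pixels while the screen is in the desired state, then scans nearby offsets for a rectangle with the same hash, in case the area has moved. This is an illustration of the idea using SHA-1, not AutoIt's actual checksum function. The pixel grid of `(r, g, b)` tuples stands in for a real screen capture.

```python
import hashlib


def region_hash(pixels, x, y, w, h):
    """SHA-1 of the w*h rectangle at (x, y) in a 2-D grid of (r, g, b) tuples."""
    digest = hashlib.sha1()
    for row in pixels[y:y + h]:
        for r, g, b in row[x:x + w]:
            digest.update(bytes((r, g, b)))
    return digest.hexdigest()


def scan_for_hash(pixels, target, w, h, x0, y0, radius):
    """Look for a w*h area with hash `target` within `radius` pixels of (x0, y0)."""
    height, width = len(pixels), len(pixels[0])
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = x0 + dx, y0 + dy
            # Skip placements that would run off the edge of the screen.
            if 0 <= x and 0 <= y and x + w <= width and y + h <= height:
                if region_hash(pixels, x, y, w, h) == target:
                    return (x, y)
    return None
```

Because the hash covers every byte, a single changed pixel produces a different hash. This is exactly the "same pixels need to show up" limitation the comment mentions, and it is why a fuzzy matcher is more forgiving.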

  • Just great... all the spammers need now is a few CAPTCHA-deciphering Sikuli plug-ins.

    Once that's done we can all go back to manually removing spam from our web forums and in-boxes.

  • How do you sanitize your inputs in a language that checks what is displayed on the screen? Instead of XSS or SQL injection, you could end up being hacked just by viewing a normal-looking picture attached to an email, if that kind of programming becomes popular.
  • It is basically expect [wikipedia.org] script for GUIs.
  • by mrjb (547783) on Friday January 22, 2010 @05:38AM (#30857606)
    The idea is cool and innovative, and makes automating a point-and-click interface a breeze. It certainly has applications.

    But overall, it just seems like a Bad Idea. It will be about as reliable as screen-scraping in browsers, and should therefore be avoided for the same reasons.

    Even just changing the theme of your OS or the icon sizes could well be enough to confuse the image processing. The code won't be portable, and in the end, for anything but the most simple tasks, the person using it would still require some programming skills. Because of this, I think between Sikuli and command-line scripting, command-line scripting has more staying power.
  • I have to say I am impressed. I have had a play with some of the demos and I like what I see. Whilst I agree that there are limitations this project seems fantastic.

    Having tried and failed to use WinRunner in the past due to the complexity of the GUI application I was testing, I think this scripting would get past the problems we were having.

    I can envisage sending canned scripts to my folks for doing maintenance on their own machine, even just some diagnostics that I find hard to do over the phone.

    I have a coupl

  • Okay, I have done a fair amount of programming and yet with a new Mac I have not yet dived into the SDKs, etc. I once wanted to do some batch resizing of photos and yet couldn't get it done in Automator easily without being scared of losing the original photos, on my first dive into it. Yes, I actually wrote a great auto-compositing and resizing program once driving the Gimp on linux. It was awesome. But that was years ago and now I have a nice new computer. And where did that code go. Yes I'm sure Automato
