MIT Offers Picture-Centric Programming To the Masses With Sikuli

coondoggie writes "Computer users with rudimentary skills will be able to program via screen shots rather than lines of code with a new graphical scripting language called Sikuli that was devised at the Massachusetts Institute of Technology. With a basic understanding of Python, people can write programs that incorporate screen shots of graphical user interface (GUI) elements to automate computer work. One example given by the authors of a paper about Sikuli is a script that notifies a person when his bus is rounding the corner so he can leave in time to catch it." Here's a video demo of the technology, and a paper explaining the concept (PDF).

  • FrontPage? (Score:3, Interesting)

    by Itninja ( 937614 ) on Thursday January 21, 2010 @05:25PM (#30851710) Homepage
    Sounds like the Microsoft FrontPage of coding software. Why do with text what you can do with pictures? And we all know FrontPage went on to become the de facto standard for web development... that had to be fixed by a real web developer later.

    But on the upside, dedicated FTEs for "reinstalling corrupted FrontPage extensions" did skyrocket during the FrontPage era.
  • MMO macro maker? (Score:5, Interesting)

    by visgoth ( 613861 ) on Thursday January 21, 2010 @05:32PM (#30851842)
    This looks like a powerful tool for gold / ISK / whatever farming. I'm tempted to resurrect my EVE account and see if I can make an auto-miner script.
  • Better (Score:3, Interesting)

    by pavon ( 30274 ) on Thursday January 21, 2010 @05:35PM (#30851920)

    Actually I think this is more interesting than either FrontPage or LabView, because it allows you to script GUI apps that were not designed to be scriptable. Even for apps that are scriptable, it provides an increase in user efficiency as you don't have to learn the API commands to do things that you already know how to do in the GUI.

    How useful it is will depend on how well the image pattern matching deals with corner cases. Suppose you need to click on a text field, but there are many identical-looking (empty) text fields, the only distinguishing factor being the label beside each one, and clicking on the label does not select the text field. Like screen scraping, it is also somewhat fragile to UI changes (although not as fragile as other GUI scripting tools that rely on pixel location).
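
    A minimal sketch of one workaround for the identical-text-field case: match the label rather than the field, then aim a fixed number of pixels to its right. It assumes the screen and the label are small grayscale arrays (lists of lists) and uses a brute-force sum-of-squared-differences search; `find_template`, `field_next_to_label`, and the `dx` offset are hypothetical names, not Sikuli's API.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a grayscale image as rows of pixel values.
Image = List[List[int]]


def find_template(screen: Image, template: Image) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the closest match by sum of squared differences."""
    sh, sw = len(screen), len(screen[0])
    th, tw = len(template), len(template[0])
    if th > sh or tw > sw:
        return None  # the template cannot fit on the screen
    best_ssd, best_pos = None, None
    for r in range(sh - th + 1):
        for c in range(sw - tw + 1):
            ssd = sum(
                (screen[r + i][c + j] - template[i][j]) ** 2
                for i in range(th)
                for j in range(tw)
            )
            if best_ssd is None or ssd < best_ssd:
                best_ssd, best_pos = ssd, (r, c)
    return best_pos


def field_next_to_label(screen: Image, label: Image, dx: int) -> Optional[Tuple[int, int]]:
    """Find the label, then aim dx pixels right of its left edge, at mid-height."""
    pos = find_template(screen, label)
    if pos is None:
        return None
    row, col = pos
    return row + len(label) // 2, col + dx
```

    This always returns the closest position, even when nothing on screen really resembles the label; a practical version would also reject matches below some similarity threshold.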

  • by Anonymous Coward on Thursday January 21, 2010 @05:49PM (#30852190)

    Yeah, this might work until the icons change. I don't see this working too well in practice. I don't know about the Mac, but on my Ubuntu system the icons got updated last week. And it happens often enough that these scripts would need updating, which would be a serious pain and expense. It isn't like an ordinary user could figure this stuff out either. Despite it being so simple, you're still going to need an IT person to create these scripts. Now you just have dumber IT people. Probably people who COST you more money in practice, too, because they "can" do it; it's just that the results of their work take more maintenance. It reminds me of a .bat file written for a video store that backs up a database to a flash drive. If it had only included a check for whether the flash drive was present, and alerted the user when it wasn't, they wouldn't have wasted $80 calling me to come and find out why the backup program wasn't working. Seriously dumb programmer. In the right hands this kind of thing is good. In the wrong hands it is bad.
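
    The missing check amounts to a few lines in most scripting languages. A sketch in Python rather than batch, assuming the database path and the drive's mount point (`drive_root`) are known in advance; `backup` is a hypothetical helper, not the script from the story.

```python
import os
import shutil
import sys


def backup(db_path: str, drive_root: str) -> bool:
    """Copy db_path onto the drive; warn and return False if the drive is missing."""
    if not os.path.isdir(drive_root):
        # The step the original script lacked: tell the user instead of failing silently.
        print(f"Backup drive not found at {drive_root}; please insert it.", file=sys.stderr)
        return False
    shutil.copy2(db_path, os.path.join(drive_root, os.path.basename(db_path)))
    return True
```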

  • by Señor Jojoba ( 519752 ) on Thursday January 21, 2010 @06:43PM (#30853158) Homepage
    Yes, you could use Sikuli to fire up a text editor, individually press the keys to write all the lines of code, launch the compiler/linker/whatever. So it meets your weird definition of completeness. However, I suspect you could not use Sikuli to write a program that writes a Sikuli program to write Sikuli. I could be wrong, though.
  • Re:Potential (Score:1, Interesting)

    by Anonymous Coward on Thursday January 21, 2010 @06:57PM (#30853390)

    Eggplant says hi.

    As a professional test automator, I'd like to point out that automation by image recognition is the method of last resort. The #1 concern in GUI automation is maintainability, and image recognition is the least maintainable method of automation there is short of recording mouse coordinates and keypresses. If you change your theme, if the developer rearranges the controls, if any text is changed, the script is broken. The idea of using image recognition for web page automation is right out. Web sites change way too often for something like this.

    The key to writing maintainable scripts is finding and hooking into the property that is least likely to change. If you're automating Windows Forms .NET apps, you might be able to get the actual variable name. If you're automating web pages you could look at the id or name of the control. You can look at the text of a button or the label of a textbox. You find whatever you can that won't change.
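
    The "least likely to change" rule can be sketched as an ordered list of locator strategies tried in turn, with image matching as the last resort. Controls are modelled as plain dictionaries here, and the property names (`automation_id`, `name`, `label_text`, `image`) are hypothetical stand-ins for whatever a real automation tool exposes.

```python
from typing import Dict, List, Optional

# Hypothetical property names, ordered from least to most likely to change.
LOCATOR_PRIORITY = ["automation_id", "name", "label_text", "image"]


def find_control(controls: List[Dict[str, str]], wanted: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Try each locator in priority order; return the first unique match."""
    for key in LOCATOR_PRIORITY:
        if key not in wanted:
            continue  # the caller has no value for this strategy
        hits = [c for c in controls if c.get(key) == wanted[key]]
        if len(hits) == 1:
            return hits[0]
    return None
```

    A locator is accepted only if it identifies exactly one control, so an ambiguous caption such as "OK" falls through to the next strategy instead of clicking the wrong button.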

    On Windows, use AutoIt if you want something free. There are better commercial tools, but they start in the hundreds of dollars and only go up from there.

    For web automation, look at Watir, WebDriver/Selenium, or WatiN.

    On Macs you get these nice tools called AppleScript and Automator. These are made for end users. They don't use the UI, but instead use an interface made just for automation.

    If you can at all avoid it, I recommend not using image recognition tools. They're extremely fragile. That said, sometimes it can't be avoided. I'll probably take a look at the source to see if there's anything I can use in those few cases where image recognition is unavoidable.

  • by Anonymous Coward on Thursday January 21, 2010 @07:52PM (#30854196)

    Don't use a tool like this for testing. Start with AutoIt or NUnit + White, and look at commercial tools if those don't do what you need.

  • by Anonymous Coward on Thursday January 21, 2010 @08:48PM (#30854796)

    Exactly! I'd love to see Sikuli's one new trick integrated into an existing, popular macroing system like AutoIt or AutoHotKey.

  • by tristanreid ( 182859 ) on Thursday January 21, 2010 @09:25PM (#30855082)

    I totally agree. I watched the YouTube video (is WTFYV the equivalent of RTFA?), and I was kind of impressed. Although the demo shows an interaction with a bunch of buttons, the real power is the image recognition. She showed how, with one command each, you can script two of the fundamental interactions you have with images on the screen: click it, or wait for it to appear. The fuzzy visual recognition algorithms are a huge plus. If you wanted to script something in your room using a webcam, this is basically how to do it with trivial coding.
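
    The "wait for it to appear" primitive boils down to polling a matcher until its similarity score clears a threshold or a timeout expires. A sketch, assuming a caller-supplied `score` function that returns a match similarity between 0 and 1; `wait_for` is a hypothetical name, and the threshold and timings are arbitrary illustrative values.

```python
import time
from typing import Callable


def wait_for(score: Callable[[], float], threshold: float = 0.7,
             timeout: float = 3.0, interval: float = 0.01) -> bool:
    """Return True as soon as score() >= threshold, or False once timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if score() >= threshold:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)  # back off briefly before re-checking the screen
```

    "Click it" is then just acting on the match location once `wait_for` returns True.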

    I think of this as an equivalent to something like SQL. There's a domain in which you'd like to impose logical structure (relational data / images), and you generally use the language to great effect in conjunction with another programming language. If I had to write a scheduled task for my laptop that needed me to be on the VPN, I'd much rather use something like this to handle the connection than try to figure out how the VPN API works.


  • by AardvarkCelery ( 600124 ) on Thursday January 21, 2010 @09:50PM (#30855292)

    Some accountants seem to think everyone needs to learn accounting in order to function in society. But people have other jobs. Some of us like our dumbed down tools because they fill a need. My tax software lets me do my taxes without learning "proper" accounting. Similarly, I know some people who benefit greatly from a little passing knowledge of high-level scripting languages like VB, JavaScript, or even Python.

    For those kinds of people, Sikuli looks pretty cool because they can do things that would be pretty difficult otherwise. Hey, even for a lot of experienced programmers, capturing a region of the screen and doing fuzzy pattern matching might be a significant task. I haven't tried Sikuli yet, but it looks like it would be very helpful for some things, and a lot easier to deal with than AutoIt or AutoHotkey.
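
    One standard way to make the matching fuzzy is normalized cross-correlation, which scores two equally sized patches between -1 and 1 and ignores uniform changes in brightness and contrast. The thread does not say whether Sikuli uses exactly this measure; the sketch below is a plain-Python illustration, and `ncc` is a hypothetical name.

```python
import math
from typing import List


def ncc(a: List[List[float]], b: List[List[float]]) -> float:
    """Normalized cross-correlation of two equally sized grayscale patches."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mx for x in xs]  # subtracting the mean removes brightness offsets
    dy = [y - my for y in ys]
    denom = math.sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if denom == 0:
        return 0.0  # a flat patch has no pattern to correlate
    return sum(p * q for p, q in zip(dx, dy)) / denom  # dividing removes contrast scaling
```

    This only tolerates brightness and contrast shifts; a redesigned or resized icon, as other posters point out, will still score poorly.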

    (BTW, TurboTax was just an example. I actually use something I like better, but you get the idea.)

  • Re:Potential (Score:2, Interesting)

    by jdimatteo ( 1726948 ) on Friday January 22, 2010 @12:08AM (#30856196)

    I am currently working on automated GUI tests for an application, and Sikuli looks pretty great -- even when compared to enterprise-level automated GUI testing tools costing on the order of thousands of dollars per user licence.

    Some of the comments below about maintainability problems seem pretty superficial. To ease maintainability, you could build a framework that abstracts GUI component images away from regression test scripts. For example, you could assign a screenshot to a variable and then refer to that variable throughout your test, so if a button happens to change dramatically, you make the change in potentially one place in your code instead of every time it is used in a click. The facts that the tool appears simple (not too many bells and whistles) and is based on Python seem like major advantages for maintainability.
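
    A sketch of that abstraction, assuming screenshots are stored as image files: every path lives in one registry class, and test steps refer to names rather than files. `Images`, `save_document`, and the file paths are hypothetical; `click` is passed in so the step can be exercised without a GUI (in a Sikuli script it would be Sikuli's own click).

```python
from typing import Callable


class Images:
    """Single source of truth for the screenshots the tests refer to."""
    SAVE_BUTTON = "images/save_button.png"
    OK_BUTTON = "images/ok_button.png"


def save_document(click: Callable[[str], None]) -> None:
    """A test step written against image names, not raw file paths."""
    click(Images.SAVE_BUTTON)
    click(Images.OK_BUTTON)
```

    When the Save button is redesigned, only `Images.SAVE_BUTTON` changes; every step that uses it picks up the new screenshot.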

    Check out this interesting academic paper, which specifically addresses using Sikuli for automated GUI testing: "GUI Testing Using Computer Vision," CHI 2010.

    Has anybody actually used Sikuli? I'd be very curious if anybody has used this for automated GUI testing in a corporate environment...

  • by mrjb ( 547783 ) on Friday January 22, 2010 @05:38AM (#30857606)
    The idea is cool and innovative, and makes automating a point-and-click interface a breeze. It certainly has applications.

    But overall, it just seems like a Bad Idea. It will be about as reliable as screen-scraping in browsers, so it would be wise to avoid it, for the same reasons.

    Even just changing the theme of your OS or the icon sizes could well be enough to confuse the image processing. The code won't be portable, and in the end, for anything but the most simple tasks, the person using it would still require some programming skills. Because of this, I think between Sikuli and command-line scripting, command-line scripting has more staying power.