MIT Offers Picture-Centric Programming To the Masses With Sikuli
coondoggie writes "Computer users with rudimentary skills will be able to program via screen shots rather than lines of code with a new graphical scripting language called Sikuli that was devised at the Massachusetts Institute of Technology. With a basic understanding of Python, people can write programs that incorporate screen shots of graphical user interface (GUI) elements to automate computer work. One example given by the authors of a paper about Sikuli is a script that notifies a person when his bus is rounding the corner so he can leave in time to catch it."
Here's a video demo of the technology, and a paper explaining the concept (PDF).
FrontPage? (Score:3, Interesting)
But on the upside, dedicated FTEs for "reinstalling corrupted FrontPage extensions" did skyrocket during the FrontPage era.
MMO macro maker? (Score:5, Interesting)
Better (Score:3, Interesting)
Actually I think this is more interesting than either FrontPage or LabView, because it allows you to script GUI apps that were not designed to be scriptable. Even for apps that are scriptable, it provides an increase in user efficiency as you don't have to learn the API commands to do things that you already know how to do in the GUI.
How useful it is will depend on how well the image pattern matching deals with corner cases. Suppose you need to click on a text field, but there are many identical-looking (empty) text fields, the only distinguishing factor is the label beside each one, and clicking on the label does not select the text field. Like screen scraping, it is also somewhat fragile to UI changes (although not as much as other GUI scripting tools that rely on pixel location).
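One way the label case might be handled, sketched in plain Python rather than Sikuli's own API: match the label image, then click at a fixed offset to its right. The `Match` tuple, the `click_point_right_of` helper, and the 40-pixel offset are all hypothetical and depend on the layout being automated.

```python
from collections import namedtuple

# Hypothetical bounding box of a matched label image, in screen pixels.
Match = namedtuple("Match", ["x", "y", "w", "h"])

def click_point_right_of(label, offset=40):
    """Return a point `offset` pixels right of the label's right edge,
    vertically centred on the label."""
    return (label.x + label.w + offset, label.y + label.h // 2)

# Two identical empty fields; only the labels beside them differ.
username_label = Match(x=20, y=100, w=80, h=16)
password_label = Match(x=20, y=140, w=80, h=16)

print(click_point_right_of(username_label))  # (140, 108)
print(click_point_right_of(password_label))  # (140, 148)
```

The idea is that the label, not the field, is what gets matched, so identical fields stop being ambiguous. It still breaks if the gap between label and field changes.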
Right hands great- chances are more harm than good (Score:1, Interesting)
Yeah, this might work until the icons change. I don't see this working too well in practice. I don't know about Mac, but on my Ubuntu system the icons got updated last week. It happens often enough that keeping these scripts updated would be a serious pain and expense. It isn't like an ordinary user could figure this stuff out either. Despite it being so simple, you're still going to need an IT person to create these scripts. Now you just have dumber IT people. Probably people who COST you more money in practice too, because they "can" do it; it's just that the results of their work take more maintenance. It reminds me of a .bat file written for a video store that backs up a database to a flash drive. If it had only had a statement to check whether the flash drive was present and alert the user, they wouldn't have wasted $80 calling me to come and find out why the backup program wasn't working. Seriously dumb programmer. In the right hands this kind of thing is good. In the wrong hands it is bad.
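The missing check is only a couple of lines. A minimal sketch in Python rather than batch, assuming the backup is a single file copy; the `backup_database` name and both paths are hypothetical:

```python
import os
import shutil

def backup_database(db_path, drive_root):
    """Copy db_path onto the drive, or return a message the user can act on."""
    if not os.path.isfile(db_path):
        return "Backup failed: database file not found: " + db_path
    # The check the original script lacked: is the flash drive there at all?
    if not os.path.isdir(drive_root):
        return "Backup failed: flash drive not present at " + drive_root
    shutil.copy2(db_path, drive_root)
    return "Backup complete."
```

Something like `print(backup_database("store.db", "E:\\"))` would then tell the user what went wrong instead of failing silently.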
Re:Yes, but can Sikuli be used to write Sikuli? (Score:3, Interesting)
Re:Potential (Score:1, Interesting)
Eggplant [testplant.com] says hi.
As a professional test automator, I'd like to point out that automation by image recognition is the method of last resort. The #1 concern in GUI automation is maintainability, and image recognition is the least maintainable method of automation there is short of recording mouse coordinates and keypresses. If you change your theme, if the developer rearranges the controls, if any text is changed, the script is broken. The idea of using image recognition for web page automation is right out. Web sites change way too often for something like this.
The key to writing maintainable scripts is finding and hooking into the property that is least likely to change. If you're automating Windows Forms .NET apps, you might be able to get the actual variable name. If you're automating web pages you could look at the id or name of the control. You can look at the text of a button or the label of a textbox. You find whatever you can that won't change.
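As a rough illustration of "hook into the property least likely to change", here is a sketch that picks a locator from whatever attributes a control exposes. The dictionary representation of a control and the id, name, text priority order are assumptions for the example, not any particular tool's API.

```python
def stable_locator(control):
    """Return (attribute, value) for the most stable attribute present,
    trying id first, then name, then visible text."""
    for key in ("id", "name", "text"):
        value = control.get(key)
        if value:
            return (key, value)
    return None  # nothing stable to hook into; image matching is the last resort

# A control with no id falls back to its name.
print(stable_locator({"id": "", "name": "q", "text": "Search"}))  # ('name', 'q')
```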
On Windows, use AutoIT [autoitscript.com] if you want something free. There are better commercial tools, but they start in the hundreds of dollars and only go up from there.
For web automation, look at Watir [watir.com], WebDriver/Selenium [google.com], or WatiN [sourceforge.net].
On Macs you get these nice tools called AppleScript and Automator. These are made for end users. They don't use the UI, but instead use an interface made just for automation.
If you can at all avoid it, I recommend not using image recognition tools. They're extremely fragile. That said, sometimes it can't be avoided. I'll probably take a look at the source to see if there's anything I can use in those few cases where image recognition is unavoidable.
Re:Program, NOT code. Think MACRO (Score:1, Interesting)
Don't use a tool like this for testing. Start with AutoIt or nunit+white [codeplex.com], and look at commercial tools if those don't do what you need.
Re:Program, NOT code. Think MACRO (Score:1, Interesting)
Exactly! I'd love to see Sikuli's one new trick integrated into an existing, popular macro system like AutoIt or AutoHotkey.
Re:Think executable step-by-step tutorials (Score:3, Interesting)
I totally agree. I watched the YouTube video (is WTFYV the equivalent of RTFA?), and I was kind of impressed. Although the demo shows an interaction with a bunch of buttons, the real power is the image recognition. She showed how, with one command each, you can script two of the fundamental interactions you have with images on the screen: click it, or wait for it to appear. The fuzzy visual recognition algorithms are a huge plus. If you wanted to script something in your room using a web-cam, this is basically how to do it with trivial coding.
I think of this as an equivalent to something like SQL. There's a domain in which you'd like to impose logical structure (relational data / images), and you generally use the language to great effect in conjunction with another programming language. If I had to write a scheduled task for my laptop that needed me to be on the VPN, I'd much rather use something like this to handle the connection than try to figure out how the VPN API works.
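The "wait for it to appear" half is essentially a polling loop around an image search. A minimal sketch in plain Python, not Sikuli's implementation: `find` stands in for whatever image search is available, and `wait_for`, the timeout, and the interval are hypothetical.

```python
import time

def wait_for(find, timeout=5.0, interval=0.1):
    """Poll find() until it returns a match, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        match = find()
        if match is not None:
            return match
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)

# Stand-in for an image search: the "image" shows up on the third look.
calls = {"n": 0}

def fake_find():
    calls["n"] += 1
    return (10, 20) if calls["n"] >= 3 else None

print(wait_for(fake_find, timeout=2.0, interval=0.01))  # (10, 20)
```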
-t.
What's so wrong with TurboTax? (Score:2, Interesting)
Some accountants seem to think everyone needs to learn accounting in order to function in society. But people have other jobs. Some of us like our dumbed down tools because they fill a need. My tax software lets me do my taxes without learning "proper" accounting. Similarly, I know some people who benefit greatly from a little passing knowledge of high-level scripting languages like VB, JavaScript, or even Python.
For those kinds of people, Sikuli looks pretty cool because they can do things that would be pretty difficult otherwise. Hey, even for a lot of experienced programmers, capturing a region of the screen and doing fuzzy pattern matching might be a significant task. I haven't tried Sikuli yet, but it looks like it would be very helpful for some things, and a lot easier to deal with than AutoIt or AutoHotkey.
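To give a sense of what "fuzzy pattern matching" involves, here is a toy sketch that slides a small grayscale template over a screen grid and scores each position by mean absolute pixel difference. This is not Sikuli's algorithm; the function names, the 0-255 grayscale lists, and the 0.9 threshold are all assumptions for illustration.

```python
def similarity(patch, template):
    """1.0 for identical grayscale patches (values 0-255), lower as they differ."""
    total = 0
    count = 0
    for prow, trow in zip(patch, template):
        for p, t in zip(prow, trow):
            total += abs(p - t)
            count += 1
    return 1.0 - total / (255.0 * count)

def find_template(screen, template, threshold=0.9):
    """Return (row, col, score) of the best match at or above threshold, else None."""
    th, tw = len(template), len(template[0])
    best = None
    for r in range(len(screen) - th + 1):
        for c in range(len(screen[0]) - tw + 1):
            patch = [row[c:c + tw] for row in screen[r:r + th]]
            score = similarity(patch, template)
            if score >= threshold and (best is None or score > best[2]):
                best = (r, c, score)
    return best

# A slightly off-white 2x2 "button" on a black screen still matches a pure white template.
screen = [
    [0, 0, 0, 0],
    [0, 0, 250, 240],
    [0, 0, 245, 255],
    [0, 0, 0, 0],
]
template = [[255, 255], [255, 255]]
print(find_template(screen, template)[:2])  # (1, 2)
```

Raising the threshold makes the match stricter; at 0.99 the same button is no longer found, which is roughly what a theme change does to a script like this.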
(BTW, TurboTax was just an example. I actually use something I like better, but you get the idea.)
Re:Potential (Score:2, Interesting)
I am currently working on automated GUI tests for an application, and Sikuli looks pretty great -- even when compared to enterprise level automated GUI testing tools costing on the order of thousands of dollars per user licence.
Some of the comments below about maintainability problems seem pretty superficial. To ease maintainability, you could build a framework abstracting GUI component images from regression test scripts. For example, you could assign a screenshot to a variable and then refer to that variable throughout your test, so if a button happens to change dramatically, you make the change in potentially one place in your code instead of every time it is used in a click. The tool's apparent simplicity (not too many bells and whistles) and its Python base both seem to be major advantages for maintainability.
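A rough sketch of that abstraction in plain Python, with the click function passed in so the idea does not depend on any particular tool. The file names, the `IMAGES` registry, and `run_save_flow` are hypothetical.

```python
# Hypothetical screenshot files; each GUI element is registered exactly once.
IMAGES = {
    "save_button": "images/save_button.png",
    "ok_button": "images/ok_button.png",
}

def image(name):
    """Look up a screenshot by logical name, failing loudly on typos."""
    try:
        return IMAGES[name]
    except KeyError:
        raise KeyError("No screenshot registered for %r" % name)

def run_save_flow(click):
    """A test script refers to logical names, never to image files directly."""
    click(image("save_button"))
    click(image("ok_button"))

# Record the clicks instead of performing them, to show what the script would do.
clicked = []
run_save_flow(clicked.append)
print(clicked)  # ['images/save_button.png', 'images/ok_button.png']
```

If the Save button is redesigned, only the `IMAGES` entry needs a new screenshot; the test scripts stay as they are.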
Check out this interesting academic paper which specifically addresses using Sikuli for automated GUI testing: "GUI Testing Using Computer Vision, CHI 2010" at http://sikuli.csail.mit.edu/documentation.shtml [mit.edu]
Has anybody actually used Sikuli? I'd be very curious if anybody has used this for automated GUI testing in a corporate environment...
Cool, but it has severe downsides. (Score:3, Interesting)
But overall, it just seems like a Bad Idea. It will be as reliable as screen-scraping in browsers, and it would be wise to avoid it for the same reasons.
Even just changing the theme of your OS or the icon sizes could well be enough to confuse the image processing. The code won't be portable, and in the end, for anything but the simplest tasks, the person using it would still require some programming skills. Because of this, I think between Sikuli and command-line scripting, command-line scripting has more staying power.