MIT Offers Picture-Centric Programming To the Masses With Sikuli
coondoggie writes "Computer users with rudimentary skills will be able to program via screen shots rather than lines of code with a new graphical scripting language called Sikuli that was devised at the Massachusetts Institute of Technology. With a basic understanding of Python, people can write programs that incorporate screen shots of graphical user interface (GUI) elements to automate computer work. One example given by the authors of a paper about Sikuli is a script that notifies a person when his bus is rounding the corner so he can leave in time to catch it."
Here's a video demo of the technology, and a paper explaining the concept (PDF).
FrontPage? (Score:3, Interesting)
But on the upside, dedicated FTEs for "reinstalling corrupted FrontPage extensions" did skyrocket during the FrontPage era.
Better (Score:3, Interesting)
Actually I think this is more interesting than either FrontPage or LabView, because it allows you to script GUI apps that were not designed to be scriptable. Even for apps that are scriptable, it provides an increase in user efficiency as you don't have to learn the API commands to do things that you already know how to do in the GUI.
How useful it is will depend on how well the image pattern matching deals with corner cases. Suppose you need to click on a text field, but there are many identically loo
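A minimal sketch of that corner case, written in plain Python rather than Sikuli (whose matcher isn't described in this thread): exact template matching over a toy grid of grey values, where the screenshot of a "text field" matches in two places. The grid, the pixel values and the `find_all` helper are made up for illustration.

```python
def find_all(screen, template):
    """Return the (row, col) of every top-left position where `template`
    matches `screen` exactly. Both are lists of rows of grey values."""
    th, tw = len(template), len(template[0])
    sh, sw = len(screen), len(screen[0])
    hits = []
    for r in range(sh - th + 1):
        for c in range(sw - tw + 1):
            if all(screen[r + i][c + j] == template[i][j]
                   for i in range(th) for j in range(tw)):
                hits.append((r, c))
    return hits


# Toy 4x6 "screen" with two identical 2x2 "text fields" (the 1s).
screen = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
field = [[1, 1], [1, 1]]

matches = find_all(screen, field)
print(matches)  # [(1, 1), (1, 4)]
if len(matches) > 1:
    print("ambiguous: the screenshot alone cannot say which field to click")
```

The matcher does its job perfectly and still cannot help: both fields look the same, so the script needs some other cue (position, a nearby label) to pick one.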
Re: (Score:2)
I can think of at least three ways of scripting GUI apps that aren't scriptable that have already been around for years.
Re: (Score:2)
Your post would be better if you named the three you're thinking of. I wonder if they're the same ones I'm thinking of.
Re: (Score:2)
No, Logo was simple programming to make pictures. The turtle was a drawing point, whether on screen or on paper.
Re:FrontPage? (Score:5, Informative)
>And we all know FrontPage went on to become the de facto standard for web development... that had to be fixed by a real web developer later.
Do you want to democratize technology or just have it controlled by elites? Non-techies want to do things like scripting and web design without paying a professional, the same way they want to fix things around the house or fix the car. When it comes to small or easy jobs, a non-expert can do just fine. Why should we piss on the DIYers because they don't have a Master's degree in CS? Frankly, a lot of computer stuff is pretty easy and paying someone is ridiculous.
While I'm certainly no fan of FrontPage, I feel that it wasn't much worse than Mozilla Composer or other WYSIWYG HTML composers.
Re: (Score:2, Insightful)
>Do you want to democratize technology or just have it controlled by elites?
Neither. I'd like to see people who wish to program learn how to.
Re: (Score:3, Insightful)
Re: (Score:2)
Re: (Score:2)
>Ever used a brownie mix
Yes, but I didn't consider myself a chef afterwards
Re: (Score:2)
The problem with FrontPage wasn't the users, it was the code that it produced.
Re: (Score:2)
Re:FrontPage? (Score:5, Insightful)
Then we end up with Visual Basic (the birth of Visual Basic came with the motto "it's so easy you no longer need programmers... managers can write the code"... that worked out well).
From a business point of view, it actually did. People used VB, and particularly VB macros in Office, to do things that resulted in a lot of dollars flowing through a lot of organizations. Yes, it did eventually need to be changed out, but in its time, for its purpose, you can't really fault it. It truly did work.
Re: (Score:2)
IMO, VB wasn't actually that bad in the hands of people who made a real effort to learn how to program, but in the hands of most it was a freaking nightmare and caused loads of problems.
DIY programming is fine for personal use but it should never be used for businesses.
Re: (Score:2)
Security issue, yes. Loads of problems, yes.
Doesn't obviate what I said. In the moment, lots of dollars changed hands because of it. That, above all else, is what a business would use to determine 'success'.
Re: (Score:2)
That's a false assumption. VB = Microsoft, and more importantly, FrontPage = Microsoft.
Re: (Score:2)
Why should we piss on the DIYers because they don't have a Master's degree in CS? Frankly, a lot of computer stuff is pretty easy and paying someone is ridiculous.
Thousands of cars on cinderblocks and dozens of houses with flooded basements are testimony that sometimes, paying someone is the only thing that isn't ridiculous. There's DIY, and there's "OMG you are SO in over your head." Anyone whose software development abilities are so stunted that the "advancement" outlined in TFA would help them is absol
Re: (Score:1)
There is a minimum level of skill and talent required to do anything. The only thing that happens when you make something "so simple anyone can do it" is a minefield of crap software. Instructing a computer to do something requires the ability to think abstractly, and to organize/plan with orders of magnitude more sophistication than "Do I want eggs or pancakes for breakfast?". Arguing that the 'elites' are pushing down the 'DIYers' is disingenuous. A real DIYer will overcome the learning curve of what
Re: (Score:2)
Do you want to democratize technology or just have it controlled by elites?
If I have to be in charge of cleaning up their mess then I would prefer that they leave development to the professionals. I think that is what the parent is getting at. We professionals are tired of rescuing dabblers who get in over their heads because their "easy to use" tools are just powerful enough to get them into trouble, but not powerful enough to get them out. If people agree to be responsible for their own results, good or bad, then I say let them do as they wish. Unfortunately, it never seems to w
Re: (Score:2)
Sikuli is good, though it appears to take control of your computer while it runs, so really it's pretty useless aside from doing batch repetitive jobs. You can probably get around that by learning more Python. The demo didn't interest me, but I'm sure loads of people will love it for automating tasks.
FrontPage, on the other hand, was awful and produced loads of awful sites that unfortunately affect more peop
Re: (Score:2, Insightful)
Just because code is text (and can be readily generated) doesn't mean that anyone with notepad should be able to write it.
Wrong. It means exactly that. BASIC was a language meant to make it easier for people to program, and it was THE introduction to programming for many people. Now, am I arguing that BASIC is the right language for any given task? No. I haven't coded in BASIC for 15 years. But lowering the barrier to entry can only be a good thing for attracting new blood.
Re: (Score:2)
That was my first thought as well. I programmed in HP VEE [agilent.com] and LabVIEW [ni.com] in the early nineties.
Potential (Score:2, Insightful)
Especially for testing your GUI.
This seems like AutoIt, but with image recognition (instead of having to input mouse coordinates).
Re: (Score:2)
Watching the YouTube demo, I immediately thought of how basic this is compared to AutoIt's functions; even the quick record function is faster to "program" with than this screenshot function.
It says it can tolerate some changes, but what if a completely different visual theme is installed? What if a drop-down is not on the same item it was on when you made the script? AutoIt can take care of this by reading the underlying GUI code to allow for these kinds of things. As someone who has been automating
Re: (Score:2)
How about if the stuff you click on might look rather different each time? e.g. the IP address might not be 0.0.0.0 but something else the second or third time around.
And what if the stuff you need to click on can only be identified by text or an icon that you don't click on - you actually click on the stuff to the right (or left or whatever) of it. This one isn't a biggie - it shouldn't be too difficult to
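For the IP-address case, one workaround is to screenshot only the stable label and click at an offset from it. A minimal plain-Python sketch on a toy grid of grey values; `find_first`, `point_beside` and the offset value are hypothetical helpers for illustration, not part of Sikuli.

```python
def find_first(screen, template):
    """Top-left (row, col) of the first exact match of template, or None."""
    th, tw = len(template), len(template[0])
    for r in range(len(screen) - th + 1):
        for c in range(len(screen[0]) - tw + 1):
            if all(screen[r + i][c + j] == template[i][j]
                   for i in range(th) for j in range(tw)):
                return (r, c)
    return None


def point_beside(screen, label, dx, dy=0):
    """Centre of the first match of the (stable) label, shifted by (dx, dy).
    Returns (x, y), i.e. (column, row), or None if the label isn't visible."""
    hit = find_first(screen, label)
    if hit is None:
        return None
    r, c = hit
    cx = c + len(label[0]) // 2
    cy = r + len(label) // 2
    return (cx + dx, cy + dy)


# Toy screen: a fixed label (the 5s) followed by a value that changes
# from run to run (7, 1, 4), so only the label is worth screenshotting.
screen = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 5, 5, 0, 7, 1, 4],
    [0, 0, 0, 0, 0, 0, 0],
]
ip_label = [[5, 5]]
print(point_beside(screen, ip_label, dx=3))  # (5, 1): inside the value
```

The trade-off is that a fixed offset reintroduces a dependency on layout, which is exactly what image matching was supposed to avoid.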
Re: (Score:2)
The MS test crap in the latest versions of Visual Studio does it too, and it'll be happy to find a button (if it's a standard control) to click on using other data rather than mouse coordinates.
Re: (Score:2)
>This seems like AutoIt but with image recognition (instead of having to input mouse coordinates).
Right, it's AutoHotkey/AutoIt with a nicer OCR library. Perhaps this will light a fire under the AutoHotkey devs and get them to add some smarter screen reading and browser integration.
Re: (Score:1, Interesting)
Eggplant [testplant.com] says hi.
As a professional test automator, I'd like to point out that automation by image recognition is the method of last resort. The #1 concern in GUI automation is maintainability, and image recognition is the least maintainable method of automation there is short of recording mouse coordinates and keypresses. If you change your theme, if the developer rearranges the controls, if any text is changed, the script is broken. The idea of using image recognition for web page automation is right out.
Re: (Score:2, Interesting)
I am currently working on automated GUI tests for an application, and Sikuli looks pretty great, even when compared to enterprise-level automated GUI testing tools costing on the order of thousands of dollars per user licence.
Some of the comments below on maintainability problems seem pretty superficial. To ease maintainability, you could build a framework that abstracts GUI component images away from regression test scripts. For example, you could assign a screenshot to a variable and then refer
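One way to read that suggestion, sketched in plain Python: a hypothetical `LoginScreen` class owns every screenshot path, and test steps refer only to its attributes. The class, the image paths and the `(action, argument)` step format are assumptions; no real Sikuli calls are made, the steps are just returned as data for some driver to execute.

```python
class LoginScreen:
    """Hypothetical screen object: every screenshot path lives here, so a
    theme or layout change means re-capturing images in one place only."""
    USERNAME = "images/login/username_field.png"
    PASSWORD = "images/login/password_field.png"
    SUBMIT = "images/login/submit_button.png"


def login_steps(user, password):
    """Describe a login as (action, argument) pairs; tests refer to
    LoginScreen attributes, never to image files directly."""
    return [
        ("click", LoginScreen.USERNAME),
        ("type", user),
        ("click", LoginScreen.PASSWORD),
        ("type", password),
        ("click", LoginScreen.SUBMIT),
    ]


for action, arg in login_steps("alice", "secret"):
    print(action, arg)
```

When the login button is redesigned, only `LoginScreen.SUBMIT` needs a new screenshot; every test that logs in stays untouched.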
Re: (Score:2)
I just tried it out for an hour or so with our web application, and it seems to be doing its job. One thing it didn't manage to do is click somewhere relative to the matched image. It always seems to click in the middle of the image, which is annoying when you want it to click one checkbox out of many based on its preceding label.
Perhaps it's possible to use some kind of nesting, so you could try to find the image of the checkbox inside a previous match that includes the label, but I didn't find out
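The nesting idea can be sketched as a search restricted to a region: find the label first, then look for the checkbox only to its right. Plain Python on a toy grid; `find_first` and its `(top, left, height, width)` region tuple are hypothetical helpers, not Sikuli functions.

```python
def find_first(screen, template, region=None):
    """Top-left (row, col) of the first exact match of template, or None.
    region = (top, left, height, width) limits where we look."""
    top, left, height, width = region or (0, 0, len(screen), len(screen[0]))
    th, tw = len(template), len(template[0])
    for r in range(top, top + height - th + 1):
        for c in range(left, left + width - tw + 1):
            if all(screen[r + i][c + j] == template[i][j]
                   for i in range(th) for j in range(tw)):
                return (r, c)
    return None


screen = [
    [2, 2, 0, 9],   # label "A" (the 2s), then its checkbox (9)
    [3, 3, 0, 9],   # label "B" (the 3s), then its checkbox (9)
]
checkbox = [[9]]
label_b = [[3, 3]]

print(find_first(screen, checkbox))        # (0, 3): label A's box, wrong one

r, c = find_first(screen, label_b)         # locate the label first
right_of_label = (r, c + len(label_b[0]),
                  len(label_b), len(screen[0]) - (c + len(label_b[0])))
print(find_first(screen, checkbox, right_of_label))  # (1, 3)
```

Unlike a fixed offset, this still tolerates the checkbox moving sideways, as long as it stays on the label's row.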
Re: (Score:2)
Yes, but that kind of defeats the purpose of using image recognition so you don't have to care about the exact layout of your application. Inserting a new control on the page could break the test if you used tabs
Re: (Score:2)
I was thinking the same thing. Where this would be really handy would be in applications that paint their own windows and don't expose the gui handles for AutoIt to latch on to. Specifically, this would work great for Great Plains or online poker clients. :)
-ellie
MMO macro maker? (Score:5, Interesting)
Re: (Score:1)
- selecting and clicking on see-through buttons (the background will change too much)
- the program's access to the actual game for seeing, clicking and typing
- the game's anti-hack detection / counter-measures
- macro playing lag (see video)
But it seems very promising nevertheless.
Re: (Score:2)
Add in the number of pilots who, even if they're anti-pirate, operate a KOS (kill on sight) policy when it comes to macro miners...
Re: (Score:1, Offtopic)
Re: (Score:2, Offtopic)
Depending on how robust Sikuli is, it might be possible to make a mission-running macro, which could be even safer than blasting rocks (with the right ship setup, and such). Barring that, I'd likely use Sikuli on a secon
My grandmother knows python (Score:5, Insightful)
"Computer users with rudimentary skills"..... "with a basic understanding of Python"?
Re: (Score:1)
You're reading a story about MIT on slashdot.
Two groups that are so utterly disconnected from the real world that they both have no idea why their favorite toy hasn't taken over the world, even though it's the simplest, most efficient, easiest to use, most feature-rich (insert whatever here) on the planet.
Most of both groups probably think grandma knows assembly as well.
Re:My grandmother knows python (Score:5, Funny)
"Computer users with rudimentary skills"..... "with a basic understanding of Python"?
Computer users with a rudimentary skill who do not have a basic understanding of Python can always build a Python programming AI in Lisp (or at least that's what I gathered from the MIT docs I browsed) and thus save themselves the trouble.
Re: (Score:2)
Moar liek BASIC understanding of a python.
Re: (Score:3, Informative)
Re: (Score:2)
Python is arguably one of the easiest languages to learn.
I can't wait to explain to my mom the difference between four spaces and one tab, just to name one of Python's endless oddities.
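For what it's worth, Python 3 at least refuses to guess: when indentation mixes tabs and spaces so that its meaning would depend on the tab width, compilation fails with `TabError`. A tiny demonstration using only the built-in `compile`; the source string is an arbitrary example.

```python
# Line 2 is indented with four spaces, line 3 with a single tab.
source = "if True:\n    x = 1\n\ty = 2\n"

try:
    compile(source, "<example>", "exec")
    print("compiled fine")
except TabError as err:
    print("TabError:", err.msg)
```

So the difference still has to be explained, but at least it fails loudly instead of silently changing which block a line belongs to.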
The Cow pat model (Score:5, Funny)
For years I have been asking for a software development tool that allows me to write PHP code by throwing cow-pats at the screen with the Wiimote.
And my colleagues want a tool that allows dispatching my bugs with the Wii gun attachment they use in "Quantum of Solace".
High? (Score:1)
FTFA: "Sikuli -- which means God's eye in the language of the Huichol Indians in Mexico". Mexican Indians love their hallucinogenic Peyote [wikipedia.org]. On the other hand, MIT researchers want the masses to program with the mouse. Well, I know about "correlation is not causation", but MIT sure is an interesting place to be.
Right hands great- chances are more harm than good (Score:1, Interesting)
Yeah, this might work until the icons change. I don't see this working too well in practice. I don't know about Mac, but on my Ubuntu system the icons got updated last week. And it happens often enough that updating these scripts would be a serious pain and expense. It isn't like an ordinary user could figure this stuff out either. Despite it being so simple, you're still going to need an IT person to create these scripts. Now you just have dumber IT people. Probably people who COST you more money in p
Program, NOT code. Think MACRO (Score:4, Insightful)
From what I've seen, this is a macro program that can use screenshots rather than key/mouse data to automate tasks. So you PROGRAM your PC in the same way you PROGRAM a VCR to record a show. It is NOT the same as writing an application.
But it seems very interesting once you get past this difference. Macros are very handy for testing in my experience, but they often have a problem because a tiny misalignment can ruin it all. If this program is smarter because it can recognize where data is supposed to go... well, that would certainly make automated tests a bit easier.
Interesting stuff. Just don't think you will be writing software with this.
Re: (Score:2)
Interesting stuff. Just don't think you will be writing software with this.
For a few years now, programming has become equivalent to placing Lego bricks in the correct order (I'm working with Microsoft .NET and tons of components).
So I'm not very surprised by the approach, as long as we can find all the possible varieties of pieces.
Re: (Score:1, Interesting)
Don't use a tool like this for testing. Start with AutoIt or nunit+white [codeplex.com], and look at commercial tools if those don't do what you need.
Re: (Score:1, Interesting)
Exactly! I'd love to see Sikuli's one new trick integrated into an existing, popular macroing system like AutoIt or AutoHotKey.
bad VB flashbacks (Score:2)
Re: (Score:2)
I'm suddenly reminded of horrible apps written in VB97
You're 93 versions ahead of your time - VB6 was the last version of Visual Basic before .NET.
Perhaps more to the point, this not only targets a completely different purpose than Visual Basic, but also looks nothing like it whatsoever.
Re: (Score:2)
There was still no VB97. Nice try though.
Re: (Score:2)
Er, damn - you got me there.
I still don't have a clue how a scripting language with image recognition reminds you of VB though.
Re: (Score:2)
That's OK. For most VB apps there wasn't any "back end".
Yes, but can Sikuli be used to write Sikuli? (Score:2, Funny)
Re: (Score:3, Interesting)
Perfect Macro program... (Score:1)
I understand that it has to visually find the button and that this is computationally expensive, but the 2-3 second lag didn't seem compatible with the task.
On a side note, the video states that there's no "internal API" dependence, but it clearly has to send "click" and "type" signals. Is that really OS-independent, or was it just an overstatement?
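Some of that lag is easy to motivate with arithmetic. An exhaustive matcher that tries a w x h template at every position on a W x H screen performs (W - w + 1)(H - h + 1) * w * h pixel comparisons. The screen and button sizes below are assumptions, and real matchers (whatever Sikuli uses internally isn't stated here) can prune or vectorize far beyond this naive count.

```python
def naive_match_cost(screen_w, screen_h, tmpl_w, tmpl_h):
    """Pixel comparisons for an exhaustive, no-early-exit template search."""
    positions = (screen_w - tmpl_w + 1) * (screen_h - tmpl_h + 1)
    return positions * tmpl_w * tmpl_h


# Assumed sizes: a 1280x800 screen and a 40x20 button screenshot.
print(naive_match_cost(1280, 800, 40, 20))  # 775376800, roughly 7.8e8
```

Hundreds of millions of comparisons per search is enough to cost noticeable time per step, which is why practical matchers avoid the brute-force loop.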
Re: (Score:2)
the video states that there's no "internal API" dependence
I suspect they were referring to the internal API of the program being controlled, i.e. COM, CORBA, etc.
lame (Score:2, Insightful)
Re: (Score:1)
Re: (Score:2)
So, you tried it and it didn't work?
Re: (Score:1)
Re: (Score:2)
I didn't RTFA, but basing this stuff on the *accessibility* view of the screen is/can be useful.
Applescript was invented a LONG time ago people... (Score:1)
It can script GUI actions in much the same way. Granted it's not a very nice environment for more complicated work, but still.
Re: (Score:2)
The last time I tried to use Applescript on windows or linux, it wouldn't even start up.
Its a brilliant idea. (Score:3, Insightful)
Re: (Score:2)
Couldn't agree with you more. I'm surprised by all the negativity. And it seems to me this is innovative enough to have uses that no one here is thinking about right now.
Problems (Score:2)
The script may not work if the UI style is different from the one recorded or if the UI language is different from the one recorded. Generally, any option that can change the UI from computer to computer will create a problem for Sikuli.
Re: (Score:1)
It's even worse than that... Just change your icon or window border theme and watch every Sikuli script break.
The great thing about all other languages except Sikuli is: When you change your Icon or window border theme the programs still run.
Again!?! That trick never works. (Score:1, Insightful)
The Sikuli School of Programming (Score:3, Funny)
loop
talkTo (self, "Don't program!")
Look (@ Pretty pictures)
endloop
endif
It's Not Going Anywhere (Score:1)
Think executable step-by-step tutorials (Score:5, Insightful)
Sikuli is certainly not commercial-grade UI testing software. It was never intended to be; this is academic software written to explore ideas, rather than to polish them to perfection. Also, it is not a "general" programming language. The previous posters who compared it to programming a VCR are right: not all programs have to target complicated algorithms and data structures, and there is plenty of space for automating "simple stuff".
As an idea, I find the readability of the code particularly interesting. Sikuli code is about the closest you can come to self-explanatory, step-by-step instructions on how to achieve whatever a particular program does. Add a few comments to the most arcane steps, publish those programs to an online repository, and presto! executable step-by-step tutorials.
Yes, the developers may have to address the variability of themes on people's desktops. It is certainly possible to do so (for instance, by keeping a list of mappings from any of a set of "supported" themes to a "canonical" theme, which would be used in all examples), but, as far as ideas go, I really think that Sikuli is a very refreshing idea.
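The theme-mapping idea could look something like this minimal sketch, assuming scripts refer to logical names and each supported theme overrides only the screenshots that differ from the canonical one. The theme names, file paths and the `image_for` helper are hypothetical.

```python
CANONICAL = "classic"

# Hypothetical image sets: scripts use the logical names (the inner keys),
# and each supported theme supplies its own screenshots where they differ.
IMAGE_SETS = {
    "classic": {"ok_button": "classic/ok.png", "cancel_button": "classic/cancel.png"},
    "dark": {"ok_button": "dark/ok.png"},
}


def image_for(name, theme):
    """Screenshot to search for `name` under `theme`, falling back to the
    canonical theme when the theme has no override (or is unknown)."""
    themed = IMAGE_SETS.get(theme, {})
    if name in themed:
        return themed[name]
    return IMAGE_SETS[CANONICAL][name]


print(image_for("ok_button", "dark"))      # dark/ok.png
print(image_for("cancel_button", "dark"))  # classic/cancel.png
```

Published example scripts would only ever mention `"ok_button"`, so supporting a new theme means adding screenshots, not editing scripts.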
Re: (Score:3, Interesting)
I totally agree. I watched the YouTube video (is WTFYV the equivalent of RTFA?), and I was kind of impressed. Although the demo shows an interaction with a bunch of buttons, the real power is the image recognition. She showed how, with one command each, you can script two of the fundamental interactions you have with images on the screen: click it, or wait for it to appear. The fuzzy visual recognition algorithms are a huge plus. If you wanted to script something in your room using a web-cam, this i
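The "wait for it to appear" half can be sketched as a polling loop. This is plain Python, not Sikuli's API: `grab_screen` is a hypothetical stand-in for a real screen capture, the matcher here is exact rather than fuzzy, and the timeout and interval values are arbitrary. A loop of this shape could also drive a notifier like the bus example in the story summary.

```python
import time


def find_first(screen, template):
    """Top-left (row, col) of the first exact match of template, or None."""
    th, tw = len(template), len(template[0])
    for r in range(len(screen) - th + 1):
        for c in range(len(screen[0]) - tw + 1):
            if all(screen[r + i][c + j] == template[i][j]
                   for i in range(th) for j in range(tw)):
                return (r, c)
    return None


def wait_for(grab_screen, template, timeout=10.0, interval=0.5):
    """Poll grab_screen() until template appears; return its (row, col).
    Raises TimeoutError if it hasn't appeared within `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        hit = find_first(grab_screen(), template)
        if hit is not None:
            return hit
        if time.monotonic() >= deadline:
            raise TimeoutError("image did not appear in time")
        time.sleep(interval)


# Simulated captures: the target (the 1) shows up on the third grab.
frames = iter([[[0, 0]], [[0, 0]], [[0, 1]]])
print(wait_for(lambda: next(frames), [[1]], interval=0))  # (0, 1)
```

Once `wait_for` returns a position, the "click it" half is just a click at (or near) that point.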
Use This for Software Testing, and Scripting? (Score:2)
Re: (Score:1)
What's so wrong with TurboTax? (Score:2, Interesting)
Some accountants seem to think everyone needs to learn accounting in order to function in society. But people have other jobs. Some of us like our dumbed down tools because they fill a need. My tax software lets me do my taxes without learning "proper" accounting. Similarly, I know some people who benefit greatly from a little passing knowledge of high-level scripting languages like VB, JavaScript, or even Python.
For those kinds of people, Sikuli looks pretty cool because they can do things that would b
SendKeys (Score:2)
Wow, they just reinvented the old VB SendKeys command. I was actually doing stuff like this 12-14 years ago with the SendKeys command in VB. In "practical" use back then it sucked, and I am certain that has not changed.
AutoIt (Score:2)
I did this exact same thing in AutoIt [autoitscript.com], except that it needs exact matches of images instead of a fuzzy recognizer. (Plus, I also had rule triggers and state vs just a single list of imperative commands)
The fuzzy match is a nice addition, but this automation concept has been available for years.
Re: (Score:2)
and probably an obligatory one as well. If the screenshot is a (lossy) JPEG, the image recognition simply won't work unless it is at least somewhat fault-tolerant.
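To illustrate why some tolerance matters, here is a minimal fuzzy matcher, assuming grey-level screenshots and a simple score: the fraction of pixels within `tol` levels of the template. This scoring rule is an illustrative choice, not Sikuli's, and the pixel values are made up to mimic compression noise.

```python
def similarity(region, template, tol):
    """Fraction of pixels whose grey values differ by at most `tol`."""
    pairs = [(a, b) for ra, rb in zip(region, template) for a, b in zip(ra, rb)]
    close = sum(1 for a, b in pairs if abs(a - b) <= tol)
    return close / len(pairs)


def fuzzy_find(screen, template, min_score=0.9, tol=8):
    """Best (score, (row, col)) with score >= min_score, or None."""
    th, tw = len(template), len(template[0])
    best = None
    for r in range(len(screen) - th + 1):
        for c in range(len(screen[0]) - tw + 1):
            region = [row[c:c + tw] for row in screen[r:r + th]]
            score = similarity(region, template, tol)
            if score >= min_score and (best is None or score > best[0]):
                best = (score, (r, c))
    return best


button = [[100, 200], [200, 100]]
# The same button after lossy compression: every pixel is off by a few levels.
screen = [
    [0, 0, 0, 0],
    [0, 103, 197, 0],
    [0, 198, 104, 0],
]
print(fuzzy_find(screen, button))         # (1.0, (1, 1))
print(fuzzy_find(screen, button, tol=0))  # None: an exact matcher misses it
```

The same tolerance that forgives compression noise also makes false matches more likely, so `tol` and `min_score` are a trade-off rather than settings to max out.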
Re: (Score:2)
What AutoIt does is take a hash of the pixels in a rectangular area. If you interactively capture an area's hash when the screen is in the desired state, then that area can be scanned during the script run to see when/if it matches the desired hash again. The area's location can be relative to a window, control, screen, etc, and the software can scan around various locations in case it moved.
There's no lossiness in any of the image manipulation, but the same pixels need to show up.
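A rough Python equivalent of that checksum approach, for illustration only: AutoIt's built-in for this is `PixelChecksum`, but the sketch below does not reproduce its algorithm; it hashes a rectangle of grey values (0-255) with SHA-1 from `hashlib` and scans for the same hash. The helper names and pixel data are hypothetical.

```python
import hashlib


def region_hash(screen, top, left, height, width):
    """Hash of the grey values (0-255) in a rectangle of the screen."""
    h = hashlib.sha1()
    for row in screen[top:top + height]:
        h.update(bytes(row[left:left + width]))
    return h.hexdigest()


def scan_for(screen, wanted, height, width):
    """Top-left (row, col) of the first rectangle whose hash equals `wanted`."""
    for r in range(len(screen) - height + 1):
        for c in range(len(screen[0]) - width + 1):
            if region_hash(screen, r, c, height, width) == wanted:
                return (r, c)
    return None


before = [[10, 20, 30],
          [40, 50, 60]]
wanted = region_hash(before, 0, 1, 2, 2)   # capture [[20, 30], [50, 60]]

moved = [[0, 10, 20, 30],
         [0, 40, 50, 60]]                  # same pixels, shifted right
noisy = [[0, 10, 21, 30],
         [0, 40, 50, 60]]                  # one value off by one

print(scan_for(moved, wanted, 2, 2))  # (0, 2)
print(scan_for(noisy, wanted, 2, 2))  # None
```

Both properties from the comment show up: the area can move and still be found, but a single changed pixel breaks the match.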
Better Solution one line (Score:2)
man ifconfig
Spammers Rejoice! (Score:1)
Just great... all the spammers need now is a few CAPTCHA-deciphering Sikuli plug-ins.
Once that's done, we can all go back to manually removing spam from our web forums and inboxes.
Bobby Tables (Score:2)
I think I've seen this before... (Score:2)
Cool, but it has severe downsides. (Score:3, Interesting)
But overall, it just seems like a Bad Idea. It will be about as reliable as screen-scraping in browsers, and it would therefore be wise to avoid it, for the same reasons.
Even just changing the theme of your OS or the icon sizes could well be enough to confuse the image processing. The code won't be portable, and in the end, for anything but the most simple tasks, the person using it would still require some programming skills. Because of this, I think between Sikuli and command-line scripting, command-line scripting has more staying power.
Associative Arrays indexed by a Freakin' Image (Score:2)
I have to say I am impressed. I have had a play with some of the demos and I like what I see. Whilst I agree that there are limitations this project seems fantastic.
Having tried and failed to use WinRunner in the past due to the complexity of the GUI application I was testing, I think this scripting would get past the problems we were having.
I can envisage sending canned scripts to my folks for doing maintenance on their own machine, even just some diagnostics that I find hard to do over the phone.
I have a coupl
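The subject line refers to using images as dictionary keys. Sikuli's own implementation isn't described here, so this is only a plain-Python sketch of the idea, assuming screenshots are lists of rows of grey values: convert each image to nested tuples so it is hashable, then use it like any other key. The button images and action names are made up.

```python
def image_key(pixels):
    """Turn a screenshot (list of rows of grey values) into a hashable key."""
    return tuple(tuple(row) for row in pixels)


ok_button = [[200, 200], [50, 50]]
cancel_button = [[10, 10], [90, 90]]

# Hypothetical mapping from what a button looks like to what it means.
actions = {
    image_key(ok_button): "confirm",
    image_key(cancel_button): "abort",
}

seen = [[200, 200], [50, 50]]  # a freshly captured region
print(actions.get(image_key(seen), "unknown"))  # confirm
```

Exact keys like these inherit the exact-pixel fragility raised elsewhere in the thread; a lookup that tolerates noise would need a similarity search instead of a hash.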
Immediately useful, valuable and fun (Score:2)
Okay, I have done a fair amount of programming and yet with a new Mac I have not yet dived into the SDKs, etc. I once wanted to do some batch resizing of photos and yet couldn't get it done in Automator easily without being scared of losing the original photos, on my first dive into it. Yes, I actually wrote a great auto-compositing and resizing program once driving the Gimp on linux. It was awesome. But that was years ago and now I have a nice new computer. And where did that code go. Yes I'm sure Automato
MIT can't afford real microphones (Score:2)
The subtitles were a bit of a surprise. Can MIT not afford better than built-in microphones on cheap laptops? Between her vaguely Asian accent, the poor quality of the audio (seriously, you're TELLING people how to do something, the audio is important here - did they record this in a shower stall or something? my netbook's audio sounds 100x better than this), and apparently some sort of wacky audio encoding, she is basically impossible to understand. People who speak English as a second language aren
Re: (Score:2)
On the contrary, my experience has been that non-native speakers of English are actually better at understanding other non-native speakers. I don't know why that is, but intuitively it makes sense -- non-native speakers probably learned from a diversity of other non-native speakers.
I was at a WinHEC panel session in 2008 and the panel leader had absolutely horrible English (I'm sure he was intelligent, but he wasn't intelligible). Somebody else, clearly of another racial background (the specific ethnicities
Re: (Score:2)
That's because non-native speakers can't string the words together, they have to cut them up individually. If that makes any sense.
Re: (Score:2)
That gibberish they spoke was cityspeak; gutter talk. A mishmash of Japanese, Spanish, German, what have you.
I didn't really need a translator--I knew the lingo, every good engineer did...but I wasn't going to make it easier for them.
Re: (Score:2)
Wow, no one has watched the movie Swordfish [imdb.com] have they?
Re: (Score:2, Funny)
We're trying to repress those memories, you insensitive clod!
Re: (Score:1)
Have you seen his wife recently?
Re: (Score:2)
There are far easier ways to commit click fraud than actually looking at the screen to do it. The ad companies tend to ignore the same request made multiple times from the same IP, so this changes nothing.
People who commit 'click fraud' aren't writing crappy little screen scrapers to do it; it's far easier and faster to write a plugin for Firefox to do what you're saying and just find the text of your ad on the page and trigger the link. No need to futz with what's displayed or 'moving the mouse' to the right spot,
Re: (Score:2)
Sorry, there are some things even Sikuli can't process.
Re: (Score:2)
Re: (Score:2)
I mostly agree with you, it's always silly to automate a sequence of GUI actions.
However, I can see where they're going here; the program examines your screen and finds the widget to click on or enter data into, much like a human looking at the screen and deciding what to do next. Extend that to the real world, a robot that looks around your room for the remote control and turns on the TV, then surfs through the channels until it recognizes something you like to watch. By then it will also be capable of unde