Recently I’ve been considering developing an artificial intelligence to play some of the games in my Steam library. Most games, unlike StarCraft 2, lack an API that lets developers hook into the game to observe the environment and make decisions, so more primitive methods are needed for these two operations. In this post, I discuss the former, i.e. gathering data from an application. In my case the data will be used to train machine learning models, though I’m sure there are many other uses for this functionality.
Dependencies
The Python programming language is used to capture this data. In particular, I’m using Python 3.7, but almost any version of Python 3 should be fine.
Additional libraries include numpy, opencv-python (imported as cv2), Pillow (imported as PIL), and the win32gui and win32com.client modules from pywin32, plus the standard library’s time module. One thing to note is that the win32* modules are only compatible with Windows. I’m sure there are ways to port this logic to Linux and macOS, but these modules let us grab a window by its title. One alternative is to simply pass a bounding box (more below) into the ImageGrab.grab() function. This removes the win32* dependencies but requires the window we want to capture to sit at a fixed place on the screen matching the box specified (e.g. for the upper-left corner, something like bbox=(0, 0, screen_width, screen_height)).
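To make that alternative concrete, here is a minimal sketch; the 1280x720 window size and the corner_bbox helper name are my own assumptions, not part of any library:

```python
def corner_bbox(width, height, left=0, top=0):
    # ImageGrab.grab() expects a (left, top, right, bottom) box in screen pixels.
    return (left, top, left + width, top + height)

# Assuming the game window has been dragged to the upper-left corner at 1280x720:
bbox = corner_bbox(1280, 720)   # -> (0, 0, 1280, 720)
# frame = ImageGrab.grab(bbox=bbox)   # needs Pillow's ImageGrab (Windows/macOS)
```

The obvious downside is that the window has to stay exactly where you put it, which is why the rest of this post looks the window up by title instead.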
The dependencies can be installed via
pip install opencv-python numpy pywin32 Pillow
Streaming Data
To stream data from an open window on Windows, we can use the win32 libraries to enumerate the currently open windows and select the window handle for the application of interest; I do this by matching on the window’s title. Two functions implement the window-handle lookup:

```python
import time

import cv2
import numpy as np
import win32gui
from PIL import ImageGrab


def enum_cb(hwnd, results):
    # EnumWindows callback: record each top-level window's handle and title.
    results.append((hwnd, win32gui.GetWindowText(hwnd)))


def get_screens(screen_name):
    """Return all (hwnd, title) pairs whose title contains screen_name,
    waiting until at least one matching window is open."""
    while True:
        winlist = []
        win32gui.EnumWindows(enum_cb, winlist)
        screens = [(hwnd, title) for hwnd, title in winlist
                   if screen_name in title.lower()]
        if screens:
            return screens
        time.sleep(0.5)  # wait for the program to start initially
```
The first function, enum_cb(), is a callback used to enumerate the currently open windows: win32gui.EnumWindows() calls it once per window to fill winlist with (handle, title) pairs. get_screens() then filters winlist, comparing each window’s title against our screen’s name, and returns every matching pair. The function also waits until a matching window is open before returning. This is useful for things like sending commands to the application, but it may not be ideal for every use case, since it will loop forever if a window with a matching title is never opened.
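The core of get_screens() is just a substring match on lowercased titles. With some made-up window handles and titles, the filter behaves like this:

```python
# Hypothetical snapshot of what EnumWindows might report (handles and
# titles are invented for illustration):
winlist = [(1001, 'Untitled - Notepad'),
           (1002, 'Apex Legends'),
           (1003, 'Calculator')]

screen_name = 'apex legends'
screens = [(hwnd, title) for hwnd, title in winlist
           if screen_name in title.lower()]
# screens is now [(1002, 'Apex Legends')]; screens[0][0] is the window
# handle that the win32gui calls below expect.
```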
The next piece is the main body of the program, which grabs the screen we’re interested in and, using opencv and numpy, displays the data in a separate window.

```python
if __name__ == '__main__':
    screen = 'apex legends'
    window = get_screens(screen)[0][0]   # blocks until the window exists
    last_time = time.time()
    while win32gui.IsWindow(window):     # exit once the window is closed
        try:
            # GetWindowRect supplies the (left, top, right, bottom) box PIL needs
            print_screen = np.array(ImageGrab.grab(bbox=win32gui.GetWindowRect(window)))
            print("loop took {} seconds".format(time.time() - last_time))
            last_time = time.time()
            # PIL returns RGB; OpenCV expects BGR, so swap the channels
            cv2.imshow('window', cv2.cvtColor(print_screen, cv2.COLOR_RGB2BGR))
            if cv2.waitKey(25) & 0xFF == ord('q'):
                break
        except Exception as e:
            print("error", e)
    cv2.destroyAllWindows()
```
This logic is pretty straightforward. To begin, the screen of interest is found. Then, in a loop, the script checks whether the window is still open (exiting if not) and, if so, uses the PIL library to grab a screenshot of the window. This is done with win32gui.GetWindowRect(window), which returns the bounding box PIL needs to take the screenshot. The image is stored as a numpy array, which opencv can then display as the current snapshot of the data. Since this happens over and over, the result is essentially a stream of images. The logic above can stream data at roughly 10 frames per second.
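If you’d rather see that figure as an actual rate than as per-loop seconds, the timing print above can be wrapped in a small helper. This FpsMeter class is my own addition, not part of the original script:

```python
import time


class FpsMeter:
    """Turns inter-frame gaps into an instantaneous frames-per-second figure."""

    def __init__(self):
        self.last = time.time()

    def tick(self):
        # Call once per captured frame; returns the rate implied by the gap
        # since the previous call.
        now = time.time()
        dt, self.last = now - self.last, now
        return 1.0 / dt if dt > 0 else float('inf')


# Inside the capture loop you'd replace the print with something like:
# print("capturing at {:.1f} fps".format(meter.tick()))
```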
Screen Capturing
With the logic above in place, converting it to screen capturing is very simple. The script is essentially taking and processing a screenshot as quickly as possible and displaying it to the user; swapping the OpenCV display logic for some image-saving logic yields a collection of images rather than a constant stream of data. Adding a sleep call controls how often screenshots are taken and saved.

```python
if __name__ == '__main__':
    screen = 'sigma-finite dungeon'
    trial = 0                            # tags this capture session's filenames
    hwnd = get_screens(screen)[0][0]     # blocks until the window exists
    i = 0
    while win32gui.IsWindow(hwnd):       # stop once the window is closed
        try:
            # keep the window in the foreground so we don't capture something else
            win32gui.SetForegroundWindow(hwnd)
            bbox = win32gui.GetWindowRect(hwnd)
            img = ImageGrab.grab(bbox)
            # assumes an images/ directory already exists
            img.save('images/' + screen + str(trial) + '_' + str(i) + '.png')
            i += 1
        except Exception as e:
            print("There was an error:", e)
        time.sleep(5)                    # one screenshot every 5 seconds
    print("Saved " + str(i) + " images...")
```
There are some minor differences here. First, we use the win32gui library to bring the window of interest to the foreground; if this is not done, you might end up with screenshots of windows you’re not interested in. Second, a 5-second sleep at the end of the loop spaces out the screenshots, since in this particular case I didn’t need a constant stream of data.
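One quirk worth flagging: win32gui.SetForegroundWindow() can raise an error when Windows refuses to let a background process steal focus. A common workaround, and possibly why win32com.client appears in the dependency list, is to send a harmless Alt keypress first. A sketch (the focus_window name is mine, and the imports are guarded so the function degrades gracefully off Windows):

```python
def focus_window(hwnd):
    """Try to bring hwnd to the foreground; returns False off Windows."""
    try:
        import win32com.client
        import win32gui
    except ImportError:
        return False  # not running on Windows
    # Sending Alt ('%' in WScript.Shell's SendKeys syntax) marks our process
    # as having recent keyboard input, which lets SetForegroundWindow succeed.
    win32com.client.Dispatch("WScript.Shell").SendKeys('%')
    win32gui.SetForegroundWindow(hwnd)
    return True
```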
Conclusion
Above, I’ve provided the logic necessary to save and/or stream data from an application running on a Windows machine. In future blog posts, I plan to share applications that use data collected via these methods to train artificial intelligences to play various games. However, this logic is in no way specific to that type of project and can easily be repurposed for your needs. For game playing, both capabilities are valuable: the screenshot logic can compile a database of images for training neural networks, and the streaming logic can feed data into those networks to make real-time decisions once trained.