Dynamic Python script loading

I have a bunch of toys and tools in a Git repository - I affectionately call this my toys repo. Most are just scripts that I use from a Unix (Cygwin or Git bash on Windoze) command line but there are some Python classes that I sometimes use in another script. Today at work, I was coming up with a new Python script that made use of a couple of my public classes. The script is good enough to share with my colleagues but I'm faced with the problem of my public classes:

  • I imagine that most of my colleagues haven't even heard of my public classes and I don't expect them to download the entire repo just to get a couple of classes
  • If I'm going to distribute the classes as separate files I introduce new problems:
    • It could be confusing to have the files tag along. What is the user supposed to do with them? The answer is nothing - they should leave them alone and make sure they are in the same directory as the main script in case they decide to move the script or maybe delete these other files they don't recognize.
    • When you have two or more copies of a file, you're wasting space
    • If you have two or more copies of the same script, there's always a possibility that the files are at different versions and that becomes a rat's nest!

I wanted to put my new script into a Git repo and thought I could link to the files in another repo but even that was problematic.

The solution I came up with was to dynamically download the scripts at run-time. The scripts are not saved locally and are only used at run-time. The advantages are:

  • There's less confusion over the scripts - there's just the driving script and the user doesn't even know there are other scripts used
  • Users benefit from the latest and greatest versions of the scripts
  • Less space is wasted

External scripts the old way

The way I used to handle these scripts, especially when I was the only one using them, was to create a symlink to my copy of the file:

$ ln -s ~/bin/foobar.py .

I typically have my toys repo cloned on any system I'm using, with ~/bin symbolically linked to ~/toys/bin.

Then I would include this new file in my driving script as if it was a regular file:

from foobar import Foobar
.
.
.
foobar = Foobar(foo, bar)

This is fine when it's just for myself.

Dynamic loading

I've been doing a lot with the Python requests package at work. I also had the luxury of going back over the built-in functions. Seriously, if you haven't done this recently, do yourself a favor and look them over. Sure, some of these functions you use every day and can skip over, but there are others that are more obscure - one of my recent favorites is map() - I would have made much more use of that if I had understood its power.

Basically I combined:

  • requests to extract a copy of a file in the form of a string from a public Git repo
  • exec() to take the string representing the file and execute it

I can eliminate the import statement and can just do something like the following to load the file:

exec(dynaload("pfuntner/toys/master/bin/foobar.py"))
.
.
.
foobar = Foobar(foo, bar)

"pfuntner/toys/master/bin/foobar.py" is a path to a file in my public repository:

  • pfuntner is my GitHub user name
  • toys is the name of my public repo on GitHub
  • master is the name of the main branch of the repo
  • bin/foobar.py is the path to a specific script file in my repo. This particular file is bogus but I'm just trying to give you a concrete example
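
To make the mapping concrete, here is a minimal sketch of how those four pieces combine into the address that gets fetched. The helper name raw_url() and its parameter names are hypothetical and only for illustration; the host is the same one the dynaload() code in this post uses.

```python
def raw_url(user, repo, branch, file_path):
    """Assemble a raw-content URL from its pieces (hypothetical helper)."""
    return "https://raw.githubusercontent.com/{}/{}/{}/{}".format(
        user, repo, branch, file_path)


# The example path from above, split into its four parts.
print(raw_url("pfuntner", "toys", "master", "bin/foobar.py"))
# prints: https://raw.githubusercontent.com/pfuntner/toys/master/bin/foobar.py
```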

The only thing that's left is to define the dynaload() function and it's not that hard:

"""
import sys
import requests
"""

def dynaload(path):
    url = "https://raw.githubusercontent.com/{path}".format(**locals())
    req = requests.get(url)
    if req.status_code != 200:
        sys.stderr.write("Error loading {url}:\n".format(**locals()))
        for name in vars(req):
            value = getattr(req, name)
            sys.stderr.write(" {name}: {value}\n".format(**locals()))
        exit(1)
    return req.text.replace('"__main__"', '"__static_main__"')

This is a little dense. Let's go through it:

  • Lines 1 through 4 are just to remind you of the packages you'll need to import if you haven't already.
  • Line 7 sets up the URL on GitHub's raw-content host (which the function assumes you're using)
    • The address is a little different from what you might be accustomed to, but this is the link GitHub gives you if you browse a file in raw mode, without all the HTML & CSS gorp you usually see. When you are getting a URL for your own file, make sure you get it in raw mode!
    • The **locals() syntax is lazy on my part but I just have to refer to path once in the string. I could have written it as path=path but it's so redundant. You'll see this same technique used a few times in this small method. I suppose the technique is frowned upon by some in the Python community - I couldn't get it past a peer code review for some changes I was making so I relented and played nice. I still think the benefits far outweigh the negatives so I'll continue to use it for my own work where I don't have to get anyone's approval.

    The complete URL should bring up the target file in your favorite browser, from curl, wget, Postman, etc.

  • Line 8 actually pulls down the external script from GitHub
  • Lines 9 through 14 handle error conditions when we couldn't get the external file for whatever reason
    • Using vars() on line 11 is also lazy but it makes the code simpler and you don't make assumptions about the contents of the return value.
  • Line 15 does a string replacement to try to avoid invoking mainline code in the external script. The caller must invoke exec() on the string this function returns, and it will probably do so at the top level of the calling script, where __name__ is "__main__". That means a statement in the external script like:
    if __name__ == "__main__":
    
    which might be used to drive the functions and classes of the script automatically, would fire. I write that guard myself if I expect a script to ever be used from the command line, but when we're loading the classes and functions of the script dynamically we probably don't want that code run.
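
The effect of that last replacement can be tried without any network access. In this minimal sketch, SOURCE is a made-up stand-in for the text dynaload() would return, and neutralize_main() and the Greeter class are hypothetical names. The code is executed into a separate dictionary only to keep the demonstration tidy; the technique in this post calls exec() directly at the top level.

```python
# Stand-in for the text dynaload() would return; no download is needed.
SOURCE = '''
class Greeter(object):
    def greet(self):
        return "hello"

if __name__ == "__main__":
    raise SystemExit("mainline code ran")
'''


def neutralize_main(source):
    # Same replacement as the last line of dynaload().
    return source.replace('"__main__"', '"__static_main__"')


# At the top level of a script __name__ is "__main__", so the guard in
# SOURCE would fire if the replacement had not been made.
namespace = {"__name__": "__main__"}
exec(neutralize_main(SOURCE), namespace)

greeter = namespace["Greeter"]()
print(greeter.greet())  # prints: hello
```

Note that the replacement only matches the double-quoted spelling. An external script that writes '__main__' with single quotes would slip through and its mainline code would run.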

Example

I love full working examples and hate concepts that are totally abstract or involve a lot of mysteries. Here's a complete example of using the technique to load one of my files from my public Git repo:

#! /usr/bin/python

import sys
import requests

def dynaload(path):
    # Fetch the raw text of a file from GitHub and return it as a string
    url = "https://raw.githubusercontent.com/{path}".format(**locals())
    req = requests.get(url)
    if req.status_code != 200:
        sys.stderr.write("Error loading {url}:\n".format(**locals()))
        for name in vars(req):
            value = getattr(req, name)
            sys.stderr.write(" {name}: {value}\n".format(**locals()))
        exit(1)
    return req.text.replace('"__main__"', '"__static_main__"')

# Load table.py from the toys repo; this defines Table in this script
exec(dynaload("pfuntner/toys/master/bin/table.py"))

table = Table(("Column 1", "Column 2"))
table.add(("row 1, cell 1", "row 1, cell 2"))
table.add(("row 2, cell 1", "row 2, cell 2"))
print(str(table))

Warnings

  • On at least one Linux system, I've seen these nastygrams when the external script is loaded:
    /opt/foobar/venv/lib/python2.7/site-packages/requests/packages/urllib3/util/ssl_.py:318: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#snimissingwarning.
      SNIMissingWarning
    /opt/foobar/venv/lib/python2.7/site-packages/requests/packages/urllib3/util/ssl_.py:122: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
      InsecurePlatformWarning
    
    The script works fine and you can throw stderr away (2>/dev/null). I will point out that I had activated a virtual environment but I'm not sure what that has to do with the issue. I have no other solution to this yet.
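
A possible alternative to discarding all of stderr is Python's standard warnings module, since these messages are issued as Python warnings. The sketch below is an assumption and has not been verified on the system that produced the messages above. The names quietly() and noisy() are hypothetical, and noisy() merely stands in for a call such as dynaload().

```python
import warnings


def quietly(func, *args, **kwargs):
    """Call func with Python warnings suppressed for the duration of the call."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return func(*args, **kwargs)


def noisy():
    # Stand-in for a call that emits a warning, such as an HTTPS request
    # on a platform with an old SSL stack.
    warnings.warn("pretend platform warning")
    return 42


print(quietly(noisy))  # prints: 42
```

This hides every warning raised during the call, not just the two shown above, so real error messages written to stderr by the script itself still get through while other warnings do not.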

Comments

  1. Similar to what I did here, I also developed a version of the dynaload() function to load a local script - one that's not necessarily in the same directory as the calling script but in some other well-known location, such as ~/bin/. It's easy to write a version that just reads that file and loads the contents so I don't have to worry about linking the file in the directory, etc.
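
A local variant along those lines might look like the following minimal sketch. It is not the version mentioned in the comment; the name dynaload_local() is hypothetical, and reusing the same "__main__" replacement as dynaload() is an assumption.

```python
import os


def dynaload_local(path):
    """Return the source of a local script, ready to pass to exec()."""
    # Expand a leading ~ so paths such as ~/bin/foobar.py work.
    full_path = os.path.expanduser(path)
    with open(full_path) as stream:
        source = stream.read()
    # Same guard against mainline code as the remote version.
    return source.replace('"__main__"', '"__static_main__"')


# Usage mirrors the remote version, for example:
#   exec(dynaload_local("~/bin/foobar.py"))
```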
