I’ve been tinkering with deepseek-r1
for a couple of weeks now, but with its newfound popularity and cries of impending doom for developers, I thought I’d do a little bit more systematic testing with a trivial use case and post the results.
Update: I now have access to
deepseek-r1
on Azure (which is the full model) and have been replicating this there–some people missed the fact that I was already using chat.deepseek.com
for most of the testing, but the gist of things is that the “full” model is no better at this than the distilled one–it was slightly smarter at doing steps 1 and 5, but the overall code quality was not significantly better. I’ve also established that order really matters, since the model will sometimes “forget” things it has done if you ask it to do something else first.
What follows are my notes from a somewhat frustrating three hours of my life where I cajoled it to do a very simple thing:
- Request a URL.
- Identify an image to download, trying various resolutions.
- Download the selected image to a specified directory.
- Set it as wallpaper across all of my Mac displays.
I’ve been using a shell script to do this for the Bing Image of the Day that has worked for years now with some minor tweaks, but I wanted to rewrite it in Python so that I could maintain it more easily and (like I’ve been doing with the shell script) wrap it with Platypus to build a user-friendly startup item.
Test Conditions
I did this using Zed pointed to an ollama
instance that was running deepseek-r1:14b
on an NVIDIA 3060, because it is the largest distilled model I can run in 12GB VRAM and I prefer to run all my models on that box (although I could run a bigger version on my Mac, there’s little point, and most people won’t be able to run anything that much bigger locally).
I used Zed’s chat panel (even though I hate using chat for coding) for convenience, and I also tried most of the steps on chat.deepseek.com
, but that has been very spottily available (and, before someone points that out, you can’t currently register API keys to use it directly with Zed, because their site is constantly down). Overall, the experience was pretty much the same with almost exactly the same results, but faster.
I kept most of the chat sessions, but to save people’s time I’m not going to dump the dozens of versions of the script that were generated–I’m only going to include the final one for reference.
In retrospect I probably should have used a git
repository to track the changes, but I went into this thinking it would take half an hour at most, and there’s a limit to how much I can be bothered to do for what should have been a trivial exercise.
Step 1 - Direct Conversion
I first gave my version of the bash
script as context and told the model to “convert this into a Python script”–and I immediately got something that used argparse
(apparently) correctly and used requests
to do the actual downloads, but was essentially a single huge lump of code inside main()
plus a single function that tried to do everything: get Bing’s home page, parse the HTML using a regex and download and save an image.
That had a number of issues:
- It used a bazillion print statements throughout the code to replace the echo calls, wrapped in if args.quiet clauses.
- The parsing was a direct transcription of what the bash script did, and was clearly wrong (more on that later).
- It lacked any hint of a modular structure, making it hard to read and maintain.
So I asked it to clean up things a bit, and it grouped some of the housekeeping stuff (like clearing the download directory) in its own function(s). Then I asked it to clean up the print
statements, and it created a single function to replace them–with the if
clause inside. Ah well.
But I didn’t want it to use requests
, because a) I wanted the script to use just the standard library for embedding it into Platypus and b) I wanted to see how it would reason through the process as I found errors and fed them back.
Step 2 - Using the Standard Library
So my next prompt was “change the script to use just the Python standard library and write a fetch() function to replace the requests.get
calls”.
And… it mostly worked, except that I had to do multiple iterations–all I did was paste in the error outputs and ask it to “fix the error”, so I can credit it for reasoning through the problem, even if it took huge amounts of time to do that and sometimes failed on the first attempt:
- It got the http invocations wrong the first time around.
- It forgot about ssl, so it had to rewrite fetch() to distinguish between http and https and wrap the http calls in an ssl context.
Throughout this process it did some pretty weird things, like adding import ssl
in the middle of the code inside an if
statement. Clearly not the best coding style, but I let it pass.
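For reference, here’s roughly the shape of what I was asking for: a minimal standard-library fetch() with the imports at the top, using http.client and an explicit ssl context for HTTPS. This is my own sketch, not the model’s output, and it deliberately skips redirects and content decoding (that pain comes next).

```python
import ssl
from typing import Optional
from urllib import parse
from http.client import HTTPConnection, HTTPSConnection

def fetch(url: str, timeout: int = 10) -> Optional[bytes]:
    """Fetch a URL using only the standard library; returns None on non-200."""
    parts = parse.urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    if parts.scheme == "https":
        conn = HTTPSConnection(parts.hostname, port, timeout=timeout,
                               context=ssl.create_default_context())
    else:
        conn = HTTPConnection(parts.hostname, port, timeout=timeout)
    try:
        path = f"{parts.path}?{parts.query}" if parts.query else parts.path
        conn.request("GET", path)
        response = conn.getresponse()
        return response.read() if response.status == 200 else None
    finally:
        conn.close()
```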
Step 3 - Handling HTTP Vagaries
Then came the “fun” part:
- It didn’t know how to handle a 307 redirect, and the “reasoning” went on for a couple of screenfuls as it debated internally if it should handle the specific result code or all redirects in the same way.
- It quibbled endlessly on how to use recursion (or not) to do retries and eventually settled on a while loop.
- It was somewhat surprised to have utf-8 decoding errors when trying to handle a gzip-encoded response, so it took a few more screenfuls (and a couple more attempts) to realize that it really needed to handle the Content-Encoding header (see the sketch after this list).
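The shape of the eventual fix is simple enough: check for a 3xx status and re-request whatever the Location header points at, and look at Content-Encoding before trying to decode the body. Here it is as a hypothetical helper of my own devising (the final script folds the same logic into fetch() instead):

```python
import gzip
from http.client import HTTPResponse
from typing import Optional, Tuple
from urllib import parse

def read_body(response: HTTPResponse, url: str) -> Tuple[Optional[bytes], Optional[str]]:
    """Return (body, redirect_url); illustrative helper, not the model's code."""
    if 300 <= response.status < 400:
        # Hand an absolute redirect target back to the caller's retry loop.
        return None, parse.urljoin(url, response.getheader("Location"))
    body = response.read()
    if "gzip" in (response.getheader("Content-Encoding") or ""):
        body = gzip.decompress(body)  # decompress before any utf-8 decoding
    return body, None
```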
By this time I got fed up with it importing ssl
and gzip
inside functions, so I just moved the imports to the top myself.
This is where a bunch of people would say to “just use requests
” again, but by this time I wanted to see if the model could reason beyond what an intern would do and actually “understand” (and I use that term with a dollop of salt) what it needed to do to work within the constraints I gave it.
And this, incidentally, was also one of the bits where chat.deepseek.com
stopped working (but was still giving 99% the same output). I came back to it later.
Step 4 - Parsing out the Image Filename
As it turns out, one of the reasons my bash
script stopped working was that you can’t really get at the Bing “Image of the day” filename without JavaScript anymore (at least not with the user agent string I was using).
So I literally told it: “use this URL to get a JSON file that contains the image URL we need to get. Here’s a sample JSON output”, and it:
- Correctly figured out how to request and isolate the URL for the file.
- Did its first successful download–to an incorrect filename, since it was still trying to use a regex to parse the filename from the JSON file.
- It also dreamed up the notion of reading the metadata from the rest of the JSON file and trying to derive a user-friendly filename for it, wrote the code for that, and never actually used it when saving the file.
After a few attempts where I repeatedly asked it to fix how it was parsing the file name, I realized that the model was fundamentally unable to understand that /th?id=foobar.jpg
was not the filename. Here’s what it ended up doing while I asked it to improve the filename parsing:
- It first saved the file solely as th.
- It then attempted to save it as th?id=foobar.jpg (or escaped versions of it, with various combinations of junk), which, sadly, worked.
I eventually asked it to just take the id parameter and run with it instead of churning out progressively more baroque regular expressions, and it finally got there.
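What I actually wanted was trivial: stop regexing the path and just read the id query parameter off the URL, which is what the final script’s download_image() ends up doing. Something like this sketch (the fallback name is made up):

```python
from urllib import parse

def filename_from_url(image_url: str) -> str:
    # e.g. "https://www.bing.com/th?id=foobar.jpg" -> "foobar.jpg"
    query = parse.urlsplit(image_url).query
    return parse.parse_qs(query).get("id", ["wallpaper.jpg"])[0]  # fallback name is arbitrary
```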
Step 5 - Getting the Best Image
If you’ve looked at the bash script, you’ll have realized that the Image of the Day is available in several typical resolutions, one of which is UHD. Throughout the entire rigmarole above, the model did keep a list of the RESOLUTIONS it could request, but once I gave it the JSON example (which listed only 1920x1080) it completely forgot about using it.
So I asked it to pick the “best” resolution first, and a funny thing happened: It “reasoned” that UHD
was best, “decided” to sort RESOLUTIONS
to try it first, and then consistently, over five attempts, it failed to sort RESOLUTIONS
in the correct way to make sure UHD
came first.
By this time chat.deepseek.com
was back online and I was 2 hours into this (fortunately there was a decent movie on), and it proceeded to do exactly the same as my ollama
distill. So I just fixed it myself.
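The fix itself is a one-liner. Assuming a RESOLUTIONS list along the lines of what the model kept around (the exact contents here are illustrative), all it had to do was:

```python
RESOLUTIONS = ["1920x1080", "1920x1200", "UHD"]  # illustrative; the model kept a similar list

# Sort so UHD is tried first; the stable sort keeps the rest in their original order.
ordered = sorted(RESOLUTIONS, key=lambda r: r != "UHD")
```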
Step 6 - AppleScript
By now the script was able to grab the Image of the Day, save it to a folder and run osascript
to set the wallpaper… except it didn’t work, because the filenames were incorrectly escaped.
I’m going to magically wave my hand over this one and just say that, after nearly half an hour of various attempts at getting it to use two different AppleScript snippets (one to set the wallpaper across all monitors, another for a specific one), and given the model’s complete inability to figure out whether macOS uses one-based indexing to pick the display I would be selecting with --monitor, I gave up and told it to just set the same wallpaper everywhere.
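For the record, the “same wallpaper everywhere” version boils down to a single osascript call, which is essentially what survives as set_wallpaper() in the final script below:

```python
import subprocess

def set_wallpaper_everywhere(filepath: str) -> None:
    escaped = filepath.replace('"', r'\"')  # the escaping that tripped it up earlier
    subprocess.run(
        ["osascript", "-e",
         'tell application "System Events" to tell every desktop '
         f'to set picture to POSIX file "{escaped}"'],
        check=True, capture_output=True,
    )
```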
Step 7 - Cleaning Up
After addressing all these issues, I focused on getting a piece of code I could actually use, so I asked it to “split the individual steps into their own functions”, and…
- It quibbled over naming conventions for three screenfuls.
- It had a few hilarious moments of self-doubt regarding how to pass args around, since it was using args.quiet as an argument for a centralized print_status function (I eventually “fixed” it by making args a global just so I could actually go and brush my teeth).
Then came my undoing: I asked it to replace all the output with a Python logger–and since I had seen it hesitate over args.quiet
and whether or not it should output progress, I told it that it could change the --quiet
argument to --verbose
if it made more sense.
What followed were screenfuls and screenfuls of existential doubt while it tried to “reason” if it should use the logging level to determine whether to output progress or not, what kind of logging levels would be deemed acceptable, and which combinations would be “better” with --quiet
or --verbose
.
One run went on for five minutes as it contemplated this existential conundrum, leaving me both amused and exasperated at its over-analysis.
I eventually stopped it and told it to “be pragmatic and minimize output”, so after the next five minutes it actually produced something useful that worked.
Bonus Step - Type Hints
By this time Zed’s LSP/linter was having kittens at some of the code, so for a final round I asked deepseek-r1:14b
to type hint the code, which it did after another bout of existential crisis where it actually rewrote some of the argument names as well, and the “final” form of the script came into being:
import argparse
import os
import re
import subprocess
import sys
import ssl
import json
import gzip
from urllib import parse
from http.client import HTTPConnection, HTTPSConnection
import logging
from typing import Optional, Union, Dict, Any
# Configure global logger
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('%(asctime)s - %(levelname)s - %(message)s'))
logger.addHandler(handler)
# Constants
DEFAULT_PICTURE_DIR: str = os.path.expanduser("~/Pictures/bing-wallpapers/")
def fetch(url: str, headers: Optional[Dict[str, str]] = None, timeout: int = 10) -> Union[bytes, None]:
"""Handle HTTP requests with redirect following and content decoding"""
logger.debug(f"Fetching: {url}")
parsed_url = parse.urlsplit(url)
scheme: str = parsed_url.scheme
host: str = parsed_url.hostname
path: str = parsed_url.path
query: str = parsed_url.query
port: int = parsed_url.port or (443 if scheme == 'https' else 80)
headers = headers or {}
headers.update({
'Host': host,
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Connection': 'keep-alive'
})
while True:
conn: Optional[Union[HTTPConnection, HTTPSConnection]] = None
try:
if scheme == 'https':
context = ssl.create_default_context()
conn = HTTPSConnection(host, port, timeout=timeout, context=context)
else:
conn = HTTPConnection(host, port, timeout=timeout)
logger.debug(f"Sending request to {host}")
conn.request('GET', f"{path}?{query}", headers=headers)
response = conn.getresponse()
if 300 <= response.status < 400:
url = parse.urljoin(url, response.getheader('Location'))
parsed_url = parse.urlsplit(url)
scheme = parsed_url.scheme
host = parsed_url.hostname
path = parsed_url.path
query = parsed_url.query
port = parsed_url.port or (443 if scheme == 'https' else 80)
continue
if response.status != 200:
logger.error(f"Request failed with status code: {response.status}")
return None
content: bytes = response.read()
break
except Exception as e:
logger.error(f"Connection error: {e}")
return None
finally:
if conn is not None:
conn.close()
# Handle content decoding
content_encoding: str = response.getheader('Content-Encoding', '')
if 'gzip' in content_encoding:
content = gzip.decompress(content)
logger.debug("Successfully fetched and decoded content")
return content
def setup_directory(picture_dir: str, clear: bool) -> None:
"""Create directory and optionally clear existing files"""
try:
os.makedirs(picture_dir, exist_ok=True)
if clear:
for filename in os.listdir(picture_dir):
file_path = os.path.join(picture_dir, filename)
if os.path.isfile(file_path):
os.unlink(file_path)
logger.info(f"Cleared directory: {picture_dir}")
except Exception as e:
logger.error(f"Directory setup failed: {e}")
raise RuntimeError("Failed to create or clear directory")
def fetch_bing_wallpaper_data(resolution: str) -> Union[bytes, None]:
"""Fetch wallpaper metadata from Bing API"""
api_url = "https://www.bing.com/HPImageArchive.aspx"
params: Dict[str, str] = {
'format': 'js',
'idx': '0',
'n': '1',
'mkt': 'en-US',
'resolution': resolution
}
logger.debug(f"Fetching API data with parameters: {params}")
query: str = parse.urlencode(params)
api_request_url: str = f"{api_url}?{query}"
content: bytes = fetch(api_request_url)
if not content:
logger.error("Failed to fetch Bing API data")
raise RuntimeError("API request failed")
return content
def process_api_response(content: bytes, resolution: str) -> str:
"""Extract image URL from API response"""
try:
logger.debug("Processing API response")
data: Dict[str, Any] = json.loads(content)
image_info: Dict[str, Any] = data['images'][0]
image_url: str = f"https://www.bing.com{image_info['urlbase']}_{resolution}.jpg"
logger.debug(f"Extracted image URL: {image_url}")
return image_url
except (KeyError, IndexError, json.JSONDecodeError) as e:
logger.error(f"Invalid API response format: {e}")
raise ValueError("Failed to parse API response")
def download_image(image_url: str, output_dir: str, force: bool = False, verbose: bool = False) -> str:
"""Download image file with proper error handling"""
parsed_url = parse.urlsplit(image_url)
filename: Optional[str] = parse.parse_qs(parsed_url.query).get('id', [None])[0]
filepath: str = os.path.join(output_dir, filename)
if not force and os.path.exists(filepath):
logger.info(f"Image exists: {filename}")
return filepath
content: bytes = fetch(image_url)
if not content:
logger.error("Failed to download image content")
raise RuntimeError("Download failed")
try:
with open(filepath, 'wb') as f:
f.write(content)
logger.info(f"Downloaded: {filename}")
return filepath
except IOError as e:
logger.error(f"File write error: {e}")
raise RuntimeError("Failed to save file")
def set_wallpaper(filepath: str, verbose: bool = False) -> None:
"""Set wallpaper for all monitors using AppleScript"""
escaped_path: str = filepath.replace('"', r'\"')
try:
logger.info("Running AppleScript command")
subprocess.run([
'osascript', '-e',
f'tell application "System Events" to tell every desktop to set picture to POSIX file "{escaped_path}"'
], check=True, capture_output=True)
logger.info("Wallpaper updated successfully on all monitors")
except subprocess.CalledProcessError as e:
logger.error(f"Failed to set wallpaper: {e.stderr.decode().strip()}")
raise RuntimeError("Wallpaper update failed")
def parse_arguments() -> argparse.Namespace:
"""Configure and parse command-line arguments"""
parser = argparse.ArgumentParser(description='Bing Wallpaper Downloader')
parser.add_argument('-v', '--verbose', action='store_true',
help='Enable verbose output')
parser.add_argument('-f', '--force', action='store_true',
help='Force overwrite existing files')
parser.add_argument('-c', '--clear', action='store_true',
help='Clear existing files before download')
parser.add_argument('-p', '--picture-dir', default=DEFAULT_PICTURE_DIR,
help=f'Image storage directory (default: {DEFAULT_PICTURE_DIR})')
parser.add_argument('-r', '--resolution', default='UHD',
choices=['UHD', '1920x1200', '1920x1080'],
help='Wallpaper resolution')
return parser.parse_args()
def main() -> None:
"""Main workflow execution"""
args = parse_arguments()
logger.setLevel(logging.DEBUG if args.verbose else logging.WARNING)
try:
setup_directory(args.picture_dir, args.clear)
api_content = fetch_bing_wallpaper_data(args.resolution)
image_url = process_api_response(api_content, args.resolution)
image_path = download_image(image_url, args.picture_dir, force=args.force)
set_wallpaper(image_path)
except Exception as e:
logger.error(f"Error: {e}")
sys.exit(1)
if __name__ == "__main__":
main()
Conclusion
This “works”, but the entire process has confirmed to me that I would have done it not just faster but in a much cleaner and more readable way, especially when building the HTTP requests by hand and doing all of the error handling, which I would have done in a completely different way.
All the kvetching the model did around parsing and logging and whatnot was completely pointless and a huge waste of time, and the only thing that it really accelerated was the argument parsing. The code structure is, overall, sub-par, and I’ve already spotted at least two possible bugs.
But I should point out that none of this is specific to deepseek-r1
– I have had very similar experiences with o1
and other models, so I honestly don’t think developers need to fear for their jobs anytime soon (except, again, perhaps front-end folk, where boilerplate is far more prevalent).
I also don’t rate this as any better than a junior/intern, and can’t help but think that we’re very far from replacing developers with AI for anything that isn’t trivial (and this was just about as trivial a problem as you could have, with a reference implementation to boot!). Domain expertise is still key, and the model’s inability to “understand” some things is proof that AI models haven’t cracked that yet.
It was a fun experiment, though, because even if I ended up wasting three hours of my evening, I gained valuable insights into the “reasoning” process and how stupefyingly fallible it still is.
AI, Accelerating Idiots into endless rabbit holes…