Tuesday, April 7, 2015

MySQL connection in Python: MySQLdb.cursors optimization

The naive code I use to reproduce the problem (a query against a big table that appears to hang) looks like this:
import MySQLdb

conn = MySQLdb.connect(user="user", passwd="password", db="mydb")
cur = conn.cursor()

print "Executing query"
# With the default cursor class, execute() downloads and buffers the
# entire result set on the client before it returns.
cur.execute("SELECT * FROM bigtable")

print "Starting loop"
row = cur.fetchone()
while row is not None:
    print ", ".join([str(c) for c in row])
    row = cur.fetchone()

cur.close()
conn.close()
On a ~700,000-row table this code runs quickly, but on a ~9,000,000-row table it prints "Executing query" and then hangs for a very long time. With the default cursor class the whole result set is buffered on the client during execute(), which is why it makes no difference whether I use fetchone() or fetchall().

Here is the solution: pass cursorclass=MySQLdb.cursors.SSCursor to connect(), so that rows are streamed from the server instead of being buffered on the client:

conn = MySQLdb.connect(user="user",
                       passwd="password",
                       db="mydb",
                       cursorclass=MySQLdb.cursors.SSCursor)
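For completeness, here is the naive script reworked with SSCursor; a minimal sketch, reusing the same bigtable query as above:

import MySQLdb
import MySQLdb.cursors

conn = MySQLdb.connect(user="user",
                       passwd="password",
                       db="mydb",
                       cursorclass=MySQLdb.cursors.SSCursor)
cur = conn.cursor()

print "Executing query"
cur.execute("SELECT * FROM bigtable")

print "Starting loop"
# Rows now stream from the server as they are fetched, so the loop
# starts printing almost immediately instead of waiting for execute().
row = cur.fetchone()
while row is not None:
    print ", ".join([str(c) for c in row])
    row = cur.fetchone()

cur.close()
conn.close()

One caveat with a server-side cursor: the result set stays open on the server, so read it to the end (or close the cursor) before running another query on the same connection.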
A possibly even better solution in terms of memory usage:

conn = MySQLdb.connect(user="user",
                       passwd="password",
                       db="mydb",
                       cursorclass=MySQLdb.cursors.SSDictCursor)
I found a huge reduction in memory usage with cursors.SSDictCursor -- so if you want each row returned as a dictionary keyed by column name, use that.
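To illustrate, here is a sketch of iterating with SSDictCursor; the column names mentioned in the comment are hypothetical placeholders, not anything from the real table:

import MySQLdb
import MySQLdb.cursors

conn = MySQLdb.connect(user="user",
                       passwd="password",
                       db="mydb",
                       cursorclass=MySQLdb.cursors.SSDictCursor)
cur = conn.cursor()
cur.execute("SELECT * FROM bigtable")

row = cur.fetchone()
while row is not None:
    # Each row is a dict keyed by column name, e.g. {"id": 1, "name": "foo"}
    # (hypothetical columns; substitute whatever bigtable actually has).
    print row
    row = cur.fetchone()

cur.close()
conn.close()

If memory serves, you can also pick the cursor class per cursor with conn.cursor(MySQLdb.cursors.SSDictCursor) instead of setting it for the whole connection.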
