python - Checking for existence of _id and updating a subdocument in PyMongo -


I'm trying to write a MongoDB backend for a puzzle website. I'm new to PyMongo and I've been struggling to find a way to check for a unique key identifier and update a subdocument if it exists. The layout is like this:

{
    "_id": "jack",
    "username": "jack",
    "puzzles": [
        {
            "name": puzname,
            "rank": rank,
            "date": puzdate,
            "global score": score,
            "points": points
        }
    ],
    "attempts": 1
}

If jack already exists, it should look like this:

{
    "_id": "jack",
    "username": "jack",
    "puzzles": [
        {
            "name": puzname,
            "rank": rank,
            "date": puzdate,
            "global score": score,
            "points": points
        },
        {
            "name": puzname2,
            "rank": rank,
            "date": puzdate,
            "global score": score,
            "points": points
        }
    ],
    "attempts": 2
}

To populate the fields, I'm taking them from existing HTML using Beautiful Soup.

cells = row('td')
rank = cells[0].string
name = cells[1].find_all('a')[1].find(text=True).strip()
score = row('td')[3].string
points = row('td')[4].string

puz_dict = {}
puz_dict['_id'] = name.encode('ascii', 'ignore')
puz_dict['username'] = name.encode('ascii', 'ignore')
puz_dict['puzzles'] = {
    'puzzle name': puzname,
    'rank': int(str(rank)),
    'date': puzdate,
    'global score': locale.atoi(str(score)),
    'points': int(str(points))
}
puz_dict['attempts'] = 1

connection = MongoClient('localhost')
coll = connection['puzzles']['users']
if coll.find({'_id': puz_dict['_id']}).count() > 0:
    print "updating user"
    # update stuff
else:
    coll.insert(puz_dict)

As you can see, I'm using the username as the way to uniquely identify the document. So far so good; checking the database, the user information populates properly.

Now I want to check to see if the user exists, and if they do, update the "puzzles" field to include the new puzzle and increment "attempts" by 1. I thought this would work as the existence check, but it doesn't seem to, and instead goes straight to the insert:

if coll.find({'_id': puz_dict['_id']}).count() > 0:
    print "updating user"
    # update stuff

Why isn't the check working? And how can I update the subdocument?

Well, since you seem to be new to databases in general: the correct thing to do here is not to "find" things and then "update" and "save" them, but rather to send an "update" request instead:

coll = connection['puzzles']['users']

# after each assignment

coll.update_one(
    { "_id": puz_dict["_id"] },
    {
        "$setOnInsert": { "username": puz_dict["username"] },
        "$push": { "puzzles": puz_dict["puzzles"] },
        "$inc": { "attempts": puz_dict["attempts"] }
    },
    upsert=True
)

So these "updates" work by looking for a document that matches the _id value and then considering the following actions:

  • $push contains the content to be added to an array field. The new content is "appended" to the array in the document named "puzzles".

  • $inc looks at the current value of "attempts" in the document and "increments" that value by whatever value has been supplied as the argument there.

  • $setOnInsert is special: rather than making changes to every document matched, it instead makes the supplied modifications only where an "upsert" occurs.

  • upsert is of course the final setting, which means that where the _id value is not matched, a new document is created instead, with that _id value used for the document along with the content mentioned in $setOnInsert.

Of course, every matched or newly created document is also subject to the other $push and $inc operations, so these are applied as well, either against the existing content or by adding to the content found in the matched document, as the sketch below demonstrates.
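For illustration, here is a minimal sketch assuming a mongod running on localhost; the database and collection names match the question, while the two puzzle payloads are made up:

from pymongo import MongoClient

coll = MongoClient('localhost')['puzzles']['users']
coll.delete_many({ "_id": "jack" })     # start clean for the demo

# two upserts against the same _id: the first creates the document,
# the second only pushes a new puzzle and increments the counter
for puzzle in [{ "name": "sudoku-17", "points": 10 },
               { "name": "kakuro-3", "points": 25 }]:
    coll.update_one(
        { "_id": "jack" },
        {
            "$setOnInsert": { "username": "jack" },
            "$push": { "puzzles": puzzle },
            "$inc": { "attempts": 1 }
        },
        upsert=True
    )

print(coll.find_one({ "_id": "jack" }))
# { "_id": "jack", "username": "jack",
#   "puzzles": [ { "name": "sudoku-17", ... }, { "name": "kakuro-3", ... } ],
#   "attempts": 2 }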

In the best case, when looping over a data source like this, it is better to commit such "writes" to the database in "bulk" rather than sending every operation one at a time:

# import the UpdateOne bulk helper
from pymongo import UpdateOne

# outside the loop that is sourcing the data
operations = []

# inside the loop that is sourcing the data, add to the queue

operations.append(
    UpdateOne(
        { "_id": puz_dict["_id"] },
        {
            "$setOnInsert": { "username": puz_dict["username"] },
            "$push": { "puzzles": puz_dict["puzzles"] },
            "$inc": { "attempts": puz_dict["attempts"] }
        },
        upsert=True
    )
)

# write to the server once in every 1000 and clear the queue
if len(operations) % 1000 == 0:
    coll.bulk_write(operations)
    operations = []

# finish the loop

# write again if there are still queued operations
# remaining after the loop's completion
if len(operations) > 0:
    coll.bulk_write(operations)

That's how I would handle it: adding an operation for each line of detail processed from the input, and writing several operations at once (ideally 1000 or fewer, in accordance with the driver) rather than making individual writes.

But at any rate, there is no need to "look up" the data with separate requests, since that is exactly what "updates", and particularly "upserts", are meant to handle. Atomic operations allow modification of the data "in place", so it is not necessary to read the document content before making changes.
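And if you do need the modified document back afterwards, you can still avoid the separate read. A sketch, on the same hypothetical collection, using PyMongo's find_one_and_update, which applies the atomic update and returns the result in a single request:

from pymongo import MongoClient, ReturnDocument

coll = MongoClient('localhost')['puzzles']['users']

# one round trip: upsert the counter and get the updated document back
doc = coll.find_one_and_update(
    { "_id": "jack" },
    { "$inc": { "attempts": 1 } },
    upsert=True,
    return_document=ReturnDocument.AFTER
)
print(doc["attempts"])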


Also note that "connections", such as the one obtained from MongoClient, should only ever happen once in the application lifecycle. No matter what the application is doing, that connection should be available and persist throughout the life of the application until it runs to completion or otherwise terminates.
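One common way to arrange that, sketched here with a hypothetical db.py module, is to create the client once at import time and have every caller reuse it:

# db.py - hypothetical module holding the single client for the program
from pymongo import MongoClient

client = MongoClient('localhost')     # created once, at import time
users = client['puzzles']['users']

def record_attempt(puz_dict):
    # every caller shares the same client and its connection pool
    users.update_one(
        { "_id": puz_dict["_id"] },
        { "$inc": { "attempts": 1 } },
        upsert=True
    )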

