Dealing with Databases - Inserting, Updating Etc.

imperator@sh.itjust.works · 1 year ago

atzanteol@sh.itjust.works · 1 year ago

Databases are more efficient with bulk queries.

Rather than querying each entry individually, batch your data and query for the existence of the whole batch (e.g. where key in (1,2,3,etc)). You could do this once per JSON document, once per 100 entries, or however it makes sense. You can then check the results for each key to determine whether to insert or update. Then commit once per batch.
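
A minimal Python sketch of that batch check, using the standard-library sqlite3 module. The `entry` table, its `entry_key` and `payload` columns, and the `upsert_batch` helper are made up for illustration, and it assumes keys are unique within a batch:

```python
import sqlite3

def upsert_batch(conn, rows, batch_size=100):
    """rows is a list of (entry_key, payload) tuples; all names here are hypothetical."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        keys = [key for key, _ in batch]
        placeholders = ",".join("?" * len(keys))
        # One query finds every key of the batch that already exists.
        existing = {row[0] for row in conn.execute(
            f"SELECT entry_key FROM entry WHERE entry_key IN ({placeholders})", keys)}
        to_update = [(payload, key) for key, payload in batch if key in existing]
        to_insert = [(key, payload) for key, payload in batch if key not in existing]
        conn.executemany("UPDATE entry SET payload = ? WHERE entry_key = ?", to_update)
        conn.executemany("INSERT INTO entry (entry_key, payload) VALUES (?, ?)", to_insert)
        conn.commit()  # commit once per batch, not once per row

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (entry_key INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("INSERT INTO entry VALUES (1, 'old')")
conn.commit()

upsert_batch(conn, [(1, "new"), (2, "two"), (3, "three")])
result = conn.execute("SELECT entry_key, payload FROM entry ORDER BY entry_key").fetchall()
print(result)
```

Key 1 already exists, so it is updated, while keys 2 and 3 are inserted. Other databases cap the number of bind parameters differently, so the batch size may need tuning.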

imperator@sh.itjust.works · 1 year ago

Do you happen to have any examples? I’m just not sure how to convert the JSON example into a bulk query, since I need to keep the reference and line detail associated with the header. There is no primary key across all 3 sections. It’s generated when I insert into the database.

atzanteol@sh.itjust.works · edit-2 1 year ago

It’s a little hard to say without seeing your data structure. Is this something like:

{ header: {
  id: 1,
  items: [ {
      name: "foo",
      field2: "bar"
   } ]
} }

If you have something unique in the “header” you can create 2 tables with a dependency.

create table header ( id number primary key );
create table item ( id number primary key, header_id number references header(id), name varchar, field2 varchar );

You can generate IDs for each item on the fly, but you won’t be able to tie them back to the JSON. BUT if you can tie the header back to the JSON, then you can do a “drop-and-replace” on the items with each run. That may not be the most efficient approach, but it will likely perform better than querying each row upon entry, e.g. (pseudocode):

for each header in headers {
   delete from item where item.header_id = header.id;
   for each item in header.items {
       insert into item values ( some_id, header.id, item.name, item.field2 );
   }
   commit;
}
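
For something runnable, here is a rough Python version of the same loop using the standard-library sqlite3 module. The `replace_items` helper and the sample data are hypothetical, item IDs are left to SQLite to generate, and `INSERT OR IGNORE` is SQLite-specific syntax:

```python
import sqlite3

def replace_items(conn, headers):
    """Drop and re-insert the items of every header, committing once per header."""
    for header in headers:
        # SQLite-specific: create the header row only if it is not there yet.
        conn.execute("INSERT OR IGNORE INTO header (id) VALUES (?)", (header["id"],))
        conn.execute("DELETE FROM item WHERE header_id = ?", (header["id"],))
        conn.executemany(
            "INSERT INTO item (header_id, name, field2) VALUES (?, ?, ?)",
            [(header["id"], it["name"], it["field2"]) for it in header["items"]],
        )
        conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE header (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE item (id INTEGER PRIMARY KEY, header_id INTEGER, name TEXT, field2 TEXT)"
)

# Two runs over the "same" JSON: the second run changed one item and removed another.
first_run = [{"id": 1, "items": [{"name": "foo", "field2": "bar"},
                                 {"name": "baz", "field2": "qux"}]}]
second_run = [{"id": 1, "items": [{"name": "foo", "field2": "bar2"}]}]
replace_items(conn, first_run)
replace_items(conn, second_run)
rows = conn.execute("SELECT header_id, name, field2 FROM item").fetchall()
print(rows)
```

After the second run only the items present in the latest JSON remain for that header.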

But if you don’t want to drop and re-create, and there’s some combination of things in the “item” that is unique, then you can use that as a compound key. In the worst case you can just use all the columns. I once created a primary key that was an MD5 checksum of the string value of all the fields in the row. It gave me a calculable primary key, which was good, and I could query off it easily. But it does make expanding the table much harder…
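
A rough sketch of what such a checksum key could look like in Python, using the standard-library hashlib and sqlite3 modules. The `row_key` and `insert_item` helpers, the separator character, and the table layout are assumptions for illustration, not the original implementation:

```python
import hashlib
import sqlite3

def row_key(*fields):
    """MD5 hex digest of all field values, joined with a separator unlikely to occur in data."""
    joined = "\x1f".join(str(f) for f in fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

def insert_item(conn, header_id, name, field2):
    key = row_key(header_id, name, field2)
    # SQLite-specific: skip the row if its computed key is already present.
    conn.execute(
        "INSERT OR IGNORE INTO item (id, header_id, name, field2) VALUES (?, ?, ?, ?)",
        (key, header_id, name, field2),
    )
    return key

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (id TEXT PRIMARY KEY, header_id INTEGER, name TEXT, field2 TEXT)"
)

k1 = insert_item(conn, 1, "foo", "bar")
k2 = insert_item(conn, 1, "foo", "bar")      # same fields, same key, no duplicate row
k3 = insert_item(conn, 1, "foo", "changed")  # any changed field yields a new key
conn.commit()
count = conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]
print(count)
```

Because every column feeds the key, a changed field shows up as a new row rather than an update, and adding a column later changes every key. That is the "expanding the table" cost mentioned above.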

The advantage of drop-and-replace will be that removed items in the JSON will also be removed in the database. Otherwise you’ll need to do some additional cleanup to find database entries that don’t have an entry in your JSON file(s).
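
If you go the update-in-place route, the cleanup step might look roughly like this in Python with sqlite3. This sketch assumes `name` is unique within a header, and `delete_stale_items` is a hypothetical helper:

```python
import sqlite3

def delete_stale_items(conn, header_id, json_names):
    """Delete items of one header whose name no longer appears in the JSON."""
    in_db = {row[0] for row in conn.execute(
        "SELECT name FROM item WHERE header_id = ?", (header_id,))}
    stale = in_db - set(json_names)
    conn.executemany(
        "DELETE FROM item WHERE header_id = ? AND name = ?",
        [(header_id, name) for name in stale],
    )
    conn.commit()
    return stale

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (header_id INTEGER, name TEXT, field2 TEXT)")
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?)",
    [(1, "foo", "bar"), (1, "gone", "x"), (2, "gone", "y")],
)
conn.commit()

# The latest JSON for header 1 only contains "foo".
removed = delete_stale_items(conn, 1, ["foo"])
remaining = conn.execute(
    "SELECT header_id, name FROM item ORDER BY header_id, name").fetchall()
print(removed, remaining)
```

Only the stale row under header 1 is removed; the row with the same name under header 2 is left alone.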