How does DitaBase look in practice?
Let's suppose Jane has a product catalog:
SKU | Product Name | Category | Price |
---|---|---|---|
TRO-5593 | TriCool Aluminum Water Bottle | Water Bottles | $16.99 |
She needs to upload this catalog to her own Shopify site, Amazon, Walmart, Google Shopping, and about 20 other sites. However, each site has a slightly different schema, requiring tedious changes:
SKU | Product | Category | Price |
---|---|---|---|
TRO-5593 | TriCool Aluminum Water Bottle | Sports & Fitness : Accessories : Sports Water Bottles | 16,99 USD |
To deal with this, Jane manually creates and edits a different spreadsheet for every site. It is tedious and time-consuming, and she has no way of knowing if there are mistakes until she attempts to upload a sheet to a website and it gets rejected.
Fortunately, Jane only has 50 products to worry about. Even so, this workflow makes adding new products or expanding to more websites very difficult, if not impossible.
Using dit, a developer could create a `ProductAmazon` object. The product data would be represented in the .dit as JSON, XLSX, or any other format, which makes it easy to work with a variety of tools and data. When published from the dit, it would exactly resemble a tab-delimited CSV file with all of the fields Amazon requires. As long as it inherits from `Product` and `FormatTabCSV`, a lot of the work is already done. The developer just needs to write validators for each little custom field Amazon uses, like `BrandName`, `Department`, and so on.
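As a rough sketch of what one of those validators might look like, here is a plain-Python stand-in (the `register_validator` registry and the field constraints are assumptions for illustration, not a real DitaBase API):

```python
import re

# Hypothetical stand-in for a DitaBase validator registry.
VALIDATORS = {}

def register_validator(field):
    """Register a validation function for one custom field."""
    def wrap(fn):
        VALIDATORS[field] = fn
        return fn
    return wrap

@register_validator("BrandName")
def validate_brand_name(value):
    # Assumed constraint: a non-empty string of at most 50 characters.
    return isinstance(value, str) and 0 < len(value) <= 50

@register_validator("Department")
def validate_department(value):
    # Assumed constraint: a " : "-separated browse path, e.g.
    # "Sports & Fitness : Accessories : Sports Water Bottles".
    return isinstance(value, str) and bool(re.fullmatch(r"[^:]+( : [^:]+)*", value))

def validate(record):
    """Return the (field, value) pairs that failed validation."""
    return [(f, v) for f, v in record.items()
            if f in VALIDATORS and not VALIDATORS[f](v)]
```

Because the validators live in the shared `ProductAmazon` object, everyone uploading to Amazon benefits from the same checks.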
This means Jane can write a single spreadsheet that extends `Product`, plus the format for whatever spreadsheet program she would like to use. The program could have a DitaBase plugin which validates the fields in real time; if there are errors, she knows immediately and can correct them much more quickly. Then, with a single click, the spreadsheet can be converted to `ProductAmazon`, `ProductShopify`, etc. The process can still be tedious, but it will now take hours, not weeks.
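Under the hood, that one-click conversion is just a field-by-field mapping between schemas. A minimal sketch, assuming the category path and price formats shown in the tables above (the lookup table and function name are illustrative, not DitaBase internals):

```python
# Assumed lookup from Jane's categories to Amazon-style browse paths.
AMAZON_CATEGORIES = {
    "Water Bottles": "Sports & Fitness : Accessories : Sports Water Bottles",
}

def to_product_amazon(row):
    """Convert one generic Product row (a dict) into ProductAmazon fields."""
    dollars = row["Price"].lstrip("$")                # "$16.99" -> "16.99"
    return {
        "SKU": row["SKU"],
        "Product": row["Product Name"],
        "Category": AMAZON_CATEGORIES[row["Category"]],
        "Price": dollars.replace(".", ",") + " USD",  # "16.99" -> "16,99 USD"
    }

print(to_product_amazon({
    "SKU": "TRO-5593",
    "Product Name": "TriCool Aluminum Water Bottle",
    "Category": "Water Bottles",
    "Price": "$16.99",
}))
```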
Writing `ProductAmazon` by yourself would be a huge undertaking, and almost certainly not worth the effort. But with the power of open source, shared work, it becomes achievable, and even cheaper in the long run. Amazon doesn't even need to be involved in creating the `ProductAmazon` object, and couldn't stop it if they wanted to either.
Sara is the CTO of a large medical company, which has just acquired a small medical startup. Sara's company has an old codebase, one component of which is a MySQL database. The startup has an amazing API and codebase, which includes MongoDB.
Sara wants to move over to the new codebase, but this must be handled delicately. Ideally, the MySQL and MongoDB databases would both be live in the interim, without developing an unnecessary API for the old codebase. To further complicate things, the old codebase uses a variety of outdated schemas, full of useless fields and missing important new ones.
Sara can start by making a custom `PersonLegacy` object that matches the old schema; it should inherit from `Person` and `FormatMySQL`. Then comes a brand new `PersonCompany` object with `FormatJSON`, and finally converters to go between them. Any part of the codebase can access new customers as though they were in the old system, and old customers as though they were in the new one. However long the transition takes, there's no rush and no data downtime. The old MySQL tables might look like this:
PersonID | Name | Age | Blood Type |
---|---|---|---|
GUL-89323 | John Doe | 36 | O- |
MAL-34106 | Jane Doe | 67 | AB+ |

PersonID | ConditionID |
---|---|
MAL-34106 | 1342 |
MAL-34106 | 1305 |

ConditionID | Condition |
---|---|
1342 | Arthritis |
1305 | High Blood Pressure |
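The same people as `PersonCompany` documents in the new MongoDB-backed codebase: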
```json
{
"people": [
{
"_id": "GUL-89323",
"name": "John Doe",
"age": 36,
"blood-type": "O-"
},
{
"_id": "MAL-34106",
"name": "Jane Doe",
"age": 67,
"blood-type": "AB+",
"conditions": [
"Arthritis",
"High Blood Pressure"
]
}
]
}
```
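One of Sara's converters would essentially join the three legacy tables into the new document shape. A minimal plain-Python sketch of that direction (in practice the dit would generate or wrap something like this; all names are illustrative):

```python
def to_person_company(people, person_conditions, conditions):
    """Join legacy MySQL-style rows into PersonCompany-style documents."""
    # ConditionID -> condition name, e.g. 1342 -> "Arthritis".
    names = {c["ConditionID"]: c["Condition"] for c in conditions}

    # PersonID -> list of condition names.
    by_person = {}
    for link in person_conditions:
        by_person.setdefault(link["PersonID"], []).append(names[link["ConditionID"]])

    docs = []
    for p in people:
        doc = {"_id": p["PersonID"], "name": p["Name"],
               "age": p["Age"], "blood-type": p["Blood Type"]}
        if p["PersonID"] in by_person:  # omit "conditions" when empty, as above
            doc["conditions"] = by_person[p["PersonID"]]
        docs.append(doc)
    return {"people": docs}
```

A matching converter in the other direction would let old code read `PersonCompany` records as though they were rows in the legacy tables.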
Now let's meet Tom. Tom is laying out a construction project up in the mountains. He contracted with a land surveying company, which produced topographic data that looks like this:
GPS Point | North | East | Elevation | Datum |
---|---|---|---|---|
t-125 | 24504.145 | 17948.076 | 3543.846 | NAD83 |
t-126 | 24508.491 | 17950.059 | 3543.347 | NAD83 |
Tom also contracted with a satellite imaging service, which produced a series of photographs so that Tom would know the number and location of trees and other land features. The photographs came with metadata giving estimates of the coordinates and number of trees:
```json
{
"images": [
{
"file": "19-05-17-10:30:45:231.png",
"coordinate": "41°29'14.1\"N 76°50'14.9\"W",
"trees": [
"41°29'12.5\"N 76°50'10.6\"W",
"41°29'12.1\"N 76°50'10.6\"W",
"41°29'11.8\"N 76°50'10.7\"W",
"41°29'11.1\"N 76°50'10.4\"W"
]
}
]
}
```
Tom needs to know where to place buildings, roads, power lines, sewer, and more. The more trees that come down, the greater the costs, both in cutting services and government conservation taxes. It would save Tom a lot of time and money to be able to work with a single dataset instead of each one separately.
Unfortunately, the two datasets don't look alike at all, even though the companies work in theoretically similar industries. Even minor differences in data mean extra work for Tom, and this data isn't even close to similar.
This also isn't the first time Tom has needed two dissimilar survey datasets to work together. The last time, Tom couldn't find another solution and hired a programmer to write custom scripts. But that was a one-time solution and won't work here. What a waste of money!
Ideally, one or both of the companies would offer their data in a dit format. Instead of a plain string with latitude and longitude, Tom would be greeted by a child of the `Coordinate` object. This way, Tom can use free tools on the DitaBase website to convert both datasets to some compatible format, and do the rest manually.
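For instance, a `Coordinate` child could expose the satellite metadata's DMS strings as numbers rather than text. A minimal sketch of that one conversion, assuming the format shown above (this is not a real DitaBase tool):

```python
import re

def dms_to_decimal(dms):
    """Parse one DMS value such as 41°29'14.1 N into signed decimal degrees."""
    deg, minutes, seconds, hemi = re.match(
        r"""(\d+)°(\d+)'([\d.]+)"([NSEW])""", dms).groups()
    value = int(deg) + int(minutes) / 60 + float(seconds) / 3600
    return -value if hemi in "SW" else value  # south/west are negative

def parse_pair(pair):
    """Split a combined coordinate string into (latitude, longitude)."""
    lat, lon = pair.split()
    return dms_to_decimal(lat), dms_to_decimal(lon)

print(parse_pair("41°29'14.1\"N 76°50'14.9\"W"))
# ≈ (41.48725, -76.83747)
```

Converting between the surveyor's NAD83 northing/easting grid and latitude/longitude additionally requires a map projection, which is exactly the kind of shared, already-solved work a `Coordinate` dit could carry.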
Even though it's an odd, one-time situation, DitaBase still helps Tom enormously.
As with most technologies, the more developers using DitaBase, the better it gets. But unique to DitaBase, this also applies to data: the more data in dit objects, the more likely it is that any problem found in data already has a solution, and the less likely it is to hit errors in general.
This means people like Tom, who don't know anything about validation or formats, care very much whether data is in the dit ecosystem. That makes dit data much more valuable.