Introduction

Piri uses a configuration file to govern the structure and contents of its output. This section is the introductory course to Piri.

The setup

For this introductory course we will use piri-cli, since it gives you a simple command line tool for running Piri without having to create any Python files.

Install with pip:

pip install piri-cli

All examples will have a config, input and output json tab like this:

config.json

{}

input.json

{}

output.json

{}

Copy the contents of config.json and input.json into files of the same name in your working directory.

Run all examples with the following unless otherwise stated.

piri config.json input.json

About JSON

JSON is a human-readable data format that stores data in objects consisting of attribute-value pairs and arrays. We will use the terms object and attribute quite often in this guide. Put simply, an object contains attributes that hold values. These values can sometimes be another object or even an array of objects.

{
    "person": {
        "name": "Bob",
        "height": 180.5,
        "friends": [
            {
                "name": "John",
                "height": 170.5
            }
        ]
    }
}

person is an object, name and height are attributes, and "Bob" and 180.5 are the values of those attributes. friends is a list (array) of objects.
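As a quick illustration that is not part of Piri itself, the same document can be loaded with the json module from Python's standard library. Objects become dicts, arrays become lists, and attributes are looked up by name. The variable names are just for this sketch.

```python
import json

# The example document from above, as a JSON string.
document = """
{
    "person": {
        "name": "Bob",
        "height": 180.5,
        "friends": [
            {"name": "John", "height": 170.5}
        ]
    }
}
"""

data = json.loads(document)

person = data["person"]            # object -> dict
name = person["name"]              # value of the "name" attribute
friends = person["friends"]        # array -> list of dicts
first_friend = friends[0]["name"]  # first element of the array

print(name, person["height"], first_friend)
```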

The Root

The root of all ev... piri configs looks like this:

{
    "name": "root",
    "array": false,
    "attributes": [],
    "objects": []
}

Running this config on its own will fail, since we consider an empty result a failure. Still, this root is what generates the enclosing {} braces you can see in the example in the About JSON section.

{}

Adding Attributes to root

To actually map some data we can add attributes.

config.json

{
    "name": "root",
    "array": false,
    "attributes": [
        {
            "name": "firstname",
            "default": "Thomas"
        }
    ]
}

input.json

{}

output.json

{
    "firstname": "Thomas"
}

Congratulations, you've just mapped a default value to an attribute! Click the output.json tab to see the output.

Structuring with objects

config.json

{
    "name": "root",
    "array": false,
    "objects": [
        {
            "name": "person",
            "array": false,
            "attributes": [
                {
                    "name": "firstname",
                    "default": "Thomas"
                }
            ]
        }
    ]
}

input.json

{}

output.json

{
    "person": {
        "firstname": "Thomas"
    }
}

What we just did is the core principle of creating the output structure. We added an object with the name person, then we moved our firstname attribute to the person object.
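To make the nesting concrete, here is a minimal sketch in plain Python of how a config with only objects and default values could be turned into output. The function build is hypothetical and is not Piri's implementation; it ignores mappings, arrays and everything else introduced later.

```python
def build(node):
    """Build the output for one object node, using default values only.

    Hypothetical helper for illustration; not part of Piri.
    """
    result = {}
    # Every attribute becomes a key holding its default value.
    for attribute in node.get("attributes", []):
        if "default" in attribute:
            result[attribute["name"]] = attribute["default"]
    # Every nested object becomes a key holding its own built output.
    for child in node.get("objects", []):
        result[child["name"]] = build(child)
    return result


config = {
    "name": "root",
    "array": False,
    "objects": [
        {
            "name": "person",
            "array": False,
            "attributes": [{"name": "firstname", "default": "Thomas"}],
        }
    ],
}

# The root's own name is not used as a key; it stands for the outer braces.
output = build(config)
print(output)
```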

Time to Map some values!

We will now introduce the mappings key. It's an array of mapping objects.

The mapping object is the only place where you actually fetch data from the input. You do that by specifying a path. The path describes the steps to take to get to the value we are interested in.

Mapping.path with flat structure

config.json

{
    "name": "root",
    "array": false,
    "objects": [
        {
            "name": "person",
            "array": false,
            "attributes": [
                {
                    "name": "firstname",
                    "mappings": [
                        {
                            "path": ["name"]
                        }
                    ]
                }
            ]
        }
    ]
}

input.json

{
    "name": "Neo"
}

output.json

{
    "person": {
        "firstname": "Neo"
    }
}

Mapping.path with nested structure

config.json

{
    "name": "root",
    "array": false,
    "objects": [
        {
            "name": "actor",
            "array": false,
            "attributes": [
                {
                    "name": "name",
                    "mappings": [
                        {
                            "path": ["the_matrix", "neo", "actor", "name"]
                        }
                    ]
                }
            ]
        }
    ]
}

input.json

{
    "the_matrix": {
        "neo": {
            "actor": {
                "name": "Keanu Reeves"
            }
        }
    }
}

output.json

{
    "actor": {
        "name": "Keanu Reeves"
    }
}

Mapping.path with data in lists

Consider the following json:

{
    "data": ["Keanu", "Reeves", "The Matrix"]
}

In our mapping object we supply path, which is a list of the steps that lead to our data. So how do we get the last name from that data?

Easy, we reference the index in the list. The first element in the list has index 0, the second 1, the third 2 and so on. To get the last name we must use index 1.

config.json

{
    "name": "root",
    "array": false,
    "attributes": [
        {
            "name": "firstname",
            "mappings": [
                {
                    "path": ["data", 0]
                }
            ]
        },
        {
            "name": "lastname",
            "mappings": [
                {
                    "path": ["data", 1]
                }
            ]
        }
    ]
}

input.json

{
    "data": ["Keanu", "Reeves", "The Matrix"]
}

output.json

{
    "firstname": "Keanu",
    "lastname": "Reeves"
}

Note

We still have to reference the "data" key first, so our path goes first to data and then finds the value at index 1.
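The three kinds of path shown so far (flat, nested and with a list index) all follow the same idea: take one step at a time, where a step is either a key or an index. Below is a minimal sketch of that idea in plain Python. The function follow_path is hypothetical, and returning None for a missing step is an assumption of this sketch, not documented Piri behaviour.

```python
def follow_path(data, path):
    """Walk `path` one step at a time; return None if a step is missing.

    Hypothetical helper for illustration; not part of Piri.
    """
    current = data
    for step in path:
        try:
            current = current[step]  # works for dict keys and list indexes
        except (KeyError, IndexError, TypeError):
            return None
    return current


flat = {"name": "Neo"}
nested = {"the_matrix": {"neo": {"actor": {"name": "Keanu Reeves"}}}}
listed = {"data": ["Keanu", "Reeves", "The Matrix"]}

print(follow_path(flat, ["name"]))
print(follow_path(nested, ["the_matrix", "neo", "actor", "name"]))
print(follow_path(listed, ["data", 1]))
```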

Combining values

Now let's learn how to combine values from multiple places in the input.

It's fairly common to need only a name while the input data provides firstname and lastname separately. Let's combine them!

config.json

{
    "name": "root",
    "array": false,
    "objects": [
        {
            "name": "actor",
            "array": false,
            "attributes": [
                {
                    "name": "name",
                    "mappings": [
                        {
                            "path": ["the_matrix", "neo", "actor", "firstname"]
                        },
                        {
                            "path": ["the_matrix", "neo", "actor", "lastname"]
                        }
                    ],
                    "separator": " "
                }
            ]
        }
    ]
}

input.json

{
    "the_matrix": {
        "neo": {
            "actor": {
                "firstname": "Keanu",
                "lastname": "Reeves"
            }
        }
    }
}

output.json

{
    "actor": {
        "name": "Keanu Reeves"
    }
}

To find more values and combine them, simply add another mapping object to the mappings array.

Use separator to control which characters are placed between the combined values.
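In plain Python, combining works much like a string join. The sketch below is only an illustration: combine is a hypothetical helper, and skipping values that were not found (None) is an assumption rather than documented behaviour.

```python
def combine(values, separator):
    """Join the values that were found, with `separator` between them.

    Hypothetical helper for illustration; not part of Piri.
    """
    found = [str(value) for value in values if value is not None]
    return separator.join(found)


actor = {"firstname": "Keanu", "lastname": "Reeves"}

full_name = combine([actor["firstname"], actor["lastname"]], " ")
print(full_name)
```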

Slicing

You can use slicing to cut values at value[from:to] which is very useful when you are only interested in part of a value. The value is turned into a string with str() before slicing is applied.

String slice

Let's say that we have a value like street-Santas Polar city 45, and we would like to filter away the street- part of it. That is exactly what slicing is for.

config.json

{
    "name": "root",
    "array": false,
    "objects": [
        {
            "name": "fantasy",
            "array": true,
            "path_to_iterable": ["data"],
            "attributes": [
                {
                    "name": "name",
                    "mappings": [{"path": ["data", 0]}]
                },
                {
                    "name": "street",
                    "mappings": [
                        {
                            "path": ["data", 1],
                            "slicing": {
                                "from": 7
                            }
                        }
                    ]
                }
            ]
        }
    ]
}

input.json

{
    "data": [
        ["santa", "street-Santas Polar city 45"],
        ["unicorn", "street-Fluffy St. 40"]
    ]
}

output.json

{
    "fantasy": [
        {
            "name": "santa",
            "street": "Santas Polar city 45"
        },
        {
            "name": "unicorn",
            "street": "Fluffy St. 40"
        }
    ]
}

Hint

If a database column has a maximum length, you can use string slicing with the to key to make sure the value does not exceed that length. Some databases also have, for example, two address fields for when one field is too short. In that case, map both with slicing "from": 0, "to": 50 and "from": 50, "to": null respectively and you'll solve the problem.
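The hint above maps directly onto Python slicing. Here is a small sketch, assuming a hypothetical helper slice_value whose start and end parameters stand in for the from and to keys (from is a reserved word in Python), and a made-up address longer than 50 characters.

```python
def slice_value(value, start=0, end=None):
    """Cast the value to str, then cut it like value[start:end].

    Hypothetical helper for illustration; not part of Piri.
    """
    return str(value)[start:end]


# Cutting away the "street-" prefix, as in the example config above.
street = slice_value("street-Santas Polar city 45", 7)

# Splitting a long (made-up) address over two 50-character fields.
address = "Apartment 12, Building C, 1234 Extremely Long Boulevard Name, North Pole"
address_line_one = slice_value(address, 0, 50)
address_line_two = slice_value(address, 50, None)

print(street)
print(address_line_one)
print(address_line_two)
```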

Slicing numbers and casting

You can also slice numbers, bools and any other JSON value, since we cast the value to a string first. This means that if you, for example, get a social security number but are only interested in the date part of it, you can slice it and then even cast the value to a date.

2020123112345 -> "20201231" -> "2020-12-31"

config.json

{
    "name": "root",
    "array": false,
    "objects": [
        {
            "name": "fantasy",
            "array": true,
            "path_to_iterable": ["data"],
            "attributes": [
                {
                    "name": "name",
                    "mappings": [{"path": ["data", 0]}]
                },
                {
                    "name": "birthday",
                    "mappings": [
                        {
                            "path": ["data", 1],
                            "slicing": {
                                "from": 0,
                                "to": 8
                            }
                        }
                    ],
                    "casting": {
                        "to": "date",
                        "original_format": "yyyymmdd"
                    }
                }
            ]
        }
    ]
}

input.json

{
    "data": [
        ["santa", 2020123112345],
        ["unicorn", 1991123012346]
    ]
}

output.json

{
    "fantasy": [
        {
            "name": "santa",
            "birthday": "2020-12-31"
        },
        {
            "name": "unicorn",
            "birthday": "1991-12-30"
        }
    ]
}

Hint

If you need to take values from the end of a string, like the last 5 characters, you can use a negative from value to count from the end instead. This works just like Python's slicing functionality.
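Since slicing is described as working like Python's, the behaviour for numbers and for negative from values can be tried directly in Python. This sketch uses only built-in str() and slicing; the variable names are just for illustration.

```python
number = 2020123112345

# The value is cast to a string before it is sliced.
as_text = str(number)

# "from": 0, "to": 8 keeps the first eight characters (the date part).
date_part = as_text[0:8]

# "from": -5 with no "to" keeps the last five characters.
last_five = as_text[-5:]

print(date_part)
print(last_five)
```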

If statements

If statements are useful when, for example, your data contains numbers that are supposed to represent different types.

Simple if statement

Let's check if the value equals 1 and output type_one.

config.json

{
    "name": "root",
    "array": false,
    "attributes": [
        {
            "name": "readable_type",
            "mappings": [
                {
                    "path": ["type"],
                    "if_statements": [
                        {
                            "condition": "is",
                            "target": "1",
                            "then": "type_one"
                        }
                    ]
                }
            ]
        }
    ]
}

input.json

{
    "type": "1"
}

output.json

{
    "readable_type": "type_one"
}

If statements are really useful for changing the values depending on some condition. Check the list of supported conditions.

otherwise can also be used to specify what should happen if the condition is false. If otherwise is not provided, the output will be the original value.

Chain If Statements

if_statements is a list of if statement objects. We designed it like this so that we can chain them. The output of the first one will be the input of the next one.

The mapping object is not the only one that can have if statements; the attribute can also have them. This allows for some interesting combinations.

config.json

{
    "name": "root",
    "array": false,
    "attributes": [
        {
            "name": "readable_type",
            "mappings": [
                {
                    "path": ["type"],
                    "if_statements": [
                        {
                            "condition": "is",
                            "target": "1",
                            "then": "boring-type"
                        },
                        {
                            "condition": "is",
                            "target": "2",
                            "then": "boring-type-two",
                            "otherwise": "fun-type"
                        },
                        {
                            "condition": "contains",
                            "target": "fun",
                            "then": "funky_type"
                        }
                    ]
                }
            ],
            "if_statements": [
                {
                    "condition": "not",
                    "target": "funky_type",
                    "then": "junk",
                    "otherwise": "funk"
                }
            ]
        }
    ]
}

input.json

{
    "type": "1"
}

output.json

{
    "readable_type": "funk"
}

input2.json

{
    "type": "2"
}

output2.json

{
    "readable_type": "junk"
}

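Using input.json, the value changes at every step. The first if statement is true, so the value is changed to boring-type. The second is false, so its otherwise changes the value to fun-type. The third is true since the value contains fun, so the value is changed to funky_type. The last if statement, the one on the attribute, checks if the value is not funky_type, which is false, so its otherwise changes the value to funk.

For input2.json the first if statement is false, so the value does not change. The second if statement is true, so the value is changed to boring-type-two. The third if statement is false, so again the value does not change. The last if statement checks if the value is not funky_type, which is true, so the value is changed to junk.

You can even add if statements to every mapping object you add to mappings, so this can handle quite complicated conditions involving multiple values.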
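The chaining walkthrough can be replayed in a few lines of plain Python. This is only a sketch under stated assumptions: check and apply_if_statements are hypothetical helpers, and the meaning given to is, not and contains (equality, inequality and substring test) is our reading of the walkthrough, not Piri's actual code.

```python
def check(condition, value, target):
    """Evaluate one condition. Assumed semantics, for illustration only."""
    if condition == "is":
        return value == target
    if condition == "not":
        return value != target
    if condition == "contains":
        return target in str(value)
    raise ValueError(f"unsupported condition: {condition}")


def apply_if_statements(value, statements):
    """Feed the output of each if statement into the next one."""
    for statement in statements:
        if check(statement["condition"], value, statement["target"]):
            value = statement["then"]
        elif "otherwise" in statement:
            value = statement["otherwise"]
        # Without "otherwise" the value is left unchanged.
    return value


mapping_ifs = [
    {"condition": "is", "target": "1", "then": "boring-type"},
    {"condition": "is", "target": "2", "then": "boring-type-two",
     "otherwise": "fun-type"},
    {"condition": "contains", "target": "fun", "then": "funky_type"},
]
attribute_ifs = [
    {"condition": "not", "target": "funky_type", "then": "junk",
     "otherwise": "funk"},
]


def readable_type(raw):
    # Mapping-level if statements run first, then the attribute-level ones.
    mapped = apply_if_statements(raw, mapping_ifs)
    return apply_if_statements(mapped, attribute_ifs)


print(readable_type("1"))  # funk
print(readable_type("2"))  # junk
```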

Casting values

You've learned how to structure your output with objects, find values and assign them to attributes, combine values and apply if statements. It's now time to learn how to cast values.

Casting values is very useful when we get string (text) data that should be numbers, or when you get badly formatted (non-ISO) date values that you want to change to ISO dates.

Casting is straightforward. You map your value like you normally would and then add the casting object.

Casting to decimal

config.json

{
    "name": "root",
    "array": false,
    "attributes": [
        {
            "name": "my_number",
            "mappings": [
                {
                    "path": ["string_number"]
                }
            ],
            "casting": {
                "to": "decimal"
            }
        }
    ]
}

input.json

{
    "string_number": "123.12"
}

output.json

{
    "my_number": 123.12
}
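For comparison, this is roughly what a string-to-decimal cast looks like in plain Python using the standard library's decimal module. Whether Piri uses Decimal internally is an assumption of this sketch, and to_decimal is a hypothetical helper.

```python
from decimal import Decimal, InvalidOperation


def to_decimal(value):
    """Cast a value to Decimal, raising ValueError for non-numeric text.

    Hypothetical helper for illustration; not part of Piri.
    """
    try:
        return Decimal(str(value))
    except InvalidOperation:
        raise ValueError(f"cannot cast {value!r} to decimal") from None


my_number = to_decimal("123.12")
print(my_number)
```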

Casting to ISO Date

When casting to a date we always have to supply original_format, which is the format of the input data. Without it there would be no way to know for sure in every case whether a value was dd.mm.yy or yy.mm.dd.

config.json

{
    "name": "root",
    "array": false,
    "attributes": [
        {
            "name": "my_iso_date",
            "mappings": [
                {
                    "path": ["yymmdd_date"]
                }
            ],
            "casting": {
                "to": "date",
                "original_format": "yymmdd"
            }
        }
    ]
}

input.json

{
    "yymmdd_date": "101020"
}

output.json

{
    "my_iso_date": "2010-10-20"
}
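The need for original_format is easy to demonstrate with Python's datetime module: the same six digits give two different dates depending on the format they are read with. The helper to_iso_date is hypothetical, and the %-style format codes are Python's own, not the format strings Piri accepts.

```python
from datetime import datetime


def to_iso_date(value, python_format):
    """Parse `value` with a Python strptime format and return an ISO date.

    Hypothetical helper for illustration; not part of Piri.
    """
    return datetime.strptime(str(value), python_format).date().isoformat()


raw = "101020"

as_yymmdd = to_iso_date(raw, "%y%m%d")  # read as year, month, day
as_ddmmyy = to_iso_date(raw, "%d%m%y")  # read as day, month, year

print(as_yymmdd)  # 2010-10-20
print(as_ddmmyy)  # 2020-10-10
```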

Check out the configuration docs on casting for more info.

Working with lists

Finally! Last topic and the most interesting one!

Usually the data that you are processing is not one thing, but a list of things (data). We want to iterate over the list and transform each and every piece of data in it. This is the section that lets you do that.

Let's say we are creating a website for an RPG game that dumps its data in a flat format. Every line is a player with a name, class, money, and x, y coordinates for where they are in the world.

data.json

{
    "data": {
        "something_uninteresting": [1, 2, 3],
        "character_data": [
            ["SuperAwesomeNick", 1, 500],
            ["OtherAwesomeDude", 2, 300],
            ["PoorDude", 2, 10]
        ]
    }
}

Now, to make the frontend dudes happy, we would like to structure this nicely... something like:

{
    "players": [
        {
            "nickname": "SuperAwesomeNick",
            "class": "warrior",
            "gold": 500
        },
        {
            "nickname": "OtherAwesomeDude",
            ...
        }
    ]
}

Introducing Path to Iterable

We can use path_to_iterable on an object. It works similarly to mapping.path, but it applies the current object, with all its attribute mappings and nested objects, to each and every element in whatever list path_to_iterable points to.

Let's solve the above example!

config.json

{
    "name": "root",
    "array": false,
    "objects": [
        {
            "name": "players",
            "array": true,
            "path_to_iterable": ["data", "character_data"],
            "attributes": [
                {
                    "name": "nickname",
                    "mappings": [
                        {
                            "path": ["character_data", 0]
                        }
                    ]
                },
                {
                    "name": "class",
                    "mappings": [
                        {
                            "path": ["character_data", 1]
                        }
                    ],
                    "if_statements": [
                        {
                            "condition": "is",
                            "target": 1,
                            "then": "warrior",
                            "otherwise": "cleric"
                        }
                    ]
                },
                {
                    "name": "gold",
                    "mappings": [
                        {
                            "path": ["character_data", 2]
                        }
                    ]
                }
            ]
        }
    ]
}

input.json

{
    "data": {
        "something_uninteresting": [1, 2, 3],
        "character_data": [
            ["SuperAwesomeNick", 1, 500],
            ["OtherAwesomeDude", 2, 300],
            ["PoorDude", 2, 10]
        ]
    }
}

output.json

{
    "players": [
        {
            "nickname": "SuperAwesomeNick",
            "class": "warrior",
            "gold": 500
        },
        {
            "nickname": "OtherAwesomeDude",
            "class": "cleric",
            "gold": 300
        },
        {
            "nickname": "PoorDude",
            "class": "cleric",
            "gold": 10
        }
    ]
}

Note that we still have to reference the key when mapping. The key name (character_data) is the last name in the path_to_iterable list.

And that's it!

Congratulations, the introduction course is done!
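One way to picture why the key is still needed: each element of the list is handed to the mappings wrapped under the last name in path_to_iterable. The plain-Python sketch below reproduces the output above under that assumption. The helpers follow_path and iterate are hypothetical, and the class if statement is written as an ordinary Python conditional.

```python
def follow_path(data, path):
    """Walk `path` one step at a time through dicts and lists."""
    for step in path:
        data = data[step]
    return data


def iterate(data, path_to_iterable):
    """Yield each list element wrapped under the last key of the path.

    Hypothetical helper for illustration; not part of Piri.
    """
    key = path_to_iterable[-1]
    for element in follow_path(data, path_to_iterable):
        yield {key: element}


source = {
    "data": {
        "something_uninteresting": [1, 2, 3],
        "character_data": [
            ["SuperAwesomeNick", 1, 500],
            ["OtherAwesomeDude", 2, 300],
            ["PoorDude", 2, 10],
        ],
    }
}

players = []
for scope in iterate(source, ["data", "character_data"]):
    class_id = follow_path(scope, ["character_data", 1])
    players.append({
        "nickname": follow_path(scope, ["character_data", 0]),
        # Same rule as the if statement: 1 is a warrior, otherwise a cleric.
        "class": "warrior" if class_id == 1 else "cleric",
        "gold": follow_path(scope, ["character_data", 2]),
    })

print({"players": players})
```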

Time to map some data and have fun doing it!