Data wrangling with short Python code
If you work with data on a day-to-day basis, you will likely run into the problem of incompatible data formats: some data is stored in CSV files, while other data lives in JSON, YAML, or TOML files. Each of these formats has its own advantages and disadvantages, and sometimes we need to convert all of the data to a single format for further analysis. This article presents the tools that I use on a day-to-day basis to carry out that particular task.
You may think that you need to write a full script to convert data from one format to another, but that is not the case. If you know Python well, you can write a simple inline Python command to do the conversion.
python -c "import sys; print('Hello World');"
The above command is a hello world written as inline Python code. :D
The main thing to remember when writing inline Python code is that you need to escape any " characters that appear in the code itself, or use single quotes inside the code as in the example above. This is similar to writing an inline awk script, if you are familiar with that.
Running Python code directly from bash is useful, but to make it feel like a first-class citizen among bash commands we also need to know how to pipe data into it. We can do this by reading from sys.stdin.
echo "Hello World" | python -c "import sys; print(sys.stdin.read())"
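Once a one-liner grows, the same stdin-to-stdout pattern can be moved into a small script. The following is a minimal sketch using only the standard library. The function name `pretty_json` and the in-memory streams are assumptions for illustration and do not come from the commands in this article.

```python
import io
import json
import sys


def pretty_json(stdin=sys.stdin, stdout=sys.stdout):
    """Read one JSON document from stdin and write it back indented."""
    data = json.load(stdin)
    json.dump(data, stdout, indent=2, sort_keys=True)
    stdout.write("\n")


# Simulate `echo '{"b": 1, "a": [1, 2]}' | python pretty.py`
# with in-memory streams instead of a real pipe.
fake_in = io.StringIO('{"b": 1, "a": [1, 2]}')
fake_out = io.StringIO()
pretty_json(fake_in, fake_out)
result = fake_out.getvalue()
```

In a real script you would call `pretty_json()` with no arguments so that it reads from the actual pipe and writes to the terminal.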
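If you grasp the above two concepts, you can read data from stdin, convert it to another format, and write it to stdout using a single line of code.
python -c "import sys, json, yaml; print(json.dumps(yaml.safe_load(sys.stdin)))"
The above line of code converts YAML to JSON: yaml.safe_load (from the PyYAML package) parses the input, and json.dumps serializes the result.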
Similarly, we can convert TOML to JSON using the toml package.
python -c "import sys, json, toml; print(json.dumps(toml.loads(sys.stdin.read())))"
In the other direction, we can convert JSON to YAML and TOML by parsing the input with json.load first and then dumping it in the target format.
python -c "import sys, json, yaml; print(yaml.dump(json.load(sys.stdin)))"
python -c "import sys, json, toml; print(toml.dumps(json.load(sys.stdin)))"
You can store these commands as aliases and even chain them with pipes to convert TOML to YAML and vice versa, or you can just write a slightly longer script.
If you need more control, such as filtering the data or converting only a specific portion of it, you can write a script like the ones I have written.
My script takes just one positional parameter and uses it as an index into the data to filter it. We can call it the poor man's jq.
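To make the idea concrete, here is a minimal sketch of such a filter. It is not the author's actual script. The function name `pick` and the dotted-path syntax (for example `users.0.name`) are assumptions chosen for illustration.

```python
import json


def pick(data, path):
    """Follow a dotted path such as 'users.0.name' through nested dicts and lists."""
    for key in path.split("."):
        if isinstance(data, list):
            data = data[int(key)]  # list elements are addressed by integer index
        else:
            data = data[key]       # dict values are addressed by key
    return data


doc = json.loads('{"users": [{"name": "ada", "langs": ["python", "awk"]}]}')
first_name = pick(doc, "users.0.name")
langs = pick(doc, "users.0.langs")
```

On the command line, the path would come from `sys.argv[1]` and the document from `json.load(sys.stdin)`, so the filter can sit in the middle of a pipe like the one-liners shown earlier.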
JSON to CSV
TOML to JSON
YAML to JSON
JSON to TOML
JSON to YAML
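Of the conversions listed above, JSON to CSV is the one that does not reduce to a single load and dump pair, because CSV is tabular. The sketch below shows one possible approach using only the standard library. It assumes the input is a JSON array of flat objects, and the function name `json_to_csv` is hypothetical rather than taken from the author's scripts.

```python
import csv
import io
import json


def json_to_csv(text):
    """Convert a JSON array of flat objects into CSV text."""
    rows = json.loads(text)
    # Collect column names in first-seen order so that rows with
    # different keys still line up under one header.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a row are written as empty cells
    return out.getvalue()


csv_text = json_to_csv('[{"name": "ada", "age": 36}, {"name": "bob", "city": "Oslo"}]')
```

Nested objects would need to be flattened first, for example by joining nested keys with a dot, before they fit into this layout.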
If you like the above scripts and want to install them, you can use brew to install them on a Mac.
brew tap shubhajeet/cluster
brew install jsontools