Hacker News

arandomhuman
Show HN: Qq: like jq, but can transcode between many formats github.com

qq is a jq-inspired, interoperable config format transcoder with interactive querying. It features an optional interactive editor with autocomplete for structured data, and it supports input and output for json, xml, ini, toml, yaml, hcl, tf, and csv to varying degrees of capability.


krick 3 months ago

I think there may already exist a jq alternative for every letter of the English alphabet. Why not just make a tool that specializes in transcoding csv/json/xml/protobuf/etc., and leave querying to any of the dozens of utilities that already exist? That way I could write qq input.xml -o json | jq '.' | qq -f json -o hcl. It's not like the intention is to bring something new with regard to the query language; the "real stuff" is just a heuristic transcoder of popular data serialization formats.

arandomhuman op 3 months ago

You can indeed run “qq input.xml | jq '.' | qq -o hcl”. The tool does aim to serve that purpose, which is why it outputs json by default. The codec is importable as well, but there’s a lot of work to be done improving and maintaining it.

arandomhuman op 3 months ago

Sorry, just posting this as an example.

  $ qq tests/test.hcl | jq | qq -o yaml
  app_name: SimpleApp
  database:
  - host: localhost
    password: password
    port: 5432
    username: admin
    ...

akhenakh 3 months ago

Readers may be interested in Dasel as well: https://daseldocs.tomwright.me/supported-file-formats

w10-1 3 months ago

This is so promising!

I presume like pandoc, the goal is to use an internal model to reduce the M*N complexity of format transcoding to M+N.
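
The M+N idea can be sketched roughly as follows: each format contributes one decoder and one encoder to and from a shared in-memory model, so adding a format means writing two functions rather than a converter for every other format. This is only an illustrative Python sketch, not qq's actual Go codec. The `CODECS` registry and `transcode` function are hypothetical names, and only JSON and a flat section/key INI mapping from the standard library are shown.

```python
import configparser
import io
import json

# Shared model (assumption): plain dicts, lists and scalars.

def decode_json(text):
    return json.loads(text)

def encode_json(data):
    return json.dumps(data, indent=2, sort_keys=True)

def decode_ini(text):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # INI has no types, so every value comes back as a string.
    return {section: dict(parser[section]) for section in parser.sections()}

def encode_ini(data):
    # Only handles a dict of sections, each a flat dict of scalars.
    parser = configparser.ConfigParser()
    for section, values in data.items():
        parser[section] = {key: str(value) for key, value in values.items()}
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()

# Hypothetical registry: format name -> (decode, encode).
CODECS = {
    "json": (decode_json, encode_json),
    "ini": (decode_ini, encode_ini),
}

def transcode(text, src, dst):
    """Convert text from format src to format dst via the shared model."""
    decode, _ = CODECS[src]
    _, encode = CODECS[dst]
    return encode(decode(text))

ini_text = "[database]\nhost = localhost\nport = 5432\n"
print(transcode(ini_text, "ini", "json"))
```

Note that the port comes out as the string "5432": lossy spots like this are where the per-format quirks discussed in this thread tend to show up.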

But when I look at the tests for this (and Dasel and rq also mentioned here), I see very little. jq seems to have much more. Perhaps I'm missing them.

Which leads to some side/related comments...

A great project would be to build a transcoded test suite that all these tools could run against. It wouldn't be hard to generate.

Also, while Go is a great language, I think pandoc gets a lot of benefit from implementing the internal model in Haskell. Pandoc has a much harder problem because there's a lot more variability and less structure in its formats, so this should be doable for structured data formats.

arandomhuman op 3 months ago

Your comment about a shared test suite for transcoding is very apt, and it's something I thought about a lot when putting this together. qq currently has a test that is vaguely akin to that, but it's embarrassingly crude[0] at this stage. If something like this had some assertions after converting between all (or all applicable) combinations of formats with a given query, I think that might approach a good test suite. (Forgive my ignorance, I haven't looked in depth at the testing for jq.)
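
A cross-format test along those lines could push one fixture through every ordered pair of formats and assert that the decoded result is unchanged. The sketch below is in Python rather than qq's Go, uses only JSON and CSV from the standard library, and the `CODECS` table and `round_trip_failures` helper are hypothetical names. The fixture is deliberately a flat list of string-valued records, since that is the only shape both formats here represent losslessly.

```python
import csv
import io
import itertools
import json

def decode_csv(text):
    return list(csv.DictReader(io.StringIO(text)))

def encode_csv(rows):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical registry: format name -> (decode, encode).
CODECS = {
    "json": (json.loads, json.dumps),
    "csv": (decode_csv, encode_csv),
}

def round_trip_failures(fixture):
    """Return the (src, dst) pairs for which the fixture does not survive."""
    failures = []
    for src, dst in itertools.permutations(CODECS, 2):
        src_decode, src_encode = CODECS[src]
        dst_decode, dst_encode = CODECS[dst]
        # fixture -> src text -> model -> dst text -> model
        model = src_decode(src_encode(fixture))
        result = dst_decode(dst_encode(model))
        if result != fixture:
            failures.append((src, dst))
    return failures

fixture = [
    {"host": "localhost", "port": "5432"},
    {"host": "example.org", "port": "5433"},
]
print(round_trip_failures(fixture))
```

A fixture with an integer port would be reported as failing in both directions, because CSV has no number type; a real suite would need per-format expectations for cases like that.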

Now that you mention it, it would be nice to support a multi output mode in the cli as well. Using the codec library should achieve a similar M+N complexity if I am not mistaken.

[0] https://github.com/JFryy/qq/actions/runs/9654023860/job/2662...

mutant 3 months ago

Honest question: yq pretty much does this. Are you solving for something yq doesn't?

arandomhuman op 3 months ago

Understandable. It's good to ask this bluntly, as it's not very clear from the title alone.

qq currently supports hcl/tf, csv, ini, json, toml, xml and yaml. More are on the roadmap, such as protobuf and html. As it currently stands, yq and qq don't support all the same formats. A few examples:

- qq supports hcl but yq does not.

- yq supports lua, qq does not.

- qq and yq have different quirks in marshalling and unmarshalling data. For instance, yq only supports toml output with scalars, and on the other hand there's a lot of room for refinement in qq's encodings at the moment.

qq's goal is to have a very comprehensive set of encodings, many of which are yet to be included or represented in a single tool (qq will not cover most binary formats, however). It also has an interactive query mode with auto-complete, which is pretty handy on top of having a unique and growing set of supported formats.

ec109685 3 months ago

As an aside, ChatGPT is very good at generating jq queries from an example input and a text description of the query.

ruuda 3 months ago

Or you could write the query in a language that is easy to write by hand, like https://rcl-lang.org/, so you don't need ChatGPT to translate for you.

arandomhuman op 3 months ago

This is still undergoing a lot of change; expect more formats to be supported, along with some general improvements to interactive mode and autocomplete.

robertlagrant 3 months ago

Sounds good! It might be worth making your format support table have more granularity to show levels of maturity/implementation for each format the tool supports.

philsnow 3 months ago

I would also be interested in some more detail on this bullet:

  - qq is not a full jq/*q replacement and comes with idiosyncrasies from the underlying gojq library

What is qq missing that jq has? Is that the same question as "what is gojq missing that jq has"?

wwader 3 months ago

Note that gojq also fixes and improves on some things compared to jq, like arbitrary-precision integer arithmetic (jq only preserves the precision of number literals), some semantic and parsing improvements and fixes that will likely happen in jq at some point as well, time zone/formatting/parsing fixes, etc.

kitd 3 months ago

From [1], there are a few differences that arise from gojq ignoring object key sort order (which you shouldn't really rely on anyway).

[1] https://github.com/itchyny/gojq#difference-to-jq

arandomhuman op 3 months ago

That’s pretty much the only thing of note I’ve encountered. But yes, I just mean it may not be a perfect drop-in replacement at this time in all existing scripts that use jq-like tools for other formats, since those tools likely take a lot of care in making the encoder for their designated format work well. qq still has a few bugs to iron out and more testing to do.

charlesdaniels 3 months ago

I have been working on a project in a similar vein: rq[0]. Mine started out as an attempt to make a jq-like frontend for the Rego[1] language. However, I do find myself using it to simply convert from one format to another, and for pretty printing, quite often.

The interactive mode that qq has is really slick. I didn't torture test it, but it worked pretty smoothly with a quick test.

I see that the XML support is implemented using the same library as rq. This definitely has some problems, because the data model of XML does not map very cleanly onto JSON. In particular, I recall that I had problems using mxj to marshal arrays, and had to use this[2] ugly hack instead. qq seems to have done this a little more cleanly[3]. I may just have to lift this particular trick.
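
To illustrate why arrays are awkward here: XML has no array type, so a converter typically infers a list from repeated sibling elements, which means a one-element list is indistinguishable from a scalar. The following Python sketch uses `xml.etree.ElementTree` with a hypothetical `element_to_data` helper. It is not the approach taken by mxj, rq, or qq, and it ignores attributes and mixed content.

```python
import xml.etree.ElementTree as ET

def element_to_data(elem):
    """Map an element to a str (leaf) or a dict of child tag -> value."""
    children = list(elem)
    if not children:
        return elem.text or ""
    data = {}
    for child in children:
        value = element_to_data(child)
        if child.tag in data:
            # Repeated tag: promote the existing value to a list, then append.
            if not isinstance(data[child.tag], list):
                data[child.tag] = [data[child.tag]]
            data[child.tag].append(value)
        else:
            data[child.tag] = value
    return data

two = ET.fromstring("<hosts><host>a</host><host>b</host></hosts>")
one = ET.fromstring("<hosts><host>a</host></hosts>")

print(element_to_data(two))  # "host" maps to a list
print(element_to_data(one))  # "host" maps to a plain string
```

The same schema therefore yields two different JSON shapes depending on how many items happen to be present, which is the kind of mismatch that forces workarounds when marshalling arrays back to XML.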

I definitely found CSV output to be challenging. Converting arbitrary JSON style data into something table shaped is pretty tricky. This[4] is my attempt. I have found it works well enough, though it isn't perfect.
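
One way to handle the two table-like shapes (a list of records, or a list of rows) is sketched below in Python. This is not rq's or qq's implementation; `to_csv` is a hypothetical helper that takes the header from the union of record keys, fills missing cells with empty strings, and rejects anything that is not table-shaped.

```python
import csv
import io

def to_csv(data):
    """Render a list of dicts or a list of lists as CSV text."""
    if not isinstance(data, list) or not data:
        raise ValueError("expected a non-empty list")
    buf = io.StringIO()
    if all(isinstance(row, dict) for row in data):
        # List of records: header is the union of keys, in first-seen order.
        fields = []
        for row in data:
            for key in row:
                if key not in fields:
                    fields.append(key)
        writer = csv.DictWriter(buf, fieldnames=fields, restval="", lineterminator="\n")
        writer.writeheader()
        writer.writerows(data)
    elif all(isinstance(row, list) for row in data):
        # List of rows (a matrix): written as-is, no header inferred.
        csv.writer(buf, lineterminator="\n").writerows(data)
    else:
        raise ValueError("expected a list of objects or a list of lists")
    return buf.getvalue()

print(to_csv([{"host": "a", "port": 1}, {"host": "b", "user": "admin"}]))
```

Nested objects inside a record are the hard part this sketch sidesteps: the csv module would simply stringify them, so a real tool has to decide whether to flatten, serialize, or reject them.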

I can also see that qq hasn't run into the byte order mark problem with CSV inputs yet. Might want to check out spkg/bom[5].
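
For reference, the problem is that some tools prefix UTF-8 files with the bytes EF BB BF, which end up glued to the first header name unless they are stripped. spkg/bom addresses this for Go readers; the Python sketch below shows the same idea with the standard `utf-8-sig` codec, and the sample bytes are made up for illustration.

```python
import codecs
import csv
import io

# Made-up CSV payload with a UTF-8 byte order mark in front.
raw = codecs.BOM_UTF8 + b"name,port\napp,5432\n"

# Plain utf-8 keeps the BOM as U+FEFF at the start of the first header.
naive = list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))

# utf-8-sig drops a leading BOM if present and is a no-op otherwise.
clean = list(csv.DictReader(io.StringIO(raw.decode("utf-8-sig"))))

print(list(naive[0].keys()))
print(list(clean[0].keys()))
```

With the naive decode, a lookup of the "name" column fails even though the header looks correct when printed in most terminals.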

One final breadcrumb I'll drop - I drew a lot of inspiration for rq's input parsers from conftest[6]. That may be a good resource to find more formats and see specific usage examples of them.

Thanks for sharing! It's really interesting to see some of the convergent evolution between rq and qq.

0 - https://git.sr.ht/~charles/rq

1 - https://www.openpolicyagent.org/docs/latest/policy-language/

2 - https://git.sr.ht/~charles/rq/tree/c67df633c0438763956ff8646...

3 - https://github.com/JFryy/qq/blob/2f750f04def47bec9be100b7c89...

4 - https://git.sr.ht/~charles/rq/tree/c67df633c0438763956ff8646...

5 - https://github.com/spkg/bom

6 - https://github.com/open-policy-agent/conftest

arandomhuman op 3 months ago

Hi charles,

rq was shared with me yesterday and I just wanted to say it's very impressive. I had heard of OPA/Gatekeeper and have looked into Rego before for policy assertions w/ terraform, but I was not aware the language was so expressive until I saw rq. Also, the number of codecs rq supports and the quality of them are really great.

It is really neat seeing a lot of tools solve a similar problem in such unique ways (especially in the case of rq), and it has been a lot of fun reading your experiences here. Thanks for sharing your experiences and expertise with de-serializing/serializing content. It is really cathartic to hear you mention the challenges you solved with xml and csv. I really like how you solved CSV output/input, and the conditions on the input data you chose for evaluating it make a lot of sense and are really comprehensive. It bothered me too, since the content would need to be either a matrix or a slice of maps, but seeing as jq has string formatting that can convert things to @csv and @tsv, I was at a bit of a standstill on how to approach it.

Thanks so much for the breadcrumbs, I look forward to reading this in more detail over the week/weekend :)

charlesdaniels 3 months ago

Wow, didn’t realize it had enough legs for people to be hearing about it except via me! Awesome to hear that.

Rego is “for” those authz cases like the ones you mentioned in the sense that it’s definitely designed with those in mind, and I do think it does a good job for those needs. OPA itself is definitely geared for use as a microservice or container sidecar, talking over the wire. That’s kinda hard to use in a shell script though.

Once I learned it I found myself using opa eval for searching and transforming data, eventually so much so that I made a shell script called “rq” that was basically opa eval -I -f pretty… the rest is history.

wwader 3 months ago

Hey! fq author here, just want to say that i've looked at rq also :) nice to see ppl explore developing tools like this

arandomhuman op 3 months ago

Wow, it's cool seeing you post here. fq might be the most innovative modern cli tool I’ve seen. The historic archival formats and the querying that fq provides are a big inspiration.

wwader 3 months ago

Thanks for the kind words! I have to pass along most of the credit to the designers of jq and the jq CLI interface. Lots of care and thought have gone into those; gojq was also a big enabler. I feel more like a plumber who half accidentally connected a bitstream decoder with a hacked-up version of jq :D

arandomhuman op 3 months ago

If fq is primarily a plumbing project, qq is a janitorial one at this stage (maybe qq can be more akin to plumbing in the future). All things considered, I really appreciate your work with the community!

wwader 3 months ago

It does involve some quite complicated plumbing :) I hope I've made at least some more ppl interested in jq as a language and related tools with fq, jqjq, jq-lsp and by helping maintain the original jq project

arandomhuman op 3 months ago

I mean, to be fair, the support for semi-structured formats is unprecedented with rq, so I could see why!

Thanks for sharing about rq here. I really should give OPA a try sometime; it seems really powerful for setting policies in kubernetes or terraform, for instance. When I first heard of Rego, I was very interested but it didn’t quite click. I can see that would not have been the case had rq been available at the time.

iimblack 3 months ago

This is the beauty of open source. I love that you’re being so collaborative here instead of seeing another tool as competition.

charlesdaniels 3 months ago

What's better than 1 nifty tool for querying semistructured data?

2 nifty tools for querying semistructured data!
