JSON reader/interpreter/converter

OldStrummer · 23 December 2025 14:36

I posted this here because I don't really think this is a help topic; I'm merely looking for suggestions and/or recommendations.

Using czkawka (a wonderful tool for locating duplicate files) I wound up with a large set of data that when saved results in a JSON file. I know that programming tools like geany can read and "beautify" JSON files, but I'm hoping to extract simply the full path of each file and write it out to another (hopefully text) file, so I can then review the duplicates and decide on a pruning method.

I'm thinking the simplest, although not necessarily the easiest, method is to use sed to write out each line containing the "path" specification to a text file, but I wonder if there are automated tools that can do this?

I am not a JavaScript programmer, so native JSON doesn't do much for me. Reading a large (5.7MB) file line by line just seems inefficient and time-consuming.

Any suggestions?

P.S. I know I can do this with both grep and sed because I've tried both, but is there an easier way?

ugnvs · 23 December 2025 19:16

Maybe the czkawka GUI is more convenient than the .json file for your purposes.

github.com/qarmin/czkawka

czkawka_gui/README.md

# Czkawka GUI

Czkawka GUI is a graphical user interface for Czkawka Core, built with GTK 4.

![Screenshot from 2023-11-26 12-43-32](https://github.com/qarmin/czkawka/assets/41945903/722ed490-0be1-4dac-bcfc-182a4d0787dc)

## Maintenance Mode

Czkawka GTK is currently in maintenance mode.  
No new features will be added (at least by me), but bug fixes and compatibility updates with the Czkawka core package will continue.  
Active development is now focused on the Krokiet GUI.

## Requirements

Requirements depend on your platform.

Prebuilt binaries are available here: https://github.com/qarmin/czkawka/releases/

Additional features such as HEIF, libraw, and libavif require extra libraries to be installed, which may increase the number of dependencies.

OldStrummer · 23 December 2025 19:47

LOL! We now come full circle. As I mentioned in my OP, I used czkawka to discover my duplicates.

What I did was save the output, which in "pretty" JSON format is 5.7MB in size. That's 28,786 lines! But I don't need the JSON formatting; all I need is the file path information.

I have actually accomplished my goal by using sed (grep is faster, but sed lets me use a single command). All I care about are image (.jpg) files, so I used:

sed -n '/path/ { /jpg/ p }' input_file > output_file
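The command above prints each matching line whole, so the output still carries the "path": key, the quotes and the trailing comma. As a rough sketch of a variation, a sed substitution can keep only the path value itself. This assumes the pretty-printed layout puts one "path": "..." pair per line and that no path contains an escaped double quote; the function name extract_jpg_paths and the sample data are made up for illustration and are not taken from the real czkawka output.

```shell
#!/bin/bash

# Print only the value of each "path" entry ending in .jpg or .jpeg
# (any letter case). Assumes one  "path": "..."  pair per line and no
# escaped double quotes inside the path.
extract_jpg_paths()
{
	sed -n 's/^[[:space:]]*"path"[[:space:]]*:[[:space:]]*"\(.*\.[Jj][Pp][Ee]\{0,1\}[Gg]\)".*$/\1/p'
}

# Hypothetical sample standing in for the saved results file.
sample='[
  {
    "path": "/home/user/Pictures/a.jpg",
    "size": 1234
  },
  {
    "path": "/home/user/Documents/notes.txt",
    "size": 10
  },
  {
    "path": "/home/user/Pictures/copy of a.JPG",
    "size": 1234
  }
]'

printf '%s\n' "${sample}" | extract_jpg_paths
```

On a real file this would be called as extract_jpg_paths < input_file > output_file. Because it only pattern-matches text, any JSON escapes (for example a doubled backslash) are left in the output as they appear in the file; a JSON-aware tool such as jq would decode them, at the cost of needing to know the file's actual structure.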

ericmarceau · 23 December 2025 21:53

If you are interested, the following is a tool which can assist, depending on how much of the "obfuscated" output you want to be able to read, interpret, parse or process.

Script "USER__CODE__ReFormatPretty.sh":

#!/bin/bash

###	Hack for attempt at de-obfuscating javascript, css, and similarly formatted files

###	Intended to insert newline at every semicolon and opening and closing brace
###	Open brace triggers indentation action until closing brace reduces indentation amount

prettify()
{
	awk '{
		gsub(/[;]/,";\n") ;
		gsub(/[{]/,"{\n") ;
		gsub(/[}]/,"\n}\n") ;
		gsub(/[[<][Aa]/,"\n<a") ;
		gsub(/[[<][Dd][Ii][Vv]/,"\n<div") ;
		gsub(/[[<][Uu][Ll]/,"\n<ul") ;
		gsub(/[[<][Ll][Ii]/,"\n<li") ;
		gsub(/[[<][Ss][Pp][Aa][Nn]/,"\n<span") ;
		gsub(/[[<][/][Aa]/,"\n</a") ;
		gsub(/[[<][/][Uu][Ll]/,"\n</ul") ;
		gsub(/[[<][/][Ll][Ii]/,"\n</li") ;
		gsub(/[[<][/][Dd][Ii][Vv]/,"\n</div") ;
		gsub(/[[<][/][Ss][Pp][Aa][Nn]/,"\n</span") ;
		print $0 ;
	}' |
	awk -v dbg="${debug}" 'BEGIN{
		indent=0 ;
	}{
		if( $1 == "}" ){
			indent-- ;
			if( indent == -1 ){
				indent++ ;
				if( dbg ==1 ){ print indent, NR | "cat 1>&2" ; } ;
			}else{
				if( dbg ==1 ){ print indent | "cat 1>&2" ; } ;
			} ;
		} ;

		if( indent == 0 ){
			print $0 ;
		}else{
			for( i=1 ; i <= indent ; i++ ){
				printf("\t") ;
			} ;
			print $0 ;
		} ;

		if( index($0,"{") > 0 ){
			indent++ ;
			if( dbg == 1 ){ print indent | "cat 1>&2" ; } ;
		} ;
	}'
}

PIPE=0

while [ $# -gt 0 ]
do
	case $1 in
		--stream ) PIPE=1 ; shift ;;
		--file )   PIPE=0 ; INPUT="$2" ; shift ; shift ;;
		* ) printf "\n\t Invalid option provided on the command line.  Only available: [ --stream | --file {filename} ]\n\n" ; exit 1 ;;
	esac
done

if [ ${PIPE} -eq 1 ]
then
	prettify
else
	prettify < "${INPUT}"
fi

exit

OldStrummer · 23 December 2025 22:43

LOL!

That reminds me of the winning entry in the 1988 International Obfuscated C Code Contest, authored by Ian Phillipps.

main(t,_,a) char *a; { return!0<t?t<3?main(-79,-13,a+main(-87,1-_,main(-86,0,a+1)+a)): 1,t<_?main(t+1,_,a):3,main(-94,-27+t,a)&&t==2?_<13? main(2,_+1,"%s %d %d\\n"):9:16:t<0?t<-72?main(_,t, "@n\'+,#\'/*{}w+/w#cdnr/+,{}r/*de}+,/*{*+,/w{%+,/w#q#n+,/#{l+,/n{n+,/+#n+,/#\\n;#q#n+,/+k#;*+,/\'r :\'d*\'3,}{w+K w\'K:\'+}e#\';dq#\'l \\q#\'+d\'K#!/+k#;q#\'r}eKK#}w\'r}eKK{nl]\'/#;#q#n\'){)#}w\'){){nl]\'/+#n\';d}rw\' i;# \$${nl]!/n{n#\'; r{#w\'r nc{nl]\'/#{l,+\'K {rw\' iK{;[{nl]\'/w#q#n\'wk nw\' \\iwk{KK{nl]!/w{%\'l##w#\' i; :{nl]\'/*{q#\'ld;r\'}{nlwb!/*de}\'c \\;;{nl\'-{}rw]\'/+,}##\'*}#nc,\',#nw]\'/+kd\'+e}+;#\'rdq#w! nr\'/ \') }+}{rl#\'{n\' \')# \\}\'+}##(!!/") :t<-50?_==*a?putchar(31[a]):main(-65,_,a+1):main((*a==\'/\')+t,_,a+1) :0<t?main(2,2,"%s"):*a==\'/\'||main(0,main(-61,*a, "!ek;dc i@bK\'(q)-[w]*%n+r3#l,{}:\\nuwloca-O;m .vpbks,fxntdCeghiry"),a+1); }
Compile and run it.

Or not. C compilers and the code have evolved over the past 35 years, so I'll tell you what it did: It printed out the full "12 Days of Christmas" song.

Merry Christmas!

(P.S. You can find out more, including the formatted code, here)