how to read a nul-terminated-string from a binary file

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
1
down vote

favorite
1












I have a binary file that is filled with FF values. I filled its start with many 00. Then, I padded its start with 10 00, in order to get some kind of offset, and then I wrote a shorter string, also terminated with 00



I used this printf:



printf 00000000000000000000MAC_ADDRESS=12:34:56:78:90,PCB_MAIN_ID=m/SF-1V/MAIN/0.0,PCB_PIGGY1_ID=n/SF-1V/PS/0.0,CSL_HW_VARIANT=D00' > eeprom



This is how it looks like when I show the hexdump of the file



enter image description here



Now what I want to know is how can I read the string. I can use MY_STR=$eeprom:OFFSET (eeprom is the file name) which will give me the string, but will also give me the rest of the file which I don't want. How can I stop it when it first encounters 00?



  • Can't use MY_STR=$eeprom:OFFSET:LENGTH because string's length is unknown

  • Another thing - how can I fill it up again with FF?

  • Using sh (busybox)

EDIT



Trying to do some minor example of this...
I have one file input with this values (after xxd -c 1 input):



0000000: 68 h
0000001: 65 e
0000002: 6c l
0000003: 6c l
0000004: 6f o
0000005: 2c ,
0000006: 20
0000007: 00 .
0000008: 69 i
0000009: 74 t
000000a: 27 '
000000b: 73 s
000000c: 20
000000d: 6d m
000000e: 65 e
000000f: 2c ,
0000010: 00 .


and I have this script s.sh:



BUF=""
for c in $(xxd -p input); do
if [ "$c" != 00 ]; then
BUF="$BUFc";
else
break;
fi
done

echo $BUF


and I expected it to echo "hello", however, nothing is printed







share|improve this question





















  • busybox's sh doesn't sound like something that would work well with NUL-separated data. Zsh or even Bash would better. Or some actual programming language, like Perl. Are there any other tools than busybox you can use?
    – ilkkachu
    May 16 at 13:51










  • strings will extract null terminated strings from binary files
    – ajeh
    May 16 at 14:14














up vote
1
down vote

favorite
1












I have a binary file that is filled with FF values. I filled its start with many 00. Then, I padded its start with 10 00, in order to get some kind of offset, and then I wrote a shorter string, also terminated with 00



I used this printf:



printf 00000000000000000000MAC_ADDRESS=12:34:56:78:90,PCB_MAIN_ID=m/SF-1V/MAIN/0.0,PCB_PIGGY1_ID=n/SF-1V/PS/0.0,CSL_HW_VARIANT=D00' > eeprom



This is how it looks like when I show the hexdump of the file



enter image description here



Now what I want to know is how can I read the string. I can use MY_STR=$eeprom:OFFSET (eeprom is the file name) which will give me the string, but will also give me the rest of the file which I don't want. How can I stop it when it first encounters 00?



  • Can't use MY_STR=$eeprom:OFFSET:LENGTH because string's length is unknown

  • Another thing - how can I fill it up again with FF?

  • Using sh (busybox)

EDIT



Trying to do some minor example of this...
I have one file input with this values (after xxd -c 1 input):



0000000: 68 h
0000001: 65 e
0000002: 6c l
0000003: 6c l
0000004: 6f o
0000005: 2c ,
0000006: 20
0000007: 00 .
0000008: 69 i
0000009: 74 t
000000a: 27 '
000000b: 73 s
000000c: 20
000000d: 6d m
000000e: 65 e
000000f: 2c ,
0000010: 00 .


and I have this script s.sh:



BUF=""
for c in $(xxd -p input); do
if [ "$c" != 00 ]; then
BUF="$BUFc";
else
break;
fi
done

echo $BUF


and I expected it to echo "hello", however, nothing is printed







share|improve this question





















  • busybox's sh doesn't sound like something that would work well with NUL-separated data. Zsh or even Bash would better. Or some actual programming language, like Perl. Are there any other tools than busybox you can use?
    – ilkkachu
    May 16 at 13:51










  • strings will extract null terminated strings from binary files
    – ajeh
    May 16 at 14:14












up vote
1
down vote

favorite
1









up vote
1
down vote

favorite
1






1





I have a binary file that is filled with FF values. I filled its start with many 00. Then, I padded its start with 10 00, in order to get some kind of offset, and then I wrote a shorter string, also terminated with 00



I used this printf:



printf 00000000000000000000MAC_ADDRESS=12:34:56:78:90,PCB_MAIN_ID=m/SF-1V/MAIN/0.0,PCB_PIGGY1_ID=n/SF-1V/PS/0.0,CSL_HW_VARIANT=D00' > eeprom



This is how it looks like when I show the hexdump of the file



enter image description here



Now what I want to know is how can I read the string. I can use MY_STR=$eeprom:OFFSET (eeprom is the file name) which will give me the string, but will also give me the rest of the file which I don't want. How can I stop it when it first encounters 00?



  • Can't use MY_STR=$eeprom:OFFSET:LENGTH because string's length is unknown

  • Another thing - how can I fill it up again with FF?

  • Using sh (busybox)

EDIT



Trying to do some minor example of this...
I have one file input with this values (after xxd -c 1 input):



0000000: 68 h
0000001: 65 e
0000002: 6c l
0000003: 6c l
0000004: 6f o
0000005: 2c ,
0000006: 20
0000007: 00 .
0000008: 69 i
0000009: 74 t
000000a: 27 '
000000b: 73 s
000000c: 20
000000d: 6d m
000000e: 65 e
000000f: 2c ,
0000010: 00 .


and I have this script s.sh:



BUF=""
for c in $(xxd -p input); do
if [ "$c" != 00 ]; then
BUF="$BUFc";
else
break;
fi
done

echo $BUF


and I expected it to echo "hello", however, nothing is printed







share|improve this question













I have a binary file that is filled with FF values. I filled its start with many 00. Then, I padded its start with 10 00, in order to get some kind of offset, and then I wrote a shorter string, also terminated with 00



I used this printf:



printf 00000000000000000000MAC_ADDRESS=12:34:56:78:90,PCB_MAIN_ID=m/SF-1V/MAIN/0.0,PCB_PIGGY1_ID=n/SF-1V/PS/0.0,CSL_HW_VARIANT=D00' > eeprom



This is how it looks like when I show the hexdump of the file



enter image description here



Now what I want to know is how can I read the string. I can use MY_STR=$eeprom:OFFSET (eeprom is the file name) which will give me the string, but will also give me the rest of the file which I don't want. How can I stop it when it first encounters 00?



  • Can't use MY_STR=$eeprom:OFFSET:LENGTH because string's length is unknown

  • Another thing - how can I fill it up again with FF?

  • Using sh (busybox)

EDIT



Trying to do some minor example of this...
I have one file input with this values (after xxd -c 1 input):



0000000: 68 h
0000001: 65 e
0000002: 6c l
0000003: 6c l
0000004: 6f o
0000005: 2c ,
0000006: 20
0000007: 00 .
0000008: 69 i
0000009: 74 t
000000a: 27 '
000000b: 73 s
000000c: 20
000000d: 6d m
000000e: 65 e
000000f: 2c ,
0000010: 00 .


and I have this script s.sh:



BUF=""
for c in $(xxd -p input); do
if [ "$c" != 00 ]; then
BUF="$BUFc";
else
break;
fi
done

echo $BUF


and I expected it to echo "hello", however, nothing is printed









share|improve this question












share|improve this question




share|improve this question








edited May 16 at 16:38
























asked May 16 at 13:40









CIsForCookies

1116




1116











  • busybox's sh doesn't sound like something that would work well with NUL-separated data. Zsh or even Bash would better. Or some actual programming language, like Perl. Are there any other tools than busybox you can use?
    – ilkkachu
    May 16 at 13:51










  • strings will extract null terminated strings from binary files
    – ajeh
    May 16 at 14:14
















  • busybox's sh doesn't sound like something that would work well with NUL-separated data. Zsh or even Bash would better. Or some actual programming language, like Perl. Are there any other tools than busybox you can use?
    – ilkkachu
    May 16 at 13:51










  • strings will extract null terminated strings from binary files
    – ajeh
    May 16 at 14:14















busybox's sh doesn't sound like something that would work well with NUL-separated data. Zsh or even Bash would better. Or some actual programming language, like Perl. Are there any other tools than busybox you can use?
– ilkkachu
May 16 at 13:51




busybox's sh doesn't sound like something that would work well with NUL-separated data. Zsh or even Bash would better. Or some actual programming language, like Perl. Are there any other tools than busybox you can use?
– ilkkachu
May 16 at 13:51












strings will extract null terminated strings from binary files
– ajeh
May 16 at 14:14




strings will extract null terminated strings from binary files
– ajeh
May 16 at 14:14










1 Answer
1






active

oldest

votes

















up vote
1
down vote



accepted










Solution 1: Direct Variable Assignment



If all you're worried about are the null bytes then you should just be able to directly read the data from the file into a variable using whichever standard method you prefer, i.e. you should be able to just ignore the null bytes and read the data from the file. Here's an example using the cat command and command substitution:



$ data="$(cat eeprom)"
$ echo "$data"
MAC_ADDRESS=12:34:56:78:90,PCB_MAIN_ID=m/SF-1V/MAIN/0.0,PCB_PIGGY1_ID=n/SF-1V/PS/0.0,CSL_HW_VARIANT=D


This worked for me inside of a BusyBox Docker container.



Solution 2: Using xxd and a for loop



If you want more control than you can use xxd to convert the bytes to hexadecimal strings and iterate over these strings. Then, while iterating over these string, you can apply whatever logic you'd like, e.g. you could explicitly skip over the initial null values and print the rest of the data until you reach some break condition.



Here's a script that specifies a "white-list" of valid characters (ASCII 32 through 127), treats any subsequence of other characters as a separator, and extracts all valid substrings:



#!/bin/sh
# get_hex_substrings.sh

# Get the path to the data-file as a command-line argument
datafile="$1"

# Keep track of state using environment variables
inside_padding_block="true"
inside_bad_block="false"

# NOTE: The '-p' flag is for "plain" output (no additional formatting)
# and the '-c 1' option specifies that the representation of each byte
# will be printed on a separate line
for h in $(xxd -p -c 1 "$datafile"); do

# Convert the hex character to standard decimal
d="$((0x$h))"

# Case where we're still inside the initial padding block
if [ "$inside_padding_block" == "true" ]; then
if [ "$d" -ge 32 ] && [ "$d" -le 127 ]; then
inside_padding_block="false";
printf 'x'"$h";
fi

# Case where we're passed the initial padding, but inside another
# block of non-printable characters
elif [ "$inside_bad_block" == "true" ]; then
if [ "$d" -ge 32 ] && [ "$d" -le 127 ]; then
inside_bad_block="false";
printf 'x'"$h";
fi

# Case where we're inside of a substring that we want to extract
else
if [ "$d" -ge 32 ] && [ "$d" -le 127 ]; then
printf 'x'"$h";
else
inside_bad_block="true";
echo
fi
fi
done

if [ "$inside_bad_block" == "false" ]; then
echo
fi


Now we can test this out by creating an example file which has both x00 and xff subsequences separating substrings:



printf 'x00x00x00string1xffxffxffstring2x00x00x00string3x00x00x00' > data.hex


And here's the output we get when running the script:



$ sh get_hex_substrings.sh data.hex
string1
string2
string3


Solution 3: Using the tr and cut commands



You could also try using the tr and cut commands to deal with the null bytes. Here's an example of extracting the first null-terminated string from a list of null-terminated strings by squeezing/collapsing adjacent null-characters and converting them to newlines:



$ printf '000000string1000000string2000000string3000000' > file.dat
$ tr -s '00' 'n' < file.dat | cut -d$'n' -f2
string1





share|improve this answer























  • How would it handle the FF bytes? and what if there will be other garbage values after the null byte that ends the string?
    – CIsForCookies
    May 16 at 14:35






  • 1




    @CIsForCookies The first method (dat="$(cat eeprom)" probably won't work for that case, but the loop obviously will - you can include whatever logic you want in there. Maybe update your question to include a more complete input example along with the desired output?
    – igal
    May 16 at 15:15










  • Can you please explain the xxd parameters? can't understand why -c is needed (the others I'm not sure about)
    – CIsForCookies
    May 16 at 16:08






  • 1




    @CIsForCookies The -c parameter sets the number of columns used when outputting. Using -c 1 should mean that each byte is printed on its own line. I don't think it's necessary, strictly speaking - just a personal choice. Anyway, I'll update my post.
    – igal
    May 16 at 16:23










Your Answer







StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "106"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);



);








 

draft saved


draft discarded


















StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f444162%2fhow-to-read-a-nul-terminated-string-from-a-binary-file%23new-answer', 'question_page');

);

Post as a guest






























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
1
down vote



accepted










Solution 1: Direct Variable Assignment



If all you're worried about are the null bytes then you should just be able to directly read the data from the file into a variable using whichever standard method you prefer, i.e. you should be able to just ignore the null bytes and read the data from the file. Here's an example using the cat command and command substitution:



$ data="$(cat eeprom)"
$ echo "$data"
MAC_ADDRESS=12:34:56:78:90,PCB_MAIN_ID=m/SF-1V/MAIN/0.0,PCB_PIGGY1_ID=n/SF-1V/PS/0.0,CSL_HW_VARIANT=D


This worked for me inside of a BusyBox Docker container.



Solution 2: Using xxd and a for loop



If you want more control than you can use xxd to convert the bytes to hexadecimal strings and iterate over these strings. Then, while iterating over these string, you can apply whatever logic you'd like, e.g. you could explicitly skip over the initial null values and print the rest of the data until you reach some break condition.



Here's a script that specifies a "white-list" of valid characters (ASCII 32 through 127), treats any subsequence of other characters as a separator, and extracts all valid substrings:



#!/bin/sh
# get_hex_substrings.sh

# Get the path to the data-file as a command-line argument
datafile="$1"

# Keep track of state using environment variables
inside_padding_block="true"
inside_bad_block="false"

# NOTE: The '-p' flag is for "plain" output (no additional formatting)
# and the '-c 1' option specifies that the representation of each byte
# will be printed on a separate line
for h in $(xxd -p -c 1 "$datafile"); do

# Convert the hex character to standard decimal
d="$((0x$h))"

# Case where we're still inside the initial padding block
if [ "$inside_padding_block" == "true" ]; then
if [ "$d" -ge 32 ] && [ "$d" -le 127 ]; then
inside_padding_block="false";
printf 'x'"$h";
fi

# Case where we're passed the initial padding, but inside another
# block of non-printable characters
elif [ "$inside_bad_block" == "true" ]; then
if [ "$d" -ge 32 ] && [ "$d" -le 127 ]; then
inside_bad_block="false";
printf 'x'"$h";
fi

# Case where we're inside of a substring that we want to extract
else
if [ "$d" -ge 32 ] && [ "$d" -le 127 ]; then
printf 'x'"$h";
else
inside_bad_block="true";
echo
fi
fi
done

if [ "$inside_bad_block" == "false" ]; then
echo
fi


Now we can test this out by creating an example file which has both x00 and xff subsequences separating substrings:



printf 'x00x00x00string1xffxffxffstring2x00x00x00string3x00x00x00' > data.hex


And here's the output we get when running the script:



$ sh get_hex_substrings.sh data.hex
string1
string2
string3


Solution 3: Using the tr and cut commands



You could also try using the tr and cut commands to deal with the null bytes. Here's an example of extracting the first null-terminated string from a list of null-terminated strings by squeezing/collapsing adjacent null-characters and converting them to newlines:



$ printf '000000string1000000string2000000string3000000' > file.dat
$ tr -s '00' 'n' < file.dat | cut -d$'n' -f2
string1





share|improve this answer























  • How would it handle the FF bytes? and what if there will be other garbage values after the null byte that ends the string?
    – CIsForCookies
    May 16 at 14:35






  • 1




    @CIsForCookies The first method (dat="$(cat eeprom)" probably won't work for that case, but the loop obviously will - you can include whatever logic you want in there. Maybe update your question to include a more complete input example along with the desired output?
    – igal
    May 16 at 15:15










  • Can you please explain the xxd parameters? can't understand why -c is needed (the others I'm not sure about)
    – CIsForCookies
    May 16 at 16:08






  • 1




    @CIsForCookies The -c parameter sets the number of columns used when outputting. Using -c 1 should mean that each byte is printed on its own line. I don't think it's necessary, strictly speaking - just a personal choice. Anyway, I'll update my post.
    – igal
    May 16 at 16:23














up vote
1
down vote



accepted










Solution 1: Direct Variable Assignment



If all you're worried about are the null bytes then you should just be able to directly read the data from the file into a variable using whichever standard method you prefer, i.e. you should be able to just ignore the null bytes and read the data from the file. Here's an example using the cat command and command substitution:



$ data="$(cat eeprom)"
$ echo "$data"
MAC_ADDRESS=12:34:56:78:90,PCB_MAIN_ID=m/SF-1V/MAIN/0.0,PCB_PIGGY1_ID=n/SF-1V/PS/0.0,CSL_HW_VARIANT=D


This worked for me inside of a BusyBox Docker container.



Solution 2: Using xxd and a for loop



If you want more control than you can use xxd to convert the bytes to hexadecimal strings and iterate over these strings. Then, while iterating over these string, you can apply whatever logic you'd like, e.g. you could explicitly skip over the initial null values and print the rest of the data until you reach some break condition.



Here's a script that specifies a "white-list" of valid characters (ASCII 32 through 127), treats any subsequence of other characters as a separator, and extracts all valid substrings:



#!/bin/sh
# get_hex_substrings.sh

# Get the path to the data-file as a command-line argument
datafile="$1"

# Keep track of state using environment variables
inside_padding_block="true"
inside_bad_block="false"

# NOTE: The '-p' flag is for "plain" output (no additional formatting)
# and the '-c 1' option specifies that the representation of each byte
# will be printed on a separate line
for h in $(xxd -p -c 1 "$datafile"); do

# Convert the hex character to standard decimal
d="$((0x$h))"

# Case where we're still inside the initial padding block
if [ "$inside_padding_block" == "true" ]; then
if [ "$d" -ge 32 ] && [ "$d" -le 127 ]; then
inside_padding_block="false";
printf 'x'"$h";
fi

# Case where we're passed the initial padding, but inside another
# block of non-printable characters
elif [ "$inside_bad_block" == "true" ]; then
if [ "$d" -ge 32 ] && [ "$d" -le 127 ]; then
inside_bad_block="false";
printf 'x'"$h";
fi

# Case where we're inside of a substring that we want to extract
else
if [ "$d" -ge 32 ] && [ "$d" -le 127 ]; then
printf 'x'"$h";
else
inside_bad_block="true";
echo
fi
fi
done

if [ "$inside_bad_block" == "false" ]; then
echo
fi


Now we can test this out by creating an example file which has both x00 and xff subsequences separating substrings:



printf 'x00x00x00string1xffxffxffstring2x00x00x00string3x00x00x00' > data.hex


And here's the output we get when running the script:



$ sh get_hex_substrings.sh data.hex
string1
string2
string3


Solution 3: Using the tr and cut commands



You could also try using the tr and cut commands to deal with the null bytes. Here's an example of extracting the first null-terminated string from a list of null-terminated strings by squeezing/collapsing adjacent null-characters and converting them to newlines:



$ printf '000000string1000000string2000000string3000000' > file.dat
$ tr -s '00' 'n' < file.dat | cut -d$'n' -f2
string1





share|improve this answer























  • How would it handle the FF bytes? and what if there will be other garbage values after the null byte that ends the string?
    – CIsForCookies
    May 16 at 14:35






  • 1




    @CIsForCookies The first method (dat="$(cat eeprom)" probably won't work for that case, but the loop obviously will - you can include whatever logic you want in there. Maybe update your question to include a more complete input example along with the desired output?
    – igal
    May 16 at 15:15










  • Can you please explain the xxd parameters? can't understand why -c is needed (the others I'm not sure about)
    – CIsForCookies
    May 16 at 16:08






  • 1




    @CIsForCookies The -c parameter sets the number of columns used when outputting. Using -c 1 should mean that each byte is printed on its own line. I don't think it's necessary, strictly speaking - just a personal choice. Anyway, I'll update my post.
    – igal
    May 16 at 16:23












up vote
1
down vote



accepted







up vote
1
down vote



accepted






Solution 1: Direct Variable Assignment



If all you're worried about are the null bytes then you should just be able to directly read the data from the file into a variable using whichever standard method you prefer, i.e. you should be able to just ignore the null bytes and read the data from the file. Here's an example using the cat command and command substitution:



$ data="$(cat eeprom)"
$ echo "$data"
MAC_ADDRESS=12:34:56:78:90,PCB_MAIN_ID=m/SF-1V/MAIN/0.0,PCB_PIGGY1_ID=n/SF-1V/PS/0.0,CSL_HW_VARIANT=D


This worked for me inside of a BusyBox Docker container.



Solution 2: Using xxd and a for loop



If you want more control than you can use xxd to convert the bytes to hexadecimal strings and iterate over these strings. Then, while iterating over these string, you can apply whatever logic you'd like, e.g. you could explicitly skip over the initial null values and print the rest of the data until you reach some break condition.



Here's a script that specifies a "white-list" of valid characters (ASCII 32 through 127), treats any subsequence of other characters as a separator, and extracts all valid substrings:



#!/bin/sh
# get_hex_substrings.sh

# Get the path to the data-file as a command-line argument
datafile="$1"

# Keep track of state using environment variables
inside_padding_block="true"
inside_bad_block="false"

# NOTE: The '-p' flag is for "plain" output (no additional formatting)
# and the '-c 1' option specifies that the representation of each byte
# will be printed on a separate line
for h in $(xxd -p -c 1 "$datafile"); do

# Convert the hex character to standard decimal
d="$((0x$h))"

# Case where we're still inside the initial padding block
if [ "$inside_padding_block" == "true" ]; then
if [ "$d" -ge 32 ] && [ "$d" -le 127 ]; then
inside_padding_block="false";
printf 'x'"$h";
fi

# Case where we're passed the initial padding, but inside another
# block of non-printable characters
elif [ "$inside_bad_block" == "true" ]; then
if [ "$d" -ge 32 ] && [ "$d" -le 127 ]; then
inside_bad_block="false";
printf 'x'"$h";
fi

# Case where we're inside of a substring that we want to extract
else
if [ "$d" -ge 32 ] && [ "$d" -le 127 ]; then
printf 'x'"$h";
else
inside_bad_block="true";
echo
fi
fi
done

if [ "$inside_bad_block" == "false" ]; then
echo
fi


Now we can test this out by creating an example file which has both x00 and xff subsequences separating substrings:



printf 'x00x00x00string1xffxffxffstring2x00x00x00string3x00x00x00' > data.hex


And here's the output we get when running the script:



$ sh get_hex_substrings.sh data.hex
string1
string2
string3


Solution 3: Using the tr and cut commands



You could also try using the tr and cut commands to deal with the null bytes. Here's an example of extracting the first null-terminated string from a list of null-terminated strings by squeezing/collapsing adjacent null-characters and converting them to newlines:



$ printf '000000string1000000string2000000string3000000' > file.dat
$ tr -s '00' 'n' < file.dat | cut -d$'n' -f2
string1





share|improve this answer















Solution 1: Direct Variable Assignment



If all you're worried about are the null bytes then you should just be able to directly read the data from the file into a variable using whichever standard method you prefer, i.e. you should be able to just ignore the null bytes and read the data from the file. Here's an example using the cat command and command substitution:



$ data="$(cat eeprom)"
$ echo "$data"
MAC_ADDRESS=12:34:56:78:90,PCB_MAIN_ID=m/SF-1V/MAIN/0.0,PCB_PIGGY1_ID=n/SF-1V/PS/0.0,CSL_HW_VARIANT=D


This worked for me inside of a BusyBox Docker container.



Solution 2: Using xxd and a for loop



If you want more control than you can use xxd to convert the bytes to hexadecimal strings and iterate over these strings. Then, while iterating over these string, you can apply whatever logic you'd like, e.g. you could explicitly skip over the initial null values and print the rest of the data until you reach some break condition.



Here's a script that specifies a "white-list" of valid characters (ASCII 32 through 127), treats any subsequence of other characters as a separator, and extracts all valid substrings:



#!/bin/sh
# get_hex_substrings.sh

# Get the path to the data-file as a command-line argument
datafile="$1"

# Keep track of state using environment variables
inside_padding_block="true"
inside_bad_block="false"

# NOTE: The '-p' flag is for "plain" output (no additional formatting)
# and the '-c 1' option specifies that the representation of each byte
# will be printed on a separate line
for h in $(xxd -p -c 1 "$datafile"); do

# Convert the hex character to standard decimal
d="$((0x$h))"

# Case where we're still inside the initial padding block
if [ "$inside_padding_block" == "true" ]; then
if [ "$d" -ge 32 ] && [ "$d" -le 127 ]; then
inside_padding_block="false";
printf 'x'"$h";
fi

# Case where we're passed the initial padding, but inside another
# block of non-printable characters
elif [ "$inside_bad_block" == "true" ]; then
if [ "$d" -ge 32 ] && [ "$d" -le 127 ]; then
inside_bad_block="false";
printf 'x'"$h";
fi

# Case where we're inside of a substring that we want to extract
else
if [ "$d" -ge 32 ] && [ "$d" -le 127 ]; then
printf 'x'"$h";
else
inside_bad_block="true";
echo
fi
fi
done

if [ "$inside_bad_block" == "false" ]; then
echo
fi


Now we can test this out by creating an example file which has both x00 and xff subsequences separating substrings:



printf 'x00x00x00string1xffxffxffstring2x00x00x00string3x00x00x00' > data.hex


And here's the output we get when running the script:



$ sh get_hex_substrings.sh data.hex
string1
string2
string3


Solution 3: Using the tr and cut commands



You could also try using the tr and cut commands to deal with the null bytes. Here's an example of extracting the first null-terminated string from a list of null-terminated strings by squeezing/collapsing adjacent null-characters and converting them to newlines:



$ printf '000000string1000000string2000000string3000000' > file.dat
$ tr -s '00' 'n' < file.dat | cut -d$'n' -f2
string1






share|improve this answer















share|improve this answer



share|improve this answer








edited May 16 at 17:51


























answered May 16 at 14:02









igal

4,785930




4,785930











  • How would it handle the FF bytes? and what if there will be other garbage values after the null byte that ends the string?
    – CIsForCookies
    May 16 at 14:35






  • 1




    @CIsForCookies The first method (dat="$(cat eeprom)" probably won't work for that case, but the loop obviously will - you can include whatever logic you want in there. Maybe update your question to include a more complete input example along with the desired output?
    – igal
    May 16 at 15:15










  • Can you please explain the xxd parameters? can't understand why -c is needed (the others I'm not sure about)
    – CIsForCookies
    May 16 at 16:08






  • 1




    @CIsForCookies The -c parameter sets the number of columns used when outputting. Using -c 1 should mean that each byte is printed on its own line. I don't think it's necessary, strictly speaking - just a personal choice. Anyway, I'll update my post.
    – igal
    May 16 at 16:23
















  • How would it handle the FF bytes? and what if there will be other garbage values after the null byte that ends the string?
    – CIsForCookies
    May 16 at 14:35






  • 1




    @CIsForCookies The first method (dat="$(cat eeprom)" probably won't work for that case, but the loop obviously will - you can include whatever logic you want in there. Maybe update your question to include a more complete input example along with the desired output?
    – igal
    May 16 at 15:15










  • Can you please explain the xxd parameters? can't understand why -c is needed (the others I'm not sure about)
    – CIsForCookies
    May 16 at 16:08






  • 1




    @CIsForCookies The -c parameter sets the number of columns used when outputting. Using -c 1 should mean that each byte is printed on its own line. I don't think it's necessary, strictly speaking - just a personal choice. Anyway, I'll update my post.
    – igal
    May 16 at 16:23















How would it handle the FF bytes? and what if there will be other garbage values after the null byte that ends the string?
– CIsForCookies
May 16 at 14:35




How would it handle the FF bytes? and what if there will be other garbage values after the null byte that ends the string?
– CIsForCookies
May 16 at 14:35




1




1




@CIsForCookies The first method (dat="$(cat eeprom)" probably won't work for that case, but the loop obviously will - you can include whatever logic you want in there. Maybe update your question to include a more complete input example along with the desired output?
– igal
May 16 at 15:15




@CIsForCookies The first method (dat="$(cat eeprom)" probably won't work for that case, but the loop obviously will - you can include whatever logic you want in there. Maybe update your question to include a more complete input example along with the desired output?
– igal
May 16 at 15:15












Can you please explain the xxd parameters? can't understand why -c is needed (the others I'm not sure about)
– CIsForCookies
May 16 at 16:08




Can you please explain the xxd parameters? can't understand why -c is needed (the others I'm not sure about)
– CIsForCookies
May 16 at 16:08




1




1




@CIsForCookies The -c parameter sets the number of columns used when outputting. Using -c 1 should mean that each byte is printed on its own line. I don't think it's necessary, strictly speaking - just a personal choice. Anyway, I'll update my post.
– igal
May 16 at 16:23




@CIsForCookies The -c parameter sets the number of columns used when outputting. Using -c 1 should mean that each byte is printed on its own line. I don't think it's necessary, strictly speaking - just a personal choice. Anyway, I'll update my post.
– igal
May 16 at 16:23












 

draft saved


draft discarded


























 


draft saved


draft discarded














StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f444162%2fhow-to-read-a-nul-terminated-string-from-a-binary-file%23new-answer', 'question_page');

);

Post as a guest













































































Popular posts from this blog

How to check contact read email or not when send email to Individual?

Bahrain

Postfix configuration issue with fips on centos 7; mailgun relay