Speeding up repeated python calls (or, alternatively, porting a complex regex to sed)

I am an academic medical physicist. I do experiments that generate a fair amount of data, and they are expensive to run. My university has a backup system that consists of a robot tape library in a disused salt mine, running IBM's Spectrum Protect (invoked as dsmc), which I use for off-site backups. Although there is no limit on the total size I can send to the salt mine, there is a per-day transfer limit of 200 gigabytes. As far as I know, there is no way to get the Spectrum Protect client to respect this limit and stop once it is reached.



If one busts this limit, the server locks the node and I have to send a grovelling apologetic email to someone to ask them to unlock it. They tell me off for using too much bandwidth, and, something like 24-48 hours later, unlock the node.



To get around the fact that I create data in discrete chunks (on experiment days) and am well under the bandwidth limit on a per-month or per-week basis, I've written a simple wrapper script to parse the output of dsmc and kill the transfer if it gets too large.



The parsing is done in bash by feeding each line of dsmc's output to a simple python script embedded as a here-doc:



#!/bin/bash
# A silly wrapper script to halt TSM backups
#
# Usage: sudo /path/to/script /path/to/backup/location
#
# Requires python3 accessible as python3, and the regex / os modules.
# Tested on MacOS and Linux
BYTES_SENT=0;
#MAX_SIZE_TO_SEND=150 #Bytes, for testing
MAX_SIZE_TO_SEND=$[185*(2**30)]

args=("$@")
sudo rm -f /tmp/dsmc-script.PID

function outputParser() {
    python3 <<'EOF'
import os, re
rex = re.compile(r"Normal File-->\s*?([,0-9]*,?)\s*?/")
valueToParse = os.environ.get('line');
match = rex.match(valueToParse);
try:
    stringToReturn = str(match.group(1));
    stringToReturn = stringToReturn.replace(',', '');
except AttributeError:
    stringToReturn = "";
#Check for failed transfers
failedResults = re.findall(r"\*\* Unsuccessful \*\*", valueToParse);
nFailedResults = len(failedResults);
if (nFailedResults > 0):
    stringToReturn = "";
print(stringToReturn);
EOF
}
#I am sure that the above is a one-liner in sed or awk. I just don't know what the one line is.

function trapCaught() {
    #Do cleanup, not shown
    echo ", quitting."
}

trap trapCaught sigint
killCount=0
startTime=$SECONDS

while read -r line; do
    echo "$line"
    export line;
    X=$(export line=$line; outputParser)
    if [[ ! -z "$X" ]]; then
        BYTES_SENT=$[$BYTES_SENT + $X]
        echo "Sent $X bytes, $BYTES_SENT in total"
    fi
    if (( BYTES_SENT > MAX_SIZE_TO_SEND )); then
        if (( killCount < 1 )); then
            echo "STOPPED BACKUP BECAUSE $BYTES_SENT is GREATER THAN THE PERMITTED MAXIMUM OF $MAX_SIZE_TO_SEND";
            killStartTime=$(( SECONDS - startTime ))
            pid=$(cat /tmp/dsmc-script.PID)
            echo "PID is $pid"
            echo $pid | sudo xargs kill
        fi

        killCount=$[$killCount + 1];
        timeKillNow=$(( SECONDS - killStartTime ))
        rm -f /tmp/dsmc-script.PID

        if (( killCount > 100 || timeKillNow > 30 )); then
            echo "Taking too long to die; retrying"
            echo $pid | sudo xargs kill -9;
            sleep 0.1;
            sudo kill -9 0;
        fi

    fi
done < <( sudo dsmc incr "${args[0]}" & echo $! > /tmp/dsmc-script.PID )


This works, and suits my purposes. However, performance is bad bordering on terrible, and I think this is because each iteration through the while loop spins out another instance of the python interpreter / script combo.



Given that I can't change the limit, or the behaviour of the binary compiled blob dsmc, I have three related questions:



(a) Is this a sensible approach for solving this problem, or is there a much easier way that I am missing, such as advanced voodoo with netstat?



(b) Given that what python actually does is essentially exactly the same through each iteration in the loop, is there a way to cache the interpreter's translation of the code and hence speed the whole thing up hugely?



(c) If I were to replace the python script with an equivalent sed or awk construct, I suspect this whole thing would be much, much faster. Why? Is it possible to do this type of arithmetic easily there, or is that another rabbit hole to go down?



Edit: Example output from dsmc for those not familiar is below -- a file is only sent if "Normal File" appears in a line, followed by its size in bytes. So, in the output below, the file spclicert.kdb is sent, but neither TSM.PWD nor the directory CaptiveNetworkSupport is:



# dsmc incr / 
< header message containing personal information>
Incremental backup of volume '/'
ANS1898I ***** Processed 79,000 files *****
Directory--> 0 /Library/Preferences/SystemConfiguration/CaptiveNetworkSupport [Sent]
Normal File--> 5,080 /Library/Preferences/Tivoli Storage Manager/Nodes/SHUG2765-MACBOOKPRO-PHYSICS/spclicert.kdb [Sent]
Updating--> 224 /Library/Preferences/Tivoli Storage Manager/BrokenOrOld/TSM.PWD (original) [Sent]


So, the above script strips out the size in bytes of each file sent and simply adds them up.
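For reference, the same extract-and-sum step can be written as a single awk filter, which avoids starting an interpreter per line. This is only an untested sketch of that idea (the kill/cleanup logic of the script above is omitted, and the 185 GiB threshold is the same one the script uses):

# Untested sketch: one awk process does the extraction, the summing and the
# comparison; "Normal File-->" lines are summed, "** Unsuccessful **" ones skipped.
sudo dsmc incr /path/to/backup/location | awk -v max=$((185*2**30)) '
    /Normal File-->/ && !/\*\* Unsuccessful \*\*/ {
        size = $3; gsub(/,/, "", size)      # strip the thousands separators
        total += size
        print "Sent " size " bytes, " total " in total"
        if (total > max) { print "Limit reached"; exit 1 }
    }'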
python bash performance backup

asked Jan 11 at 9:41 – Landak
migrated from stackoverflow.com Jan 13 at 22:27

  • Surely sed or awk could speed it up to an extent, but one can't really suggest a way without knowing what the input looks like.

    – Inian
    Jan 11 at 9:49

  • @Inian sorry about that -- example output provided!

    – Landak
    Jan 11 at 9:57

  • This can probably be solved with some iptables and lsof magic, just by monitoring the total amount of traffic sent over the socket, or by monitoring the average bandwidth of the socket and then doing some simple maths to kill your process before exceeding your limit. Make sure you crosspost this to ServerFault and/or Unix & Linux.

    – JAAAY
    Jan 11 at 11:38

  • Even rewriting the whole script into python would make it much faster than a mix of python and bash. bash shouldn't be mistaken for a programming language.

    – Thomas Dickey
    Jan 13 at 23:02

  • It seems your Python script can be replaced with something like gawk 'match($0, /Normal\sFile-->\s+([0-9,]+)/, a) && gsub(/,/, "", a[1]) { print a[1] }' <<< "$line". This will just extract the file size (without commas) from lines containing the string "Normal File-->".

    – ozzy
    Jan 13 at 23:10
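Following up on the last two comments, a minimal, untested sketch of the single-process idea, in the same bash-plus-python style as the question, is to start python3 once and let it read every line of dsmc output from a pipe (threshold and kill handling omitted for brevity; the regex is an assumption based on the example output above):

sudo dsmc incr /path/to/backup/location | python3 -u -c '
import re, sys
# One long-lived interpreter: compile the regex once, then read dsmc output line by line.
rex = re.compile(r"Normal File-->\s*([,0-9]+)\s")
total = 0
for line in sys.stdin:
    m = rex.search(line)
    if m and "** Unsuccessful **" not in line:
        total += int(m.group(1).replace(",", ""))
        print("Sent", total, "bytes in total")
'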
2 Answers


Assuming the connection is reliable, a simple kludge would be to use a user-space traffic shaper. Just set it up to use no more than the maximum bandwidth per day.



An example using trickle, a big file foo, and scp:



l=$(( (200*10**6)/(24*60**2) ))
trickle -d $l scp foo username@remotehost:~/


And trickle would slow down the transfer to 2314K per second, which would top out at no more than 199,929,600,000 bytes per day. The file transfer program needn't be scp; it could be anything (even a web browser, or dsmc), as long as it's started from the command line.


An advantage of this method is that it wouldn't be necessary to break up the file foo if it were bigger than the daily limit. Of course it would take a while to send foo over (if foo were 1 TB, it would take 5 days), but it would take that long anyway.


trickle has a daemon version called trickled, which controls every subsequent run of trickle. Example:



l=$(( (200*10**6)/(24*60**2) ))
trickled -d $l
trickle scp foo username@remotehost:~/ &
trickle scp bar username@remotehost:~/ &
trickle scp baz username@remotehost:~/ &


Supposing that each of the files foo, bar, and baz were 1TB in size, trickled would still keep the transfer within the 200GB/day limit.
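Applied to the dsmc job from the question, the same idea might look something like the sketch below. This is untested: it assumes dsmc is dynamically linked (trickle works by preloading a small shim library), and it uses -u, the upload limit, since a backup sends data outbound:

# Untested sketch: cap dsmc itself at just under 200 GB/day.
# -s runs trickle standalone (no trickled daemon); -u is the upload limit in KB/s.
l=$(( (200*10**6)/(24*60**2) ))
sudo trickle -s -u "$l" dsmc incr /path/to/backup/location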






answered Jan 14 at 2:37 (edited Jan 14 at 2:47) – agc

  • Great suggestion, to me seems far more sensible than killing the process arbitrarily

    – tink
    Jan 14 at 2:44


















Your input can be parsed entirely in bash. Here's a sample:



max=$[185*(2**30)]

export total=0
while read first second third rest; do
    [[ "$first" == "Normal" && "$second" == "File-->" ]] && {
        size=${third//,/}
        echo "file: $size"
        total=$(( total + size ))
        (( total > max )) && kill something
    }
done < ~/tmp/your-input


If you're truly limited by the time taken to spawn a sub-process, this avoids the overhead even of calling out to awk or sed.
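One possible (untested) way to wire this parser straight to the dsmc run from the question, reusing its PID-file trick so the backup can be killed once the budget is spent:

#!/bin/bash
# Untested sketch: pure-bash parsing of dsmc output, no awk/sed/python spawned per line.
max=$((185*2**30))
total=0
while read -r first second third rest; do
    if [[ "$first" == "Normal" && "$second" == "File-->" ]]; then
        size=${third//,/}
        total=$(( total + size ))
        echo "Sent $size bytes, $total in total"
        if (( total > max )); then
            sudo kill "$(cat /tmp/dsmc-script.PID)"
            break
        fi
    fi
done < <( sudo dsmc incr "$1" & echo $! > /tmp/dsmc-script.PID )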






answered Jan 14 at 2:37 – wef