Thank you for the answer.
I have 2 different CUDA versions.
I need TensorFlow 2.2, 2.3, 2.4, and 2.5.
For TensorFlow 2.2 I need CUDA 10.1,
but for TensorFlow 2.4 I need, for example, CUDA 11.0
(see https://www.tensorflow.org/install/source#gpu).
For Docker I use, for example: --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
Then TensorFlow 2.4 no longer works, because TensorFlow 2.4 requires CUDA 11, not CUDA 10.1.
Does anyone have any idea?
Can I also pass 2 different Docker images?
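To keep the version pairing straight, a small lookup table can help. This is only a sketch: the helper name `cuda_for_tensorflow` is made up for illustration, and the entries for 2.3 and 2.5 are my reading of the tested build configurations table linked above, not something stated in this thread. Which Docker image tag matches each CUDA version still has to be checked on Docker Hub.

```python
# TensorFlow -> CUDA pairings as listed in the tested build configurations at
# https://www.tensorflow.org/install/source#gpu (entries for 2.3 and 2.5 are
# read from that table and should be double-checked there).
TF_TO_CUDA = {
    "2.2": "10.1",
    "2.3": "10.1",
    "2.4": "11.0",
    "2.5": "11.2",
}


def cuda_for_tensorflow(tf_version: str) -> str:
    """Return the CUDA version for a TensorFlow version such as '2.4' or '2.4.1'."""
    # Only major.minor matters for the lookup, so drop any patch component.
    major_minor = ".".join(tf_version.split(".")[:2])
    try:
        return TF_TO_CUDA[major_minor]
    except KeyError:
        raise ValueError(f"no known CUDA pairing for TensorFlow {tf_version}") from None
```

With a table like this, each agent (or each task) can be pointed at an image built for the CUDA version its TensorFlow release needs, instead of one image for everything.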
OK, but how can I get the hyperparameters from the current task?
The scripts are all in the git repo.
But still the same problem.
I use os.system.
Is there a better way to call the other python script?
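One common alternative to `os.system` is `subprocess.run` from the standard library. The sketch below assumes the other script is a standalone file that takes command-line arguments; `run_script` is a hypothetical helper name, not part of any library.

```python
import subprocess
import sys


def run_script(script_path, *args):
    """Run another Python script with the current interpreter and return its stdout."""
    completed = subprocess.run(
        [sys.executable, script_path, *args],  # same interpreter/venv as the caller
        capture_output=True,                   # collect stdout and stderr
        text=True,                             # decode bytes to str
        check=True,                            # raise CalledProcessError on non-zero exit
    )
    return completed.stdout
```

Compared with `os.system`, this avoids shell quoting problems, uses the same Python environment as the calling script, and raises an exception when the child script fails instead of silently returning an exit code.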
thank you for the feedback
TimelyPenguin76 SuccessfulKoala55
I used the line you wrote me. But the first time, I start the program from the command line.
I still have the problem with the demo server.
At the moment it has nothing to do with the clearml-agent.
my clearml.conf:
api_server: http://192.168.40.210:8008
web_server: http://192.168.40.210:8080
files_server: http://192.168.40.210:8081
The parameter must be "imagenet". But when I print the parameter in my code, it is imagenet without quotes, and TensorFlow needs "imagenet".
I hope you understand what I mean
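A small check that may help narrow this down. In Python, `print` never shows the quotes of a string, so seeing imagenet without quotes does not by itself mean the value is wrong; `repr` shows what the value really is. The helper `normalize_weights` is a hypothetical name, and it only covers the case where literal quote characters ended up stored inside the parameter value.

```python
def normalize_weights(value):
    """Strip whitespace and surrounding quote characters stored with the value."""
    return value.strip().strip("\"'")


weights = "imagenet"
print(weights)        # prints: imagenet   (print omits the quotes of a str)
print(repr(weights))  # prints: 'imagenet' (repr shows it is a str)

# If the stored value carries literal quotes, remove them before use.
print(normalize_weights('"imagenet"'))  # prints: imagenet
```

If `repr` shows `'imagenet'`, the parameter is already a plain string and can be passed on as it is.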
CLEARML-AGENT configuration file
api {
    # Notice: 'host' is the api server (default port 8008), not the web server.
    api_server: http://192.168.40.210:8008
    web_server: http://192.168.40.210:8080
    files_server: http://192.168.40.210:8081
    # Credentials are generated using the webapp, http://192.168.40:8080/profile
    # Override with os environment: CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY
    credentials {"access_key": "XXXXXXXXXXXXXXXXXX", "secret_key": "XXXX...
I removed the trains.conf
first line:
TRAINS Task: overwriting (reusing) task id=8ce7a396ae8c4a14b22186a48ade5d91
thank you
it works now
you really helped me
thank you for the information.
I am using the same GUI on 2 servers.
On both servers the following path did not exist: /opt/trains
So I could not stop allegro.
I ran the commands on 1 server to upgrade it, but the GUI still shows the old version.
Does anyone know how I can proceed?
Now I ran docker-compose down
But the allegro server is still available.
where can I change it?
When I right-click on the cloned project, there is no option to change it.
First I run it locally, and this works. But when I use the ClearML agent, it does not work.
clearml-agent --config-file /home/chuber/clearml.conf daemon --detached --gpus 1 --queue KA_ML2_GPU1 --docker nvidia/cuda:10.1-cudnn7-devel
assert os.path.exists("path")
with this line I get the error that the path does not exist
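A bare `assert os.path.exists(...)` only says that the check failed. A slightly more verbose check, sketched below, reports the path and the working directory, which helps when the same code runs on the host and inside a container. `require_path` is a hypothetical helper name. One assumption worth stating: a process inside a Docker container only sees host directories that were mounted into it.

```python
from pathlib import Path


def require_path(path):
    """Return the resolved path, or raise an error with debugging details."""
    p = Path(path).expanduser()
    if not p.exists():
        raise FileNotFoundError(
            f"{p} does not exist (cwd={Path.cwd()}); "
            "inside a container, only mounted host directories are visible"
        )
    return p.resolve()
```

Calling `require_path` at the top of the script makes the failure message show exactly which path was looked up and from where.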
thanks for the answer.
I tried it but it did not work.
I have the same error:
fatal: could not read Username for 'http://rz-s-git': terminal prompts disabled.
The Git account has 2 users. I tried to run a different project from the other user and it worked.
The problem is cloning repositories from different users.
does anyone know how I can best proceed?
At the moment I try SSH.
I should have permission. What can I do?
git@rz-s-git: Permission denied (publickey,password).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
I tried to run the agent without Docker. Without Docker mode the path is available, but I need Docker for TensorFlow and CUDA.
Is there a better way than creating multiple SSH keys?
thanks.
I tried 1.0.4rc0 but got the same error.
Output from allegro:
2021-06-01 15:51:59.984367: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-06-01 15:52:00.019168: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3399905000 Hz
2021-06-01 15:52:00.683090: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-06...
sorry
I solved the problem. There was a mistake in my file path, so the training could not be started.
sdk {
    # ClearML - default SDK configuration
    storage {
        cache {
            # Defaults to system temp folder / cache
            default_base_dir: "~/.clearml/cache"
            size {
                # max_used_bytes = -1
                min_free_bytes = 10GB
                # cleanup_margin_percent = 5%
            }
        }
        direct_access: [
            # Objects matching are considered to be available for direct access, i.e. they will not be downloaded
            ...