
First I run it locally, and this works. But when I then use the ClearML agent, it does not work.
Thank you for the feedback.
TimelyPenguin76 SuccessfulKoala55
I used the line you wrote me. But the first time, I start the program from the command line.
I still have the problem with the demo server.
At the moment it has nothing to do with the clearml-agent.
My clearml.conf:
api_server: http://192.168.40.210:8008
web_server: http://192.168.40.210:8080
files_server: http://192.168.40.210:8081
CLEARML-AGENT configuration file
api {
# Notice: 'host' is the api server (default port 8008), not the web server.
api_server: http://192.168.40.210:8008
web_server: http://192.168.40.210:8080
files_server: http://192.168.40.210:8081
# Credentials are generated using the webapp, http://192.168.40:8080/profile
# Override with os environment: CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY
credentials {"access_key": "XXXXXXXXXXXXXXXXXX", "secret_key": "XXXX...
I need access to a tfrecord file,
but with allegro I do not have access to the folder.
Thanks for the info.
I have time now.
How can I use a "volume mount" with allegro?
I think there is a problem with Docker: because I am inside the Docker container, I have to use a volume mount to get access to paths outside the container.
But I don't know exactly how to use a volume mount.
The scripts are all in the git repo.
But still the same problem.
I use os.system.
Is there a better way to call the other python script?
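One common alternative to os.system is the standard library's subprocess module. The following is only a minimal sketch, assuming the other script sits in the same git repo and should run with the same interpreter; `run_script` is a made-up helper name, not something from ClearML.

```python
import subprocess
import sys
from pathlib import Path


def run_script(script_path, *args):
    """Run another Python script with the current interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, str(Path(script_path)), *map(str, args)],
        capture_output=True,  # collect stdout/stderr instead of printing them directly
        text=True,            # decode the output bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Unlike os.system, this raises an exception when the called script fails, and using sys.executable keeps the call inside the same virtual environment or container as the calling script.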
Thanks for the answer.
I tried it, but it did not work.
I have the same error:
fatal: could not read Username for 'http://rz-s-git': terminal prompts disabled.
The git account has 2 users. I tried to run a different project from the other user and it worked.
The problem is cloning repositories from different users.
Does anyone know how I can best proceed?
At the moment I am trying SSH.
I should have permission. What can I do?
git@rz-s-git: Permission denied (publickey,password).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
Is there a better way than creating multiple SSH keys?
sdk {
# ClearML - default SDK configuration
storage {
cache {
# Defaults to system temp folder / cache
default_base_dir: "~/.clearml/cache"
size {
# max_used_bytes = -1
min_free_bytes = 10GB
# cleanup_margin_percent = 5%
}
}
direct_access: [
# Objects matching are considered to be available for direct access, i.e. they will not be downloaded
...
Thank you,
it works now.
You really helped me.
Thank you for the information.
I am using the same GUI on 2 servers.
On both servers the following path did not exist: /opt/trains
So I could not stop allegro.
I ran the commands on 1 server to upgrade it, but the GUI still shows the old version.
Does anyone know how I can proceed?
clearml-agent --config-file /home/chuber/clearml.conf daemon --detached --gpus 1 --queue KA_ML2_GPU1 --docker nvidia/cuda:10.1-cudnn7-devel
assert os.path.exists("path")
With this line I get the error that the path does not exist.
Thanks.
I tried 1.0.4rc0 but get the same error.
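A bare assert only says that the path is missing. As a hedged sketch (the helper name `describe_path` is invented for illustration), the check could report how far down the directory tree the path is still visible, which helps to tell a typo from a folder that is not mounted into the container:

```python
from pathlib import Path


def describe_path(path):
    """Return a message naming the deepest existing ancestor of a path."""
    p = Path(path).expanduser()
    if p.exists():
        return f"{p} exists"
    # Walk upwards until a directory that does exist is found.
    for parent in p.parents:
        if parent.exists():
            return f"{p} does not exist; deepest existing parent is {parent}"
    return f"{p} does not exist and no parent directory was found"
```

It could then be used as the assert message, for example `assert os.path.exists(path), describe_path(path)`.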
Output from allegro:
2021-06-01 15:51:59.984367: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-06-01 15:52:00.019168: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3399905000 Hz
2021-06-01 15:52:00.683090: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-06...
Thank you for the answer.
I have 2 different CUDA versions.
I need TensorFlow 2.2, 2.3, 2.4, 2.5.
For TensorFlow 2.2 I need CUDA 10.1.
But for TensorFlow 2.4 I need, for example, CUDA 11.0:
https://www.tensorflow.org/install/source#gpu
For Docker I use, for example: --docker nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
Then TensorFlow 2.4 no longer works, because TensorFlow 2.4 requires CUDA 11 and not CUDA 10.1.
Does anyone have any idea?
Can I also pass 2 different Docker images?
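To make the version constraint concrete, here is a small lookup sketch. The 2.2 and 2.4 entries come from the message above; the 2.3 and 2.5 entries are taken from the same TensorFlow tested-configurations table and should be double-checked there. `cuda_for_tensorflow` is a hypothetical helper, and choosing the matching image per task or per agent is not shown here.

```python
# Tested TensorFlow -> CUDA pairs (see the tensorflow.org table linked above).
# Assumption: only the major.minor part of the TensorFlow version matters.
TF_TO_CUDA = {
    "2.2": "10.1",
    "2.3": "10.1",
    "2.4": "11.0",
    "2.5": "11.2",
}


def cuda_for_tensorflow(tf_version):
    """Return the CUDA version a TensorFlow release was tested against."""
    major_minor = ".".join(tf_version.split(".")[:2])
    try:
        return TF_TO_CUDA[major_minor]
    except KeyError:
        raise ValueError(f"no CUDA entry for TensorFlow {tf_version}") from None
```

With this table, TensorFlow 2.2 and 2.3 could share one CUDA 10.1 image, while 2.4 and 2.5 each need a different CUDA 11 image.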
Does anyone know how this works with git SSH credentials?
Where can I change it?
When I right-click on the cloned project, there is no option to change it.
I thought that I had to open a new thread because the question was asked a while ago.
I tried to run the agent without Docker. Without Docker mode the path is available. But I need Docker for TensorFlow and CUDA.
Sorry,
I solved it. There was a mistake in my file path, so the training could not be started.
the parameter must be "imagenet". But when I print the parameter in my code it is imagenet without quotes. But tensorflow needs "imagenet"
I hope you understand what I mean
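For what it is worth, print() always shows a Python string without quotes, so seeing imagenet in the output does not by itself mean the value is wrong. A minimal sketch, assuming the parameter arrives as a string that may carry literal quote characters typed in the UI (`clean_param` is a hypothetical helper):

```python
def clean_param(value):
    """Strip stray quote characters and map an empty or 'None' string to None."""
    if value is None:
        return None
    text = str(value).strip().strip("\"'")
    return None if text in ("", "None") else text


weights = clean_param('"imagenet"')
print(weights)        # imagenet   (print never shows the quotes of a str)
print(repr(weights))  # 'imagenet' (repr shows that the value is a str)
```

Printing with repr() makes it visible whether the code actually holds the string "imagenet" or something else, such as a string that contains extra quote characters.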