Hi AgitatedDove14, thanks for the answer.
I think the upload reporting (files over 5 MB) was added after version 0.17,
That's what I thought...
I think it would be helpful to add it to the conf, since 5 MB is really small and my files are ~300 MB, meaning 60 messages for each upload.
Another option might be to configure it as a Task.init() parameter.
I think both are OK 🙂
Someone in my company started a training run 😥, so I will do it after it finishes and will update.
Thanks, you are the best 🙏
Ohh nice, I thought it was just some kind of job queue on machines that are already up and running.
I just need it to run the docker and run the command inside it, no?
It is now stuck after:
` 2021-03-09 14:54:07
task 609a976a889748d6a6e4baf360ef93b4 pulled from 8e47f5b0694e426e814f0855186f560e by worker ov-01:gpu1
2021-03-09 14:54:08
running Task 609a976a889748d6a6e4baf360ef93b4 inside default docker image: MyDockerImage:v0
2021-03-09 14:54:08
Executing: ['docker', 'run', '-t', '--gpus', '"device=1"', '-e', 'CLEARML_WORKER_ID=ov-01:gpu1', '-e', 'CLEARML_DOCKER_IMAGE=MyDockerImage:v0', '-v', '/tmp/.clearml_agent.jvxowhq4.cfg:/root/clearml.conf', '-v', '/...
Hi SuccessfulKoala55 and AgitatedDove14,
Thanks for the quick reply.
I'm not sure I understand your use-case - do you always want to change the contents of the file in your code? Why not change it before connecting?
Changing the file before the connect makes sense only when I am running locally and the file exists. Remotely, I must get the file with connect_configuration(path, name=name) before I read it.
"local_path" is ignored, path is a temp file, and the c...
I have one computer with 4 GPUs and would like to create a queue over the GPUs.
For now the project runs without queue.
My configs hold the relative paths to the data (and it would take time to change all of them), so I prefer to work with relative paths if possible.
Thanks for the quick reply.
This will allow more time before the timeout, right?
Maybe there is a way to do something like:
task.freeze_monitor()
download()
task.defrost_monitor()
I am sure you added this timeout for a reason,
probably because increasing the timeout can affect other functionality.
Am I wrong?
SuccessfulKoala55 and AppetizingMouse58, thank you very much!!
I have a question about the future:
Could this fix cause problems in a future clearml-server upgrade?
Or what is the best practice for upgrading after applying it?
SuccessfulKoala55 Thanks 🙏 ..
Another related question:
My remote job fails because it cannot find the data:
FileNotFoundError: [Errno 2] No such file or directory: './data/XXXXXXXX
I mounted the data at the same place relative to my project inside the docker, using extra_docker_arguments.
I am using execute_remotely to enqueue the job.
I know it works locally, since the job reads from ./data/XXXX before execute_remotely() and that works.
but when the agent create ...
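One way to make a relative path such as ./data/XXXX independent of the process's working directory is to resolve it against a fixed base. This is a minimal sketch, assuming the failure comes from the remote job starting in a different working directory (which this thread does not confirm). `resolve_data_path` and the `DATA_ROOT` environment variable are hypothetical names, not part of ClearML.

```python
import os
from pathlib import Path


def resolve_data_path(relative, base=None):
    """Resolve `relative` against a fixed base instead of the current directory.

    Base priority: an explicit `base`, then the DATA_ROOT environment
    variable (hypothetical name), then the current working directory.
    """
    root = base or os.environ.get("DATA_ROOT") or os.getcwd()
    return (Path(root) / relative).resolve()
```

For example, `resolve_data_path("./data/XXXX", base=Path(__file__).parent)` ties the path to the script's own location, while setting `DATA_ROOT` to the mount point inside the container would tie it to the mounted data instead.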
But I think they did it for a reason, no?
WOW.. Thanks 💯
Will it still work if I keep trains.conf like this and also mount the S3?
I haven't tried trains-agent yet. Does it support using AWS Batch?
Ohh, I understand. Can you give me a short explanation of how to change the metadata?
Hi SuccessfulKoala55 ,
I brought the server down:
` [ec2-user@ip-172-31-26-41 ~]$ sudo docker-compose -f /opt/clearml/docker-compose.yml down
WARNING: The CLEARML_HOST_IP variable is not set. Defaulting to a blank string.
WARNING: The CLEARML_AGENT_GIT_USER variable is not set. Defaulting to a blank string.
WARNING: The CLEARML_AGENT_GIT_PASS variable is not set. Defaulting to a blank string.
Stopping clearml-webserver ... done
Stopping clearml-agent-services ... done
Stopping clearml-apiserver...
Is it OK that it looks for files in /opt/trains? We moved everything to /opt/clearml, no?
File "/opt/trains/apiserver/mongo/initialize/migration.py"
the index creation:
` [ec2-user@ip-172-31-26-41 ~]$ sudo docker exec -it clearml-mongo /bin/bash
root@3fc365193ed0:/# mongo
MongoDB shell version v3.6.5
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.6.5
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
Questions? Try the support group
`
Server has startup warnings:
2021-01-25T05:58:37.309+0000 I CONTROL [initandlisten]
2021-01-25T05:58:37.309+0000 I C...
how long? 😅
I am now stuck in: Copying index events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b
for more than 40 min 😥
My docker already has my project on it, so I know where to mount. Maybe the agent moves or creates a copy of my project somewhere else?
Hi CumbersomeCormorant74,
This is a server we installed.
The server version is: 0.17
We checked with Chrome and Firefox.
Thanks, ophir