Yes, that’s what I did initially, but eventually I decided it adds too much complexity for little benefit. I’d rather drop omegaconf, and if one day clearml supports it out of the box, take advantage of it then
Nope, I’d like to wait and see how the different tools improve over this year before picking THE one 😄
is it different from Task.set_offline(True)?
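For reference, offline mode in the SDK looks roughly like this (a minimal sketch; the project/task names and zip path are placeholders):
` from clearml import Task

# Record everything locally instead of sending it to the server
Task.set_offline(offline_mode=True)
task = Task.init(project_name="demo", task_name="offline-run")
# ... training code ...
# The resulting session zip can later be imported with
# Task.import_offline_session("<path-to-session-zip>") `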
Actually it was not related to clearml; the higher-level error causing this one was (somewhere in the stack trace): RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd -> wrong numpy version
Just tried, still the same issue
I hit F12 to check projects.get_all_ex but nothing is fired; I guess the web UI is just frozen in some weird state
btw CostlyOstrich36, I can see in Profile > Version: 1.1.1-135 • 1.1.1 • 2.14. What do these numbers correspond to?
extra_configurations = {'SubnetId': "<subnet-id>"} with brackets, right?
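For context, roughly where it sits in the autoscaler resource configuration (a sketch; I believe the dict is forwarded to boto3, so the keys must match its parameter names):
` "extra_configurations": {
    "SubnetId": "<subnet-id>"
} `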
I will try adding sudo sh -c "echo '\n* soft nofile 65535\n* hard nofile 65535' >> /etc/security/limits.conf" to the extra_vm_bash_script , maybe that’s enough actually
Opened an issue with the logs here > None
btw, I tried with alpine instead of ubuntu:18.04 and got:
Unable to find image 'alpine:latest' locally
latest: Pulling from library/alpine
df20fa9351a1: Pulling fs layer
df20fa9351a1: Verifying Checksum
df20fa9351a1: Download complete
df20fa9351a1: Pull complete
Digest: sha256:185518070891758909c9f839cf4ca393ee977ac378609f700f60a771a2dfe321
Status: Downloaded newer image for alpine:latest
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting containe...
Ha I just saw in the logs:
WARNING:py.warnings:/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/torch/cuda/__init__.py:145: UserWarning:
NVIDIA A10G with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA A10G GPU with PyTorch, please check the instructions at
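(Side note for anyone hitting this: a quick way to confirm the mismatch, assuming torch imports fine:)
` import torch

print(torch.version.cuda)                   # CUDA version this PyTorch build targets
print(torch.cuda.get_device_capability(0))  # e.g. (8, 6) for an A10G
print(torch.cuda.get_arch_list())           # architectures compiled into this build,
                                            # e.g. ['sm_37', 'sm_50', 'sm_60', 'sm_70'] `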
ok, and if that’s not the case, it will fall back to 3.8, right? Would it be possible to support such a use case? (have the clearml-agent set up a different python version when a task needs it?)
yes, done! Is there something more to take into account than what I shared?
I cannot share the file itself, but here are some potentially helpful points:
- Multiple lines are empty
- One line is empty but contains spaces (6, to be exact)
- The last line of the file is empty
In the controller, I want to upload an artifact and then start a task that queries it; I need to make sure the artifact exists by the time the task tries to retrieve it
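Roughly what I mean, as a sketch (assuming upload_artifact’s wait_on_upload flag; the artifact name and file are placeholders):
` from clearml import Task

controller = Task.current_task()
# Block until the upload finishes, so the child task is guaranteed to find the artifact
controller.upload_artifact(
    name="dataset",
    artifact_object="data.csv",
    wait_on_upload=True,
) `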
The cleanup service is awesome, but it would require having another agent running in services mode on the same machine, which I would rather avoid
What is the latest RC of clearml-agent? 1.5.2rc0?
line 13 is empty 🤔
Still investigating: task.data.last_iteration is correct (equal to engine.state["iteration"]) when I resume the training
Hi AgitatedDove14 , I investigated further and got rid of a separate bug. I was able to get ignite’s events fired, but still no scalars logged 😞
There is definitely something wrong with the reporting of scalars from multiple processes, because if my ignite callback is the following:
` def log_loss(engine):
    idist.barrier()  # sync all processes
    device = idist.device()
    print("IDIST", device)
    from clearml import Task
    Task.current_task().get_logger().r...
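(A rank-0-only variant that should avoid racy multi-process reporting; a sketch assuming the truncated call is report_scalar:)
` import ignite.distributed as idist
from clearml import Task

def log_loss(engine):
    idist.barrier()  # sync all processes
    if idist.get_rank() == 0:  # only the main process reports to ClearML
        Task.current_task().get_logger().report_scalar(
            title="loss",
            series="train",
            value=engine.state.output,
            iteration=engine.state.iteration,
        ) `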
Configuration:
` {
    "resource_configurations": {
        "v100": {
            "instance_type": "g4dn.2xlarge",
            "availability_zone": "us-east-1a",
            "ami_id": "ami-05e329519be512f1b",
            "ebs_device_name": "/dev/sda1",
            "ebs_volume_size": 100,
            "ebs_volume_type": "gp3",
            "key_name": "key.name",
            "security_group_ids": [
                "sg-asd"
            ],
            "is_spot": false,
            "extra_configura...
Are you planning to add a server-backup service task in the near future?
So the new EventsIterator is responsible for the bug.
Is there a way for me to easily force the WebUI to always use the previous endpoint (v1.7)? I saw in the v1.1.0 > v1.2.0 diff that the ES version was bumped to 7.16.2. I am using an external ES cluster, and its version is still 7.6.2. Could the incompatibility come from there? I’ll update the cluster to make sure that’s not the case
With my hack, yes; without it, no
For some reason, when cloning task A, trains sets an old commit in task B. I tried recreating task A to force a new task ID and a new commit ID, but I still hit the same issue
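One workaround sketch (assuming Task.create’s repo/branch/commit parameters; all values are placeholders):
` from clearml import Task

# Create task B from scratch, pinned explicitly to the commit I want,
# instead of relying on the commit recorded in the cloned task A
task_b = Task.create(
    project_name="my-project",
    task_name="task B",
    repo="https://github.com/org/repo.git",
    branch="main",
    commit="<new-commit-sha>",
    script="train.py",
) `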