I see,
@<1571308003204796416:profile|HollowPeacock58> can you please send the full log?
(The odd thing is it is trying to install the Python 3.10 version of torch, when your command line suggests it is running Python 3.8)
And is Task.init called on all processes ?
No. Since you are using Pool, there is no need to call Task.init again. Just call it once before you create the Pool, then when you want to use it, just do task = Task.current_task()
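A minimal sketch of that pattern, assuming a plain multiprocessing.Pool worker (project/task names here are just placeholders):
` from multiprocessing import Pool
from clearml import Task

def worker(i):
    # pick up the task that was created in the main process
    task = Task.current_task()
    task.get_logger().report_scalar("worker", "value", value=i, iteration=i)
    return i * i

if __name__ == "__main__":
    # Task.init is called once, before the Pool is created
    task = Task.init(project_name="examples", task_name="pool example")
    with Pool(4) as pool:
        results = pool.map(worker, range(8)) `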
I see, so there's no way to launch a variant of my last run (with say some config/code tweaks) via CLI, and have it re-use the cached venv?
Try:
clearml-task ... --requirements requirements.txt
You can also clone / override args with:
clearml-task --base-task-id <ID-of-original-task-post-agent> --args ...
See full doc: https://clear.ml/docs/latest/docs/apps/clearml_task/
You might be able to write a script to override the links ... wdyt?
Thanks @<1569496075083976704:profile|SweetShells3> ! let me see if I can reproduce the issue
I'm not familiar with this one, I think you should be able to control it with:
CLEARML_AGENT__API__HTTP__RETRIES__BACKOFF_FACTOR
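For example (the value and queue name below are just placeholders), something along these lines should let you set it when launching the agent:
CLEARML_AGENT__API__HTTP__RETRIES__BACKOFF_FACTOR=2.0 clearml-agent daemon --queue default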
HugeArcticwolf77 you can add --services-mode to the agent, and it will basically keep on spinning Tasks in parallel (unfortunately the open source version does not include a way to limit it to a maximum number of concurrent Tasks)
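Something along these lines (the queue name is just an example):
clearml-agent daemon --queue services --services-mode --docker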
JoyousKoala59 which Trains server version do you have? The link you posted is for upgrading from v0.15 to v0.16, not from Trains to ClearML
Hi GrievingTurkey78
I think the main issue is the lack of support for jsonargparse, is that correct?
(vanilla PyTorch Lightning uses argparse, which seems to work out of the box)
ReassuredTiger98 oh wow, I did not realize you actually call importlib to import your libraries (any reason not to call import?)
And yes, I think we will miss it, as the package analysis is actually static text analysis of the code
DeliciousBluewhale87 you can try:
` import sqlite3
import pandas as pd

# open the SQLite database and read the whole "products" table into a DataFrame
conn = sqlite3.connect('test_database')
sql_query = pd.read_sql_query('''
SELECT *
FROM products
''', conn)

# write the result out as CSV
sql_query.to_csv(...) `
ThickDove42 you need the latest clearml-agent RC for the docker setup script (next version due next week):
pip install clearml-agent==0.17.3rc0
We should probably have a section on that (i.e. running two agents on the same GPU, then explain how to use it)
Still not supported, unfortunately
I just cloned it from the examples that are available in the SaaS console upon account creation
Ohhh! that would explain it. Maybe it is broken there?! let me check a second
JumpyPig73 I think fire was just added:
https://github.com/allegroai/clearml/pull/550
You can test with the latest RC:
pip install clearml==1.2.0rc1
Regarding not finding the Hydra-core package, what do you have listed under Execution: "Installed Packages"? (it should have auto-detected that you are importing hydra and listed it there)
or at least stick to the requirements.txt file rather than the actual environment
You can also force it to log the requirements.txt with:
Task.force_requirements_env_freeze(requirements_file="requirements.txt")
task = Task.init(...)
Hi RobustRat47
the easiest way to reproduce the entire environment on your local machine:
clearml-agent build --id <task_id> --target ~/debug-full-env/
This will install an entire venv, including the code, and apply the git changes.
You can also create a container with everything:
https://clear.ml/docs/latest/docs/clearml_agent#task-container
GreasyLeopard35 from the implementation:
https://github.com/allegroai/clearml/blob/fcad50b6266f445424a1f1fb361f5a4bc5c7f6a3/clearml/automation/parameters.py#L215
Which basically returns "self.base" (default 10) to the power of the selected value: 10**-3 = 0.001
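A minimal sketch, assuming the LogUniformParameterRange class defined in that file (parameter name and range are only examples):
` from clearml.automation.parameters import LogUniformParameterRange

# the range is defined in exponent space: -3..-1 maps to 10**-3 .. 10**-1
lr_range = LogUniformParameterRange('General/lr', min_value=-3, max_value=-1)
print(lr_range.get_value())  # e.g. {'General/lr': 0.001} `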
So how would I get a negative value ?
TRAINS_WORKER_NAME=first_agent trains-agent --gpus 0
and
TRAINS_WORKER_NAME=second_agent trains-agent --gpus 0
however if I want multiple machines syncing with the optimizer, for pulling the sampled hyper parameters and reporting results, I can't see how it would work
I have to admit, this is where I'm losing you.
I thought you wanted to avoid the agent, since you wanted to run everything locally, wasn't that the issue ?
Maybe there is some background missing here, let me see if I can explain how the optimizer works.
In your actual training code you have something like:
` params = {'lr': 0.3, ...
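A hedged sketch of the full pattern (assuming the usual Task.connect flow; the parameter names are illustrative):
` from clearml import Task

task = Task.init(project_name="examples", task_name="training")

# the defaults you use when running the script manually
params = {'lr': 0.3, 'batch_size': 64}

# connecting the dict is what lets the optimizer override these values
# when it clones the task and enqueues it with new hyperparameters
params = task.connect(params)

print(params['lr'])  # when executed by an agent, this is the sampled value `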
it seems like each task is set up to run on a single pod/node based on attributes like gpu memory, os, num of cores, worker
BoredHedgehog47 of course you can scale to multiple nodes.
The way to do that is to create a k8s YAML with replicas; each pod is actually running the exact same code with the exact same setup. Notice that inside the code itself the DL frameworks need to be able to communicate with one another and b...
Hi CooperativeFox72
But my docker image has all my code and all the packages it needs, I don't understand why the agent needs to install all of those again?
So based on the docker file you previously posted, I think all your python packages are actually installed on the "appuser" and not as system packages.
Basically remove the "add user" part and the --user from the pip install.
For example:
` FROM nvidia/cuda:10.1-cudnn7-devel
ENV DEBIAN_FRONTEND noninteractive
RUN ...
and I run the agent from the local user, and I would expect that setting to have effect: -v /home/localuser/.ssh:/home/testuser/.ssh
It does not map it directly; it creates a temp copy of the entire ".ssh" folder in the host /tmp folder, then maps this folder inside the container:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/clearml_agent/commands/worker.py#L3422
Notice that the "docker_internal_mounts" section is nested inside the "agent" section ...
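For reference, a sketch of what that nesting looks like in clearml.conf (the exact key names should be verified against the agent code linked above):
` agent {
    docker_internal_mounts {
        ssh_folder: "/root/.ssh"
    }
} `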
What is the specific use case, updating a file on existing dataset and creating a new version?