Correct:
extra_docker_shell_script: ["apt-get install -y awscli", "aws codeartifact login --tool pip --repository my-repo --domain my-domain --domain-owner 111122223333"]
I think poetry should somehow return an error if the toml is "empty", then we can detect it...
Hmm I tested on chromium and it seemed to work, let me see if I can reproduce it...
I solved the issue by implementing my own ClearML logger
This is awesome! Any chance you want to PR it to transformers?
I think that by default the zipped package files are 0.5GB
(you can control it, None, look for --chunk-size)
I think the missing part of the api is understanding which chunk your specific file is stored in.
You can do something like:
ds = Dataset.get(...)
the_artifact_chunk_I_need = ds.file_entries_dict["my/file/here"].artifact_name
wdyt?
maybe worth adding an interface?
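To make that snippet self-contained, a minimal sketch (the project/dataset names and the file path are placeholders):
` from clearml import Dataset

ds = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")
entry = ds.file_entries_dict["my/file/here"]
# artifact_name identifies which zipped chunk this specific file is stored in
print(entry.artifact_name) `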
Hmm @<1523701279472226304:profile|SoreHorse95> this is a good point, I think you are correct, we need to fix that.
- Could you open a GitHub issue so this is not forgotten ?
- As a workaround I would use clone=True, then after the call I would call task.close() on the original task, wdyt?
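A minimal sketch of that workaround, assuming the call in question is execute_remotely() (the queue name is a placeholder):
` from clearml import Task

task = Task.init(project_name="examples", task_name="demo")
# clone=True enqueues a clone instead of converting the current task
task.execute_remotely(queue_name="default", clone=True, exit_process=False)
# then close the original local task
task.close() `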
The thing I don't understand is how come this DOES work on our linux setups
I do not think it actually works... I could not find any code that converts the ENV in the config string ...
I'll be happy to test it out if there's any commit available?
Please do, and feel free to PR it 😍
https://github.com/allegroai/clearml/blob/d3e986393ac8d1a1ea48302224962570ab8e6f9e/clearml/backend_api/session/session.py#L576
https://github.com/allegroai/clearml/blob/d3e98639...
@<1577468638728818688:profile|DelightfulArcticwolf22>
How can I tell clearml-agent not to run pip install unless my requirements.txt file was changed?
The agent has a built-in cache, it will reuse the previous venv if nothing changed (cached locally on the agent's machine).
Make sure this line is not commented out:
None
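For reference, a sketch of the relevant section in clearml.conf, assuming the line in question is the venvs_cache path (values are the usual defaults, adjust as needed):
` agent {
    venvs_cache: {
        max_entries: 10
        free_space_threshold_gb: 2.0
        # uncomment this line to enable venv caching
        path: ~/.clearml/venvs-cache
    }
} `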
for example, one notebook will be dedicated to exploring columns, spotting outliers, and creating transformations for specific column values.
This actually implies each notebook is a standalone "process", which makes a ton of sense. But this is where notebooks and proper SW design break: in traditional SW, the notebooks would actually be Python files, and then of course you could import one from another; unfortunately this does not work in notebooks...
If you are really keen on using notebooks I wou...
task = Task.current_task()
Will get me the task object. (right?)
PanickyMoth78 yes, always, from anywhere, this is a singleton object 🙂
${PWD} works!
This will be resolved on every call to Task.init (so I would recommend against it), how about "$HOME/"?
Thank you JuicyOtter4 ! 😍
Is there a way to programmatically set that in the code?
Something like?
` task = Task.init(...)
task.set_comment("best thing ever") `
probably we should change that to "description"?!
Well, I do not think you set your PyTorch Lightning to use CUDA:
GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/code/.venv/lib/python3.9/site-packages/lightning/pytorch/trainer/setup.py:176: PossibleUserWarning: GPU available but not used. Set `accelerator` and `devices` using `Trainer(accelerator='gpu', devices=1)`.
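As the warning suggests, something along these lines should make Lightning actually use the GPU (a minimal sketch, using the lightning import path seen in the traceback):
` from lightning.pytorch import Trainer

# explicitly request the GPU so CUDA is actually used
trainer = Trainer(accelerator="gpu", devices=1) `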
WittyOwl57 could it be the EC2 instance is too small (i.e. not enough storage / memory) ?
clearml-agent daemon --detached --queue manual_jobs automated_jobs --docker --gpus 0
If the user running this command can run "docker run", then you should be fine
Hi @<1523701304709353472:profile|OddShrimp85>
the venv setup is totally based on my requirements.txt instead of adding on to what the image has before. Why?
Are you using the agent in docker mode? If this is the case, it creates a venv inside the docker, inheriting from the preinstalled docker system packages.
Hi SkinnyPanda43
Can you attach the full log?
clearml-agent is installed before your requirements.txt; at least in theory it should not collide
Hi FlatOctopus65
You are almost there:
` prev_task: Task = Task.get_task(task_id=<prev_task_id_here>)
model = prev_task.models['output'][-1]
my_check_point = model.get_local_copy() `
None
See: Add an experiment hyperparameter:
and add gpu: True
strange ...
Yes thanks, but if I do this the packages will be installed again for each step; is it possible to use a single venv?
Notice that the venv is cached on the clearml-agent host machine (if this is k8s glue, make sure to set up the cache as a PV to achieve the same).
This means there is no need to worry about that and this is stable.
That said, if you have an existing venv inside the container, just add docker_args="-e CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/path/to/bin/python"
Se...
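For example, a sketch of passing that argument on a pipeline step (assuming a PipelineController function step; the pipeline name, image, step body, and python path are placeholders):
` from clearml import PipelineController

def train_func():
    pass  # placeholder step body

pipe = PipelineController(name="demo-pipeline", project="examples", version="1.0")
pipe.add_function_step(
    name="train",
    function=train_func,
    docker="my/docker:image",  # image that already contains the venv
    # point the agent at the python inside the container, skipping new venv creation
    docker_args="-e CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/path/to/bin/python",
) `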
SarcasticSquirrel56
if I configure manually the pods for the different nodes, how do I make clearml server aware that those agents exist?
Basically the agents register themselves on your clearml-server, and they register which Queue(s) they listen to. In other words, the interface for choosing between the different types of machines/GPUs is enqueuing the Task to different queues.
For example: Queue(1): "CUDA11_GPUx1" , Queue(2): "CUDA10_GPUx1"
Make sense ?
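For example, a minimal sketch of picking the machine type from code by enqueuing to the matching queue (queue name taken from the example above, project/task names are placeholders):
` from clearml import Task

task = Task.init(project_name="examples", task_name="train")
# enqueue onto the queue served by the CUDA11 single-GPU agents
task.execute_remotely(queue_name="CUDA11_GPUx1") `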
EDIT:
I guess to achieve what I w...
Hi MysteriousBee56 ,
Yes, this is a permissions issue: the docker creates all folders as root (as it is the root user running inside the docker). Then when you execute in venv mode, you are running as your user, which obviously cannot change root-created folders.
store_code_diff_from_remote
doesn't seem to change anything with regard to this issue
Correct, it is always from remote
I'll be using update_task, that worked just fine, thanks
Sure thing.
ShakyJellyfish91, I took a quick look at the diff between the versions. Can you hack a non-working version (preferably the latest) and verify the issue for me?
PompousParrot44 I assume the folder structure is something like:
repo_root:
--> test
-----> scripts
If this is the case, make sure the "working directory" is "." which means repository root
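If you prefer setting it from code rather than the UI, a minimal sketch (assuming Task.set_script is available in your clearml version; the task id and script path are placeholders):
` from clearml import Task

# a draft task you are about to enqueue
task = Task.get_task(task_id="<task_id_here>")
# "." means the working directory is the repository root,
# so the entry point is given relative to the repo root
task.set_script(working_dir=".", entry_point="test/scripts/my_script.py") `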
So this is very odd, it looks like a pip bug:
The agent is trying to install torch==2.1.0.* because by default it ignores the 4th+ parts (they are unstable and torch has a tendency to remove them). And for some reason pip will not match 2.1.0.* with, for example, "2.1.0.dev20230306+cu118"
but based on the docs it should work:
see here: None
As a workaround you can always edit it and change to the final URL, for example: so ...