Hi YummyMoth34, they will keep trying to send reports.
I think they try for at least several hours.
Okay that means it is running in virtual environment mode.
On the original Task (the one you enqueued), what were the installed packages (specifically torch/torchvision)?
Hi MortifiedCrow63
I finally got GS credentials, and there is something weird going on. I can verify the issue: with model upload I get a timeout error, while upload_artifacts just works.
Just updating here that we are looking into it.
Hmm that is odd, it seems to have missed the fact that this is a Jupyter notebook.
What's the clearml version you are using?
Hmm let me check, I think you are correct here
If this doesn't help:
Go to your ~/clearml.conf file. At the bottom of the file you can add agent.python_binary and change it to the location of python3.6 (you can run which python3.6 to get the full path):
agent.python_binary: /full/path/to/python3.6
Hmm can you run the agent in debug mode, and check the specific console log?
```
clearml-agent --debug daemon --foreground ...
```
Hi VivaciousWalrus21, I tested the sample code, and the gap was evident in Tensorboard as well. This is not clearml generating this jump; it is internal (like the auto de/serialization and continue logic of the code base)
Hi DashingHedgehong5
Is the text the labels on the histogram buckets?
Notice the xlabels argument, is this what you are looking for?
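Something like this (just a minimal sketch; the title/series/values here are placeholders):

```python
from clearml import Task

# minimal sketch: report a histogram with custom text under each bucket
task = Task.init(project_name="examples", task_name="histogram xlabels")
task.get_logger().report_histogram(
    title="my histogram",
    series="buckets",
    values=[3, 7, 2, 5],
    iteration=0,
    xlabels=["bucket A", "bucket B", "bucket C", "bucket D"],  # the text shown per bucket
)
```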
Hi CourageousWhale20
Most documentation is here https://allegro.ai/docs
Question - why is this the expected behavior?
It is 🙂 I mean the original python version is stored, but pip does not support replacing the python version. It is doable with conda, but then you have to use conda for everything...
Hi DepressedChimpanzee34
Why do you need to have the configuration added manually? Isn't the clearml.conf easier? If not, I think OS environment variables are easier, no? I ran the above code and everything worked with no exception/warning... What does the try/except solve exactly?
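For reference, a minimal sketch of the OS-environment route (the values below are placeholders, not real credentials):

```python
import os

# set the clearml credentials via environment variables before importing clearml
os.environ["CLEARML_API_HOST"] = "https://api.clear.ml"
os.environ["CLEARML_API_ACCESS_KEY"] = "<access_key>"
os.environ["CLEARML_API_SECRET_KEY"] = "<secret_key>"

from clearml import Task

task = Task.init(project_name="examples", task_name="env var config")
```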
Hi @<1651395720067944448:profile|GiddyHedgehong81>
However, for yolov8 (object detection with around 20k jpgs and .txt files) I need the data.yaml file:
Just add the entire folder with your files to a dataset, then get it in your code
Add files (you can do that from the CLI, for example):
clearml-data add --files my_folder_with_files
Then from code:
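Something along these lines (a minimal sketch; the dataset name/project are placeholders):

```python
from clearml import Dataset

# fetch the dataset created with clearml-data and get a local copy of the files
dataset = Dataset.get(dataset_name="my_yolov8_dataset", dataset_project="examples")
local_folder = dataset.get_local_copy()  # local path containing the jpgs / txt / data.yaml
print(local_folder)
```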
Do you have any experience and things to watch out for?
Yes, for testing start with cheap node instances 🙂
If I remember correctly everything is preconfigured to support GPU instances (aka nvidia runtime).
You can take one of the templates from here as a starting point:
https://aws.amazon.com/blogs/compute/running-gpu-accelerated-kubernetes-workloads-on-p3-and-p2-ec2-instances-with-amazon-eks/
Yep, everything (both conda and pip)
GleamingGrasshopper63 what do you have configured in the "package manager" section?
https://github.com/allegroai/clearml-agent/blob/5446aed9cf6217f876d3b62226e38f21d88374f7/docs/clearml.conf#L64
Quite hard for me to try this right
👍
How do I reproduce it?
The issue itself is changing the default user.
USER appuser
WORKDIR /home/appuser
Any reason for it?
VivaciousWalrus21 I took a look at your example from the github issue:
https://github.com/allegroai/clearml/issues/762#issuecomment-1237353476
It seems to do exactly what you expect, and it stores its own last iteration as part of the checkpoint. When running the example with continue_last_task=int(0) you get exactly what you expect
(Do notice that TB visualizes these graphs in a very odd way, and it took me a few clicks to verify it...)
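For reference, a minimal sketch of the continue_last_task usage (the project/task names are placeholders):

```python
from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="training",
    continue_last_task=int(0),  # continue the previous run, with the iteration offset set to 0
)
```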
WickedGoat98 Nice!!!
BTW: The fix should solve both (i.e. no need to manually cast), I'll make sure the fix is on GitHub so you'll be able to verify 🙂
FiercePenguin76 in the Task's execution tab, under "script path", change it to "-m filprofiler run catboost_train.py".
It should work (assuming the "catboost_train.py" is in the working directory).
Hi TartSeal39
So the thing is, the agent does not support yaml env for conda. Currently, if the requirements section is empty, the agent will use the requirements.txt of the repo. We first need to add support for conda yaml, and then allow you to disable the auto requirements or push the specific yaml. Would that work? Also, is there a reason the automatic package detection is not working?
TartSeal39 please let me know if it works, conda is a strange beast and we do our best to tame it.
Specifically when you execute manually on a conda env we collect (separately) the conda packages & the python packages (so later we can replicate on both conda & pip, or at least do our best)
Are you running both the development env and the agent with conda?
Is the clearml-agent queue not available in the open source?
It is fully available in the open source; what is missing is the SLURM connection. In the open source, the daemon is installed per machine (node) and spins up containers/venvs on the machine. The enterprise version adds support so it uses SLURM to provision the node. I hope it helps 🙂
so do you think it would be possible to spin up another daemon, which listens to this daemon, which then runs a slurm job?
This is exactly what the ...
Now in case I needed to do it, can I add new parameters to a cloned experiment or will these get deleted?
Adding new parameters is supported 🙂
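For example, something like this should work (a minimal sketch; the task id, parameter name and queue name are placeholders):

```python
from clearml import Task

# clone the original experiment
template = Task.get_task(task_id="<original_task_id>")
cloned = Task.clone(source_task=template, name="clone with extra params")

# add a brand-new parameter on top of the cloned configuration
cloned.set_parameter("General/new_param", 42)

# and send it to an agent
Task.enqueue(cloned, queue_name="default")
```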