
GiddyTurkey39 what do you have in the Task itself
(i.e. git repo, uncommitted changes, installed packages)?
I mean to reduce the API calls without reducing the scalars that are logged, e.g. by sending less frequent batched updates.
Understood,
In my current trials I am using up the API calls very quickly though.
Why would that happen?
The logging is already batched (meaning one API call carries a bunch of reports)
Could it be lots of console lines?
BTW you can set the flush period to 30 sec, which will automatically collect and batch API calls
https://github.com/allegroai/clearml/blob/25df5efe7...
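For reference, a minimal sketch of setting the flush period from code (project/task names here are just placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="batched logging")

# collect reports for 30 seconds before sending them, so many
# scalars/console lines share a single API call
task.get_logger().set_flush_period(30.0)
```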
Hi SkinnyPanda43
Every "commit" is a new version, so to sync changes you need to create a new version (with the parent version set to the previous one) and either sync the local folder or manually add/remove files.
If you do not need to actually store the "current" version, you can just reset the Task, and sync it again.
wdyt?
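In case it helps, a minimal sketch of that flow with the Dataset API (project/dataset names and the local path are placeholders):
```python
from clearml import Dataset

# the previous version becomes the parent of the new one
parent = Dataset.get(dataset_project="examples", dataset_name="my_dataset")
new_version = Dataset.create(
    dataset_project="examples",
    dataset_name="my_dataset",
    parent_datasets=[parent.id],
)

# sync the local folder: new/changed files are added, deleted ones removed
new_version.sync_folder(local_path="./data")
new_version.upload()
new_version.finalize()
```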
BroadMole98
I'm still exploring what trains is for.
I guess you can think of Trains as Experiment manager + MLOps tied together.
The idea is to give a quick and easy way to move from coding/running on one machine to scaling it to multiple remote machines, with everything that comes with it.
In some ways it is like snakemake: it sets up your environment and executes the code. Snakemake also allows you to set up data, which in Trains is done via code (StorageManager), pipelines are also...
yes you are correct, OS environment: TRAINS_PROC_MASTER_ID=1:task_id_here
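If you are setting it from Python, something like this should do (the task id is obviously a placeholder):
```python
import os

# the value follows the "1:<task id>" format from the example above
os.environ["TRAINS_PROC_MASTER_ID"] = "1:task_id_here"
```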
Hi WorriedParrot51
Take a look at the Experiment execution section:
there is a script path
and a working directory
The working directory is the base of the git repository (which is cloned into the docker container).
So if for some reason trains did not properly detect the current working dir, here is what should solve the issue without changing the PYTHONPATH:
script path: ./sub_folder/script.py
working directory: .
What do you think?
Hi ExuberantParrot61
Is the pipeline logic code running from inside the repo?
DeterminedToad86
So based on the log it seems the agent is installing:
torch from https://download.pytorch.org/whl/cu102/torch-1.6.0-cp36-cp36m-linux_x86_64.whl
and torchvision from https://torchvision-build.s3-us-west-2.amazonaws.com/1.6.0/gpu/cuda-11-0/torchvision-0.7.0a0%2B78ed10c-cp36-cp36m-manylinux1_x86_64.whl
See in the log:
Warning, could not locate PyTorch torch==1.6.0 matching CUDA version 110, best candidate 1.7.0
But torchvision is downloaded from the cuda 11 folder...
I...
I keep getting a "failed getting token" error
MiniatureCrocodile39 what's the server you are using ?
Hi @<1545216070686609408:profile|EnthusiasticCow4> let me know if this one solves the issue
pip install clearml==1.14.2rc0
Hi JitteryCoyote63
I think what happens is that the agents are registered under the same name (id). How many agents do you see in the "Workers" tab?
Hi @<1739818374189289472:profile|SourSpider22>
What are you trying to install, just the agent? If so,
pip install clearml-agent
is all you need.
ShakyJellyfish91 can you check if version 1.0.6rc2 can find the changes?
if you have cuda 10.2, then the torch 1.3.1 from the cu101 version should work
Notice both need to be str
btw, if you need the entire folder just use StorageManager.upload_folder
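For example (bucket and paths are placeholders):
```python
from clearml import StorageManager

# upload every file under the local folder to the remote destination
StorageManager.upload_folder(
    local_folder="./my_output",
    remote_url="s3://my-bucket/experiments/run_1",
)
```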
- Artifacts and models will be uploaded to the output URI, debug images are uploaded to the default file server. It can be changed via the Logger.
- Hmm is this like a configuration file?
You can do:
local_text_file = task.connect_configuration('filenotingit.txt')
Then open 'local_text_file'; it will create a local copy of the data at runtime, and the content will be stored on the Task itself.
- This is how the agent installs the python packages, but if the docker already contains th...
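Putting the connect_configuration pattern together, a minimal sketch (the file name is taken from the example above, project/task names are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="config file demo")

# the file content is stored on the Task; when executed remotely,
# a local copy is re-created from the stored content
local_text_file = task.connect_configuration("filenotingit.txt")

with open(local_text_file, "rt") as f:
    config_text = f.read()
```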
TenseOstrich47 / PleasantGiraffe85
The next version (I think releasing today) will already contain scheduling, and the next one (probably an RC right after) will include triggering. That said, currently the UI wizard for both (i.e. creating the triggers) is only available in the community hosted service. Still, I think that creating it from code (triggers/schedule) actually makes a lot of sense.
pipeline presented in a clear UI,
This is actually actively worked on, I think Anxious...
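For the code route, a rough sketch assuming the TaskScheduler interface that later clearml versions ship (task id and queue names are placeholders):
```python
from clearml.automation import TaskScheduler

scheduler = TaskScheduler()
# clone and enqueue the given task every day at 09:30
scheduler.add_task(
    schedule_task_id="<task-id-to-run>",
    queue="default",
    hour=9,
    minute=30,
)
# run the scheduler itself as a service task
scheduler.start_remotely(queue="services")
```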
Hi PanickyLion56
Yep savefig also works, you can also do:
from clearml import Logger
Logger.current_logger().report_matplotlib_figure(title="My Plot Title", series="My Plot Series", iteration=10, figure=plt)
https://github.com/allegroai/clearml/blob/0c5d12b830987aa9bb8d44d81e92ff9198008f29/examples/frameworks/matplotlib/matplotlib_example.py#L25
When exactly are you getting this error ?
VictoriousPenguin97 basically spin down server A (this should flush all DBs), then copy /opt/clearml to the new server and spin it up with docker-compose. As long as the new server is on the same address as the previous one, everything should work out of the box.
And I saw that it uploads the notebook itself as a notebook. Is that normal? Is there a way to disable it?
Hi FriendlyElk26
Yes this is normal: it backs up your notebook as well as converting it into python code (see "Execution - uncommitted changes"), so that later the clearml-agent will be able to run it for you on remote machines.
You can also use task.connect({"param": "value"})
to expose arguments to use in the notebook so that later you will be able to change them from the U...
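A minimal sketch of that pattern (names and values are illustrative):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="notebook params")

# connect a dict; when the task is cloned and executed by an agent,
# values edited in the UI are injected back into this dict
params = {"learning_rate": 0.001, "batch_size": 32}
params = task.connect(params)

print(params["learning_rate"])  # reflects any UI override on remote runs
```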
We are working on 1.3.0 so this is right in time
So you are saying it ignored everything after the bucket's "/" ?
It should have been:
output_uri="s3://company-clearml/artifacts/bethan/sales_journeys/artifacts/examples/load_artifacts.f0f4d1cd5eb54795b11508dd1e739145/artifacts/filename.csv.gz/filename.csv.gz
Quick update, I might have been able to reproduce the issue ( GreasyPenguin14 working "offline" is a great hack to accelerate debugging this issue, thank you!)
It seems it is related to the known and very annoying Python forking issue (and this is why changing to the "spawn" start method solves it):
https://bugs.python.org/issue6721
Long story short, in some cases when forking (i.e. ProcessPoolExecutor), python can copy locks in a "bad" state, this means that you can end up with a lock acquir...
try setting the following in your clearml.conf:
docker_install_opencv_libs: true
ReassuredTiger98 regarding the agent error: can you see the package some_packge in the "Installed Packages" section in the UI? Was it installed? Are you using pip or conda as the package manager in the agent (check the clearml.conf)? Is the agent running in docker mode?
That being said it returns none for me when I reload a task but it's probably something on my side.
MistakenDragonfly51 just making sure, you did call Task.init, correct ?
What does
from clearml import Task
task = Task.current_task()
return?
Notice that you need to create the Task before actually calling Logger.current_logger()
or Task.current_task()
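For example (project/task names are placeholders):
```python
from clearml import Task

# if Task.init was never called in this process, current_task() returns None
task = Task.init(project_name="examples", task_name="current task demo")

assert Task.current_task() is not None
logger = Task.current_task().get_logger()  # safe only after Task.init
```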