Reputation
Badges 1
25 × Eureka!I was expecting the remote experiment to behave similarly, why do I need to import pandas there?
The only problem os that the remote code did not install pandas , once the package is there we can read the artifacts
(this is in contrast to the local machine where pandas is installed and so we can create/read the object)
Does that make sense ?
Hi FiercePenguin76
should return all datasets from all projects?
Correct π
JitteryCoyote63 instead of _update_requirements, call the following before Task.init:Task.add_requirements('torch', '1.3.1') Task.add_requirements('git+ ')
ImmensePenguin78
I think the latest RC adds it, should be released later today π
Hi CourageousWhale20
Most documentation is here https://allegro.ai/docs
I think so, when you are saying "clearml (bash script..." you basically mean, "put my code + packages + and run it" , correct ?
Hi all! Does anyone know a solution to my issue with deploying models saved on azure on the clearml-serving docker container?
Hi NuttyCamel41
The easiest is to map the clearml.conf to both the serving and triton containers in your docker-compose.yaml (or k8s secrets) and make sure the conf file has the credentials to access the azure blob. wdyt ?
Where did you add the Task.init call ?
@<1587253076522176512:profile|HollowPeacock33>
Is this a commercial ad? this seems like out of scope for this channel
Can you expand?
So are you saying why do we need to install a specific pip version ?
You can "disable it" by selecting a very high versionpip_version: "<40"https://github.com/allegroai/clearml-agent/blob/077148be00ead21084d63a14bf89d13d049cf7db/docs/clearml.conf#L67
What do you mean cache files ? Cache is machine specific and is set in the clearml.conf file.
Artifacts / models are uploaded to the files server (or any other object storage solution)
Hi @<1552101458927685632:profile|FreshGoldfish34>
self-hosted, you mean the open source ? if so, then yes totally free π
That said I would recommend to have the server inside your VPN, just in case from a security perspective
Hi JitteryCoyote63
Do you have a specific example in mind ?
Hi CleanPigeon16
I was wondering how (or if) you handle interruptions.
Good question, basically (and I might be missing a few details but I think that's the general gist).
A new instance will be spinned (spot/regular based on your "compute budget") as long as there is a job in the "monitored" queue. that mean that if a worker was kicked by amazon (i.e. is spot) another one will be spinned instead as long as there is a job in the queue. That means that what is probably missing in you...
Hi HollowFish37
I think I have good news for you, the clearml-agent is only communicating with the api endpoint, so as long as this is secure, you should be fine. Do notice that the default files server endpoint should be secure as well, as by default it will allow any upload/download
True, this is exactly the reason. That said, you can always manually add it. You can see the default values : https://github.com/allegroai/trains-agent/blob/master/docs/trains.conf
Oh, and good job starting your reference with an author that goes early in the alphabetical ordering, lol:
LOL, worst case it would have been C ... π
Hi SmallDeer34
ClearML automagical logging will work on the current python process. But in your example yyour Bash is running another python script (that has nothing to do with the original notebook), hence clearml automagic is not aware of it (i.e. it cannot "patch" the tensorboard calls).
In order to make it work.
you should do something like:from joeynmt import train train.main(...)Or something similar π
Make sense ?
did you run trains-agent ?
ShaggyHare67 notice that the services queue is designed to run CPU based tasks like monitoring etc.
For the actual training you need to run your trains-agent on a GPU machine.
Did you run the trains-agent init ? it will walk you through the configuration (git user/pass) included.
If you want to manually add them, you can see an example of the configuration file in the link below.
You can find it on ~\trains.conf
https://github.com/allegroai/trains-agent/blob/master/docs/tr...
So obviously that is the problem
Correct.
ShaggyHare67 how come the "installed packages" are now empty ?
They should be automatically filled when executing locally?!
Any chance someone mistakenly deleted them?
Regrading the python environment, trains-agent is creating a new clean venv for every experiment, if you need you can set in your trains.conf :agent.package_manager.system_site_packages: true
https://github.com/allegroai/trains-agent/blob/de332b9e6b66a2e7c67...
ShaggyHare67 I'm just making sure I understand the setup:
First "manual" run of the base experiment. It creates an experiment in the system, you see all the hyper parameters under General section. trains-agent running on a machine HPO example is executed with the above HP as optimization paamateres HPO creates clones of the original experiment, with different configurations (verified in the UI) trains-agent executes said experiments, aand they are not completed.But it seems the paramete...
you should see your agent there
I see in the UI are 5 drafts
What's the status of these 5 experiments? draft ?
ShaggyHare67 could you send the console log trains-agent outputs when you run it?
Now theΒ
trains-agent
Β is running my code but it is unable to importΒ
trains
Do you have the package "trains" listed under "installed packages" in your experiment?
ShaggyHare67 are you saying the problem is trains fails discovering the packages in the manual execution ?